Get src from an HTML string on Android

Tags: Android HTML
Published: 2017/3/18 18:01:34

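I have an HTML string and I want to extract the src attribute from the img tag. I receive the HTML string in "summaryContent", and now I want to search it and return the src. If the string contains two or three img tags, it should find the "src" of every one of them.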

for (int i = 0; i < contents.size(); i++) {
    if (contents.get(i).summary != null) {
        summaryContent = contents.get(i).summary; // This condition is true only once
    } else {
        continue;
    }
}

This is what I get in summaryContent:

<ol start="7">
<li>
<h3><strong>Charlotte Casiraghi</strong></h3>
</li>
</ol>
<strong>Family Fortune:  </strong>$1 billion
<img class="size-full wp-image-346 aligncenter" src="http://rarelyknownthings.com/wp-content/uploads/2015/10/Picture1.png" alt="Picture1" width="943" height="1350" />
&nbsp;
&nbsp;
Charlotte Marie Pomeline Casiraghi is the second child of Caroline Princess of Hanover, Princess of Monaco and Stefano Casiraghi, an industrialist. She is eight in line to the throne of Monaco. Charlotte is a published writer and magazine editor.
<img class="aligncenter" src="http://rarelyknownthings.com/wp-content/uploads/2015/10/f762a5ca08aab85785f48c8425f089d7.png" alt="" />
Charlotte and her two brothers were born in the Mediterranean Principality of Monaco. When she was four years old, her father was killed in a boating accident. After his death, Princess Caroline moved the family to the Midi village of Saint-Rémy-de-Provence in France, with the intention of minimizing their exposure to the press.
<!--nextpage-->
<ol start="6">
<li>
<h3><strong>Hind Hariri</strong></h3>
</li>
</ol>

Solution 1:

You can extract it using a regular expression:

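import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

Pattern p = Pattern.compile("src\\s*=\\s*['\"]([^'\"]+)['\"]");
Matcher m = p.matcher(summaryContent);
List<String> srcResults = new ArrayList<>();
while (m.find()) { // loop so that every src is collected, not only the first
  srcResults.add(m.group(1));
}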

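Explanation

  • src matches the characters src literally (case sensitive)

  • \s* matches any whitespace character [\r\n\t\f ]

  • Quantifier: * between zero and unlimited times, as many times as possible, giving back as needed [greedy]

  • = matches the character = literally

  • \s* matches any whitespace character [\r\n\t\f ]

  • Quantifier: * between zero and unlimited times, as many times as possible, giving back as needed [greedy]

  • ['"] matches a single character present in the list below

  • '" a single character in the list '" literally (case sensitive)

  • 1st Capturing group ([^'"]+): [^'"] matches a single character not present in the list below

  • Quantifier: + between one and unlimited times, as many times as possible, giving back as needed [greedy]

  • '" a single character in the list '" literally (case sensitive)

  • ['"] matches a single character present in the list below

  • '" a single character in the list '" literally (case sensitive)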
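To see the pieces of the pattern working together, here is a minimal, self-contained sketch. The class name SrcExtractor, the method name extractSrcs, and the example.com URLs are made up for illustration and are not part of the original answer. It applies the same regular expression to a string with two img tags, one using double quotes and one using single quotes with spaces around the equals sign, and collects every captured value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named here only for illustration.
final class SrcExtractor {
    // Same pattern as in the answer: "src", optional whitespace, "=",
    // optional whitespace, an opening quote, the captured value, a closing quote.
    private static final Pattern SRC_PATTERN =
            Pattern.compile("src\\s*=\\s*['\"]([^'\"]+)['\"]");

    // Returns the value of every src attribute found, in document order.
    static List<String> extractSrcs(String html) {
        List<String> results = new ArrayList<>();
        Matcher m = SRC_PATTERN.matcher(html);
        while (m.find()) {
            results.add(m.group(1)); // group 1 is the text between the quotes
        }
        return results;
    }

    public static void main(String[] args) {
        // Sample input with placeholder URLs (assumed, not from the question).
        String html = "<img class=\"a\" src=\"http://example.com/one.png\" />"
                + "<p>text</p>"
                + "<img src = 'http://example.com/two.png'>";
        for (String src : extractSrcs(html)) {
            System.out.println(src);
        }
    }
}
```

Because the pattern only looks for the text src followed by a quoted value, it is not limited to img tags: it would also pick up the src of a script or iframe tag, and the tail of an attribute such as data-src. If the input may contain those, tightening the pattern or using a real HTML parser is probably the safer choice.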
