asp.net - Read first 3 paragraphs of a long string. [C#, HTML AgilityPack]
I want to read a long string and output the first 3 paragraphs of it. How can I achieve this? I originally wanted to use the code below to show (n) number of words, but have since changed to paragraphs.
public string MySummary(string html, int max)
{
    string summaryHtml = string.Empty;

    // Load our HTML document
    HtmlDocument htmlDoc = new HtmlDocument();
    htmlDoc.LoadHtml(html);

    int wordCount = 0;
    foreach (var element in htmlDoc.DocumentNode.ChildNodes)
    {
        // InnerText strips out the HTML and gives plain text
        string elementText = element.InnerText;

        // Split on spaces to get the words in this element
        string[] elementWords = elementText.Split(new char[] { ' ' });

        // And if we haven't used too many words ...
        if (wordCount <= max)
        {
            // Add the *outer* HTML (which has the proper
            // HTML formatting for this fragment) to the summary
            summaryHtml += element.OuterHtml;
            wordCount += elementWords.Length + 1;
        }
        else
        {
            break;
        }
    }

    return summaryHtml;
}
If by paragraphs you mean <p> tags, and the child nodes of the document are <p>s, can't you just pull the first 3's inner text?
Edit, re: comment:
RTFM?
http://htmlagilitypack.codeplex.com/wikipage?title=examples&referringtitle=home
Something like:
string.Join(" ", doc.DocumentNode.SelectNodes("//p").Take(3).Select(n => n.InnerText).ToArray());
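The one-liner above can be expanded into a small helper. This is only a sketch, assuming the HtmlAgilityPack NuGet package is referenced and that paragraphs are marked up as <p> elements. The class and method names (ParagraphSummary, FirstParagraphsText, FirstParagraphsHtml) are made up for illustration and are not part of the library.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

public static class ParagraphSummary
{
    // Hypothetical helper: plain text of the first `count` <p> elements,
    // joined with single spaces.
    public static string FirstParagraphsText(string html, int count)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches,
        // so guard before using LINQ on the result.
        var paragraphs = doc.DocumentNode.SelectNodes("//p");
        if (paragraphs == null)
        {
            return string.Empty;
        }

        return string.Join(" ", paragraphs.Take(count).Select(p => p.InnerText.Trim()));
    }

    // Hypothetical variant: keeps the <p> markup, like the OuterHtml
    // approach in the question's word-count code.
    public static string FirstParagraphsHtml(string html, int count)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var paragraphs = doc.DocumentNode.SelectNodes("//p");
        if (paragraphs == null)
        {
            return string.Empty;
        }

        return string.Concat(paragraphs.Take(count).Select(p => p.OuterHtml));
    }
}
```

Calling ParagraphSummary.FirstParagraphsText(html, 3) would then give the text of the first three paragraphs, and FirstParagraphsHtml(html, 3) the same paragraphs with their tags intact. Note that "//p" matches <p> elements at any depth; if only top-level paragraphs are wanted, iterating DocumentNode.ChildNodes and filtering on Name == "p" may be the better fit.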