.NET - IronPython, Beautiful Soup, Win32 app
Does Beautiful Soup work with IronPython? If so, with which version of IronPython? How easy is it to distribute a Windows desktop app on .NET 2.0 using IronPython (mostly C# calling Python code that parses HTML)?
I was asking myself this same question. After struggling to follow the advice here and elsewhere to get IronPython and BeautifulSoup to play nicely with my existing code, I decided to go looking for an alternative, native .NET solution. BeautifulSoup is a wonderful bit of code, and at first I didn't think there was anything comparable available for .NET. Then I found the HTML Agility Pack, and if anything I think I've gained maintainability over BeautifulSoup. It takes clean or crufty HTML and produces an elegant XML DOM that can be queried via XPath. With a couple of lines of code you can even get a raw XDocument and craft your queries in LINQ to XML. Honestly, if web scraping is your goal, this is about the cleanest solution you will find.
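To illustrate the query side described above (XPath to find a node, then LINQ to XML to shape the results), here is a minimal sketch that uses only System.Xml.Linq and System.Xml.XPath. Note that these namespaces require .NET 3.5 or later, not the .NET 2.0 mentioned in the question. The sketch assumes the HTML has already been cleaned into well-formed XML, which is the step the answer attributes to the HTML Agility Pack; no Agility Pack call is shown. The SampleXhtml markup, the HolidayTableDemo class and the ParseRows method are hypothetical, invented for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

// Sketch only: queries already-cleaned, well-formed markup with XPath and LINQ to XML.
// Assumption: SampleXhtml stands in for the XML a cleanup step would produce.
public static class HolidayTableDemo
{
    // Hypothetical sample markup, not taken from any real page.
    public const string SampleXhtml =
        "<html><body>" +
        "<div id='primary'><table>" +
        "<tr><th>Date</th><th>Event</th></tr>" +
        "<tr><td>January 1</td><td><b>New Year's Day</b></td></tr>" +
        "<tr><td>July 4</td><td>Independence Day</td></tr>" +
        "</table></div>" +
        "</body></html>";

    // Returns (date, event) pairs from the table inside div#primary.
    public static List<KeyValuePair<string, string>> ParseRows(string xml)
    {
        XDocument doc = XDocument.Parse(xml);

        // XPath (System.Xml.XPath extension method) locates the container div.
        XElement div = doc.XPathSelectElement("//div[@id='primary']");
        if (div == null)
            return new List<KeyValuePair<string, string>>();

        // LINQ to XML: keep only rows with at least two <td> cells,
        // which drops the header row made of <th> cells.
        return div.Descendants("tr")
            .Select(tr => tr.Elements("td").ToList())
            .Where(cells => cells.Count >= 2)
            // XElement.Value concatenates all descendant text,
            // so nested tags such as <b> need no manual descent.
            .Select(cells => new KeyValuePair<string, string>(
                cells[0].Value.Trim(), cells[1].Value.Trim()))
            .ToList();
    }

    public static void Main()
    {
        foreach (KeyValuePair<string, string> row in ParseRows(SampleXhtml))
        {
            Console.WriteLine(row.Key);
            Console.WriteLine(row.Value);
            Console.WriteLine();
        }
    }
}
```

Filtering on the cell count, rather than indexing cells[0] and cells[1] unconditionally, is what keeps header rows and malformed rows from throwing; the same guard matters when the equivalent loop is written against the Agility Pack DOM.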
Edit:

Here is a simple (read: not robust at all) example that parses out the House of Representatives holiday schedule:
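    using System;
    using HtmlAgilityPack;

    namespace GovParsingTest
    {
        class Program
        {
            static void Main(string[] args)
            {
                HtmlWeb hw = new HtmlWeb();
                string url = @"http://www.house.gov/house/house_calendar.shtml";
                HtmlDocument doc = hw.Load(url);
                HtmlNode docNode = doc.DocumentNode;

                // Locate the container that holds the calendar table.
                HtmlNode div = docNode.SelectSingleNode("//div[@id='primary']");
                if (div == null)
                {
                    Console.WriteLine("Could not find div#primary; the page layout may have changed.");
                    return;
                }

                // SelectNodes returns null (not an empty collection) when nothing matches.
                HtmlNodeCollection tableRows = div.SelectNodes(".//tr");
                if (tableRows != null)
                {
                    foreach (HtmlNode row in tableRows)
                    {
                        HtmlNodeCollection cells = row.SelectNodes(".//td");

                        // Skip header rows (<th> only) and rows with fewer than two cells.
                        if (cells == null || cells.Count < 2)
                            continue;

                        HtmlNode dateNode = cells[0];
                        HtmlNode eventNode = cells[1];

                        // Descend to the innermost first child to reach the event text.
                        while (eventNode.HasChildNodes)
                        {
                            eventNode = eventNode.FirstChild;
                        }

                        Console.WriteLine(dateNode.InnerText);
                        Console.WriteLine(eventNode.InnerText);
                        Console.WriteLine();
                    }
                }

                //Console.WriteLine(div.InnerHtml);
                Console.ReadKey();
            }
        }
    }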