utf-8 - How to remove u'' from a Python script result?


I'm trying to write a parsing script using Python/Scrapy. How can I remove the [] and u'' strings from the result file?

Right now I have this code:

from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
from scrapy.utils.markup import remove_tags
from googleparser.items import GoogleparserItem
import sys


class GoogleparserSpider(BaseSpider):
    name = "google.com"
    allowed_domains = ["google.com"]
    start_urls = [
        "http://www.google.com/search?q=this+is+first+test&num=20&hl=uk&start=0",
        "http://www.google.com/search?q=this+is+second+test&num=20&hl=uk&start=0",
    ]

    def parse(self, response):
        print "===start======================================================="
        hxs = HtmlXPathSelector(response)
        # extract() returns a list of unicode strings, not a single string
        qqq = hxs.select('/html/head/title/text()').extract()
        print qqq
        print "---data--------------------------------------------------------"

        sites = hxs.select('/html/body/div[5]/div[3]/div/div/div/ol/li/h3')
        i = 1
        items = []
        for site in sites:
            try:
                item = GoogleparserItem()
                title1 = site.select('a').extract()
                # str() of a list gives its repr, brackets and u'' included
                title2 = str(title1)
                title = remove_tags(title2)
                link = site.select('a/@href').extract()
                item['num'] = i
                item['title'] = title
                item['link'] = link
                i = i + 1
                items.append(item)
            except:
                print 'exception'
        return items
        # never reached: this line follows the return
        print "===end========================================================="


SPIDER = GoogleparserSpider()

And I get this result after running it:

python scrapy-ctl.py crawl google.com

2010-07-25 17:44:44+0300 [-] log opened.
2010-07-25 17:44:44+0300 [googleparser] debug: enabled extensions: corestats, closespider, webservice, telnetconsole, memoryusage
2010-07-25 17:44:44+0300 [googleparser] debug: enabled scheduler middlewares: duplicatesfiltermiddleware
2010-07-25 17:44:44+0300 [googleparser] debug: enabled downloader middlewares: httpauthmiddleware, downloaderstats, useragentmiddleware, redirectmiddleware, defaultheadersmiddleware, cookiesmiddleware, httpcompressionmiddleware, retrymiddleware
2010-07-25 17:44:44+0300 [googleparser] debug: enabled spider middlewares: urllengthmiddleware, httperrormiddleware, referermiddleware, offsitemiddleware, depthmiddleware
2010-07-25 17:44:44+0300 [googleparser] debug: enabled item pipelines: csvwriterpipeline
2010-07-25 17:44:44+0300 [-] scrapy.webservice.webservice starting on 6080
2010-07-25 17:44:44+0300 [-] scrapy.telnet.telnetconsole starting on 6023
2010-07-25 17:44:44+0300 [google.com] info: spider opened
2010-07-25 17:44:45+0300 [google.com] debug: crawled (200) <get http://www.google.com/search?q=this+is+first+test&num=20&hl=uk&start=0> (referer: none)
===start=======================================================
[u'this first test - \u041f\u043e\u0448\u0443\u043a google']
---data--------------------------------------------------------
2010-07-25 17:52:42+0300 [google.com] debug: scraped googleparseritem(num=1, link=[u'http://www.amazon.com/first-protector-small-tamora-pierce/dp/0679889175'], title=u"[u'amazon.com: first test (protector of small) (9780679889175 ...']") in <http://www.google.com/search?q=this+is+first+test&num=100&hl=uk&start=0>

And this is the text in the file:

1,[u'amazon.com: first test (protector of small) (9780679889175 ...'],[u'http://www.amazon.com/first-protector-small-tamora-pierce/dp/0679889175'] 

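A prettier way: print qqq.pop(). extract() returns a list, and printing a list shows the repr of each element, which is where the [] and u'' come from. pop() takes the string out of the list, so only its text is printed.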
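The same idea can be applied to the item fields before they reach the file. Below is a minimal sketch, not the spider itself: the two lists are stand-ins for what extract() returned in the output above, and first_or_empty is a hypothetical helper name, not part of Scrapy. It is written so that it runs on both Python 2 and Python 3.

```python
# -*- coding: utf-8 -*-


def first_or_empty(values):
    """Return the first string of an extract()-style list, or u'' if it is empty."""
    return values[0] if values else u''


# Stand-ins for the lists that extract() returned in the output above.
title_list = [u'amazon.com: first test (protector of small) (9780679889175 ...']
link_list = [u'http://www.amazon.com/first-protector-small-tamora-pierce/dp/0679889175']

# Take the string out of the list instead of calling str() on the whole list.
title = first_or_empty(title_list)
link = first_or_empty(link_list)

# Printing the element shows plain text, without brackets or quotes.
print(title)
print(link)

# For a file, encode explicitly so that non-ASCII titles are written as UTF-8.
page_title = u'this first test - \u041f\u043e\u0448\u0443\u043a google'
encoded = page_title.encode('utf-8')
```

In the spider this would probably mean passing the first extracted element to remove_tags instead of str(title1), and storing the first element of the link list in item['link']. Unlike pop(), the helper does not raise an error when the XPath matches nothing.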

