
How To Get Plain Text In Between Multiple HTML Tags Using Scrapy

I am trying to grab all the text from multiple tags at a given URL using Scrapy. I am new to Scrapy and don't have much idea how to achieve this. Learning through examples and people

Solution 1:

@paultrmbrth suggested this solution to me, and it works for me:

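def parse_item(self, response):
    # Select every text node under <body>, skipping <script> and <style> contents.
    texts = response.xpath(
        '//body//*[not(self::script or self::style)]/text()'
    ).extract()

    # `text` must hold the output file path; the original snippet does not define it.
    with open(text, 'wb') as f:
        f.write("".join(texts).encode('utf-8'))

    item = DmozItem()
    yield item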
