开发者问题收集

使用 Requests_HTML 抓取 JS 渲染的页面无法按预期工作

2020-08-23
193

我正在抓取一个 JS 渲染的页面( https://www.flipkart.com/search?q=Acer+Laptops )。此页面中的产品图像正在动态加载。这些图像的预渲染 SRC 值为

//img1a.flixcart.com/www/linchpin/fk-cp-zion/img/placeholder_9951d0.svg

渲染后,SRC 应该是这样的

https://rukminim1.flixcart.com/image/312/312/kcp4osw0/computer/f/w/d/acer-na-thin-and-light-laptop-original-imaftrdmuyxq5nrf.jpeg?q=70

使用 request_html 我可以获取 SRC 值,但它只适用于顶部的前几张图片。请帮帮我?我的代码:-

res = session.get("https://www.flipkart.com/search?q=Acer+Laptops")
res.html.render()
all_results = res.html.find('#container > div > div.t-0M7P._2doH3V > div._3e7xtJ > div._1HmYoV.hCUpcT > div:nth-child(2)', first=True) #Container for all the results
items = all_results.find('._1UoZlX') # Container for each product being displayed
for item in items:
   item_image = item.find('div._3BTv9X img', first=True).attrs.get('src')
   print(item_image)

输出:-

https://rukminim1.flixcart.com/image/312/312/kamtsi80/computer/m/8/y/acer-na-gaming-laptop-original-imafs5prytwgrcyf.jpeg?q=70
https://rukminim1.flixcart.com/image/312/312/kcp4osw0/computer/f/w/d/acer-na-thin-and-light-laptop-original-imaftrdmuyxq5nrf.jpeg?q=70
//img1a.flixcart.com/www/linchpin/fk-cp-zion/img/placeholder_9951d0.svg
//img1a.flixcart.com/www/linchpin/fk-cp-zion/img/placeholder_9951d0.svg

如您所见,前两张图片已加载,其余图片未加载。 提前谢谢大家!

2个回答
import requests
import re


def main(url):
    r = requests.get(url)
    match = [x.group(1) for x in re.finditer(
        'dynamicImageUrl":"(.*?)"', r.text)]
    print(match)


main("https://www.flipkart.com/search?q=Acer+Laptops")

输出:

['http://rukmini1.flixcart.com/flap/{@width}/{@height}/image/c9ef9eae08a3b038.jpg?q={@quality}', 'https://rukminim1.flixcart.com/www/{@width}/{@height}/promos/21/07/2017/e8625e14-3277-4f16-a4d4-df8ed525905b.png?q={@quality}', 'https://rukminim1.flixcart.com/www/{@width}/{@height}/promos/21/07/2017/e8625e14-3277-4f16-a4d4-df8ed525905b.png?q={@quality}', 'https://rukminim1.flixcart.com/www/{@width}/{@height}/promos/21/07/2017/e8625e14-3277-4f16-a4d4-df8ed525905b.png?q={@quality}', 'https://rukminim1.flixcart.com/www/{@width}/{@height}/promos/21/07/2017/e8625e14-3277-4f16-a4d4-df8ed525905b.png?q={@quality}', 'https://rukminim1.flixcart.com/www/{@width}/{@height}/promos/21/07/2017/e8625e14-3277-4f16-a4d4-df8ed525905b.png?q={@quality}', 'https://rukminim1.flixcart.com/www/{@width}/{@height}/promos/21/07/2017/e8625e14-3277-4f16-a4d4-df8ed525905b.png?q={@quality}', 'https://rukminim1.flixcart.com/www/{@width}/{@height}/promos/21/07/2017/e8625e14-3277-4f16-a4d4-df8ed525905b.png?q={@quality}', 'https://rukminim1.flixcart.com/www/{@width}/{@height}/promos/21/07/2017/e8625e14-3277-4f16-a4d4-df8ed525905b.png?q={@quality}', 'https://rukminim1.flixcart.com/www/{@width}/{@height}/promos/21/07/2017/e8625e14-3277-4f16-a4d4-df8ed525905b.png?q={@quality}', 'https://rukminim1.flixcart.com/www/{@width}/{@height}/promos/21/07/2017/e8625e14-3277-4f16-a4d4-df8ed525905b.png?q={@quality}', 'https://rukminim1.flixcart.com/www/{@width}/{@height}/promos/21/07/2017/e8625e14-3277-4f16-a4d4-df8ed525905b.png?q={@quality}', 'https://rukminim1.flixcart.com/www/{@width}/{@height}/promos/21/07/2017/e8625e14-3277-4f16-a4d4-df8ed525905b.png?q={@quality}', 'https://rukminim1.flixcart.com/www/{@width}/{@height}/promos/21/07/2017/e8625e14-3277-4f16-a4d4-df8ed525905b.png?q={@quality}', 'https://rukminim1.flixcart.com/www/{@width}/{@height}/promos/21/07/2017/e8625e14-3277-4f16-a4d4-df8ed525905b.png?q={@quality}', 'https://rukminim1.flixcart.com/www/{@width}/{@height}/promos/21/07/2017/e8625e14-3277-4f16-a4d4-df8ed525905b.png?q={@quality}', 'https://rukminim1.flixcart.com/www/{@width}/{@height}/promos/21/07/2017/e8625e14-3277-4f16-a4d4-df8ed525905b.png?q={@quality}', 'https://rukminim1.flixcart.com/www/{@width}/{@height}/promos/21/07/2017/e8625e14-3277-4f16-a4d4-df8ed525905b.png?q={@quality}', 'https://rukminim1.flixcart.com/www/{@width}/{@height}/promos/21/07/2017/e8625e14-3277-4f16-a4d4-df8ed525905b.png?q={@quality}', 'https://rukminim1.flixcart.com/www/{@width}/{@height}/promos/21/07/2017/e8625e14-3277-4f16-a4d4-df8ed525905b.png?q={@quality}', 'https://rukminim1.flixcart.com/www/{@width}/{@height}/promos/21/07/2017/e8625e14-3277-4f16-a4d4-df8ed525905b.png?q={@quality}', 'https://rukminim1.flixcart.com/www/{@width}/{@height}/promos/21/07/2017/e8625e14-3277-4f16-a4d4-df8ed525905b.png?q={@quality}', 'https://rukminim1.flixcart.com/www/{@width}/{@height}/promos/21/07/2017/e8625e14-3277-4f16-a4d4-df8ed525905b.png?q={@quality}', 'https://rukminim1.flixcart.com/www/{@width}/{@height}/promos/21/07/2017/e8625e14-3277-4f16-a4d4-df8ed525905b.png?q={@quality}']

现在您可以根据需要替换 width height quality

默认值为 312 x 312 x 70

αԋɱҽԃ αмєяιcαη
2020-08-23

我找到了解决方案,因为图像是延迟加载的,所以我必须在“render()”函数中使用“scrolldown”和“sleep”参数。找到以下代码:

res = session.get("https://www.flipkart.com/search?q=Acer+Laptops")
res.html.render(scrolldown=20, sleep=.1)
all_results = res.html.find('#container > div > div.t-0M7P._2doH3V > div._3e7xtJ > div._1HmYoV.hCUpcT > div:nth-child(2)', first=True) #Container for all the results
items = all_results.find('._1UoZlX') # Container for each product being displayed
for item in items:
   item_image = item.find('div._3BTv9X img', first=True).attrs.get('src')
   print(item_image)
Aftab
2020-08-24