使用 Requests_HTML 抓取 JS 渲染的页面无法按预期工作
2020-08-23
193
我正在抓取一个 JS 渲染的页面( https://www.flipkart.com/search?q=Acer+Laptops )。此页面中的产品图像正在动态加载。这些图像的预渲染 SRC 值为
//img1a.flixcart.com/www/linchpin/fk-cp-zion/img/placeholder_9951d0.svg
渲染后,SRC 应该是这样的
https://rukminim1.flixcart.com/image/312/312/kcp4osw0/computer/f/w/d/acer-na-thin-and-light-laptop-original-imaftrdmuyxq5nrf.jpeg?q=70
使用 request_html 我可以获取 SRC 值,但它只适用于顶部的前几张图片。请帮帮我?我的代码:-
res = session.get("https://www.flipkart.com/search?q=Acer+Laptops")
res.html.render()
all_results = res.html.find('#container > div > div.t-0M7P._2doH3V > div._3e7xtJ > div._1HmYoV.hCUpcT > div:nth-child(2)', first=True) #Container for all the results
items = all_results.find('._1UoZlX') # Container for each product being displayed
for item in items:
item_image = item.find('div._3BTv9X img', first=True).attrs.get('src')
print(item_image)
输出:-
https://rukminim1.flixcart.com/image/312/312/kamtsi80/computer/m/8/y/acer-na-gaming-laptop-original-imafs5prytwgrcyf.jpeg?q=70
https://rukminim1.flixcart.com/image/312/312/kcp4osw0/computer/f/w/d/acer-na-thin-and-light-laptop-original-imaftrdmuyxq5nrf.jpeg?q=70
//img1a.flixcart.com/www/linchpin/fk-cp-zion/img/placeholder_9951d0.svg
//img1a.flixcart.com/www/linchpin/fk-cp-zion/img/placeholder_9951d0.svg
如您所见,前两张图片已加载,其余图片未加载。 提前谢谢大家!
2个回答
import requests
import re
def main(url):
r = requests.get(url)
match = [x.group(1) for x in re.finditer(
'dynamicImageUrl":"(.*?)"', r.text)]
print(match)
main("https://www.flipkart.com/search?q=Acer+Laptops")
输出:
['http://rukmini1.flixcart.com/flap/{@width}/{@height}/image/c9ef9eae08a3b038.jpg?q={@quality}', 'https://rukminim1.flixcart.com/www/{@width}/{@height}/promos/21/07/2017/e8625e14-3277-4f16-a4d4-df8ed525905b.png?q={@quality}', 'https://rukminim1.flixcart.com/www/{@width}/{@height}/promos/21/07/2017/e8625e14-3277-4f16-a4d4-df8ed525905b.png?q={@quality}', 'https://rukminim1.flixcart.com/www/{@width}/{@height}/promos/21/07/2017/e8625e14-3277-4f16-a4d4-df8ed525905b.png?q={@quality}', 'https://rukminim1.flixcart.com/www/{@width}/{@height}/promos/21/07/2017/e8625e14-3277-4f16-a4d4-df8ed525905b.png?q={@quality}', 'https://rukminim1.flixcart.com/www/{@width}/{@height}/promos/21/07/2017/e8625e14-3277-4f16-a4d4-df8ed525905b.png?q={@quality}', 'https://rukminim1.flixcart.com/www/{@width}/{@height}/promos/21/07/2017/e8625e14-3277-4f16-a4d4-df8ed525905b.png?q={@quality}', 'https://rukminim1.flixcart.com/www/{@width}/{@height}/promos/21/07/2017/e8625e14-3277-4f16-a4d4-df8ed525905b.png?q={@quality}', 'https://rukminim1.flixcart.com/www/{@width}/{@height}/promos/21/07/2017/e8625e14-3277-4f16-a4d4-df8ed525905b.png?q={@quality}', 'https://rukminim1.flixcart.com/www/{@width}/{@height}/promos/21/07/2017/e8625e14-3277-4f16-a4d4-df8ed525905b.png?q={@quality}', 'https://rukminim1.flixcart.com/www/{@width}/{@height}/promos/21/07/2017/e8625e14-3277-4f16-a4d4-df8ed525905b.png?q={@quality}', 'https://rukminim1.flixcart.com/www/{@width}/{@height}/promos/21/07/2017/e8625e14-3277-4f16-a4d4-df8ed525905b.png?q={@quality}', 'https://rukminim1.flixcart.com/www/{@width}/{@height}/promos/21/07/2017/e8625e14-3277-4f16-a4d4-df8ed525905b.png?q={@quality}', 'https://rukminim1.flixcart.com/www/{@width}/{@height}/promos/21/07/2017/e8625e14-3277-4f16-a4d4-df8ed525905b.png?q={@quality}', 'https://rukminim1.flixcart.com/www/{@width}/{@height}/promos/21/07/2017/e8625e14-3277-4f16-a4d4-df8ed525905b.png?q={@quality}', 'https://rukminim1.flixcart.com/www/{@width}/{@height}/promos/21/07/2017/e8625e14-3277-4f16-a4d4-df8ed525905b.png?q={@quality}', 'https://rukminim1.flixcart.com/www/{@width}/{@height}/promos/21/07/2017/e8625e14-3277-4f16-a4d4-df8ed525905b.png?q={@quality}', 'https://rukminim1.flixcart.com/www/{@width}/{@height}/promos/21/07/2017/e8625e14-3277-4f16-a4d4-df8ed525905b.png?q={@quality}', 'https://rukminim1.flixcart.com/www/{@width}/{@height}/promos/21/07/2017/e8625e14-3277-4f16-a4d4-df8ed525905b.png?q={@quality}', 'https://rukminim1.flixcart.com/www/{@width}/{@height}/promos/21/07/2017/e8625e14-3277-4f16-a4d4-df8ed525905b.png?q={@quality}', 'https://rukminim1.flixcart.com/www/{@width}/{@height}/promos/21/07/2017/e8625e14-3277-4f16-a4d4-df8ed525905b.png?q={@quality}', 'https://rukminim1.flixcart.com/www/{@width}/{@height}/promos/21/07/2017/e8625e14-3277-4f16-a4d4-df8ed525905b.png?q={@quality}', 'https://rukminim1.flixcart.com/www/{@width}/{@height}/promos/21/07/2017/e8625e14-3277-4f16-a4d4-df8ed525905b.png?q={@quality}', 'https://rukminim1.flixcart.com/www/{@width}/{@height}/promos/21/07/2017/e8625e14-3277-4f16-a4d4-df8ed525905b.png?q={@quality}']
现在您可以根据需要替换 width 、 height 和 quality 。
默认值为 312 x 312 x 70
αԋɱҽԃ αмєяιcαη
2020-08-23
我找到了解决方案,因为图像是延迟加载的,所以我必须在“render()”函数中使用“scrolldown”和“sleep”参数。找到以下代码:
res = session.get("https://www.flipkart.com/search?q=Acer+Laptops")
res.html.render(scrolldown=20, sleep=.1)
all_results = res.html.find('#container > div > div.t-0M7P._2doH3V > div._3e7xtJ > div._1HmYoV.hCUpcT > div:nth-child(2)', first=True) #Container for all the results
items = all_results.find('._1UoZlX') # Container for each product being displayed
for item in items:
item_image = item.find('div._3BTv9X img', first=True).attrs.get('src')
print(item_image)
Aftab
2020-08-24