Python async requests_html: div is not populated with JS-loaded data

2022-06-07

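I am trying to get links to the guides for a given class and its play style. The div responsible for rendering them is highlighted in yellow in my screenshot. I need to use async because this class is used in a discord.py bot; trying to use HTMLSession() raised an error telling me to use AsyncHTMLSession instead. Site URL: https://immortal.maxroll.gg/category/build-guides#classes%3D%5Bdi-necromancer%5D%26metas%3D%5Bdi-pvp%5D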

However, my code outputs this div as empty.

Code

from requests_html import AsyncHTMLSession

asession = AsyncHTMLSession()

class Scrapper:
    def __init__(self):
        self.headers = {'User-Agent':'Mozilla/5.0 (X11; Linux x86_64; rv:100.0) Gecko/20100101 Firefox/100.0'}

    async def return_page(self, url):
        response = await asession.get(url)
        await response.html.arender(timeout=15, sleep=5)
        #article = response.html.find('#filter-results', first=True)
        print(response.html.html)
        return response

    async def return_build_articles(self, userClass, instance):
        url = f"https://immortal.maxroll.gg/category/build-guides#classes%3D%5Bdi-{userClass}%5D%26metas%3D%5Bdi-{instance}%5D"
        articles = await self.return_page(url)

Relevant part of the output

</div>
</div>
</div>
</div>
</div>
</form> <hr class="global-separator ">
<div id="immortal-mobile-mid-banner" class="adsmanager-ad-container mobile-banner"></div>
<div id="filter-results" class="posts-list"><!-- here should be the results --></div>
<div class="page-navigation" role="navigation"></div>
</div>
</div>
</div>
1 Answer

The data you see on the page is loaded by JavaScript from an external URL. To fetch it asynchronously, you can use the aiohttp package. For example:

import json
import asyncio
import aiohttp


url = "https://site-search-origin.maxroll.gg/indexes/wp_posts_immortal/search"

data = {
    "filters": '(classes = "di/necromancer") AND (metas = "di/PvP") AND (category = "Build Guides")',
    "limit": 1000,
    "offset": 0,
    "q": "",
}

headers = {
    "X-Meili-API-Key": "3c58012ad106ee8ff2c6228fff2161280b1db8cda981635392afa3906729bade"
}


async def main():
    async with aiohttp.ClientSession() as session:
        async with session.post(url, json=data, headers=headers) as resp:
            json_data = await resp.json()

    # uncomment to print all data:
    # print(json.dumps(json_data, indent=4))

    for hit in json_data["hits"]:
        print(hit["post_title"])
        print(hit["permalink"])
        print("-" * 80)


asyncio.run(main())

Prints:

Bone Spikes Necromancer PvP Guide
https://immortal.maxroll.gg/build-guides/bone-spikes-necromancer-pvp-guide-battlegrounds-rite-of-exile
--------------------------------------------------------------------------------
Bone Wall Necromancer PvP Guide
https://immortal.maxroll.gg/build-guides/bone-wall-necromancer-pvp-guide-battlegrounds-rite-of-exile
--------------------------------------------------------------------------------
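
To use this from the class in the question, the request could be wrapped in a coroutine that takes the class and play style as arguments. The following is only a minimal sketch under some assumptions: `build_payload` and `fetch_build_guides` are hypothetical names, the caller is assumed to pass in an existing `aiohttp.ClientSession`, and the filter values are assumed to follow the same "di/..." spelling as the request above (only "necromancer" and "PvP" are confirmed by it).

```python
SEARCH_URL = "https://site-search-origin.maxroll.gg/indexes/wp_posts_immortal/search"

# API key copied from the request above; the site may change it at any time.
HEADERS = {
    "X-Meili-API-Key": "3c58012ad106ee8ff2c6228fff2161280b1db8cda981635392afa3906729bade"
}


def build_payload(user_class, meta, limit=1000):
    """Build the JSON body for the search request (hypothetical helper).

    Assumption: other classes and metas use the same "di/<name>" form
    as "di/necromancer" and "di/PvP".
    """
    filters = (
        f'(classes = "di/{user_class}") AND (metas = "di/{meta}") '
        'AND (category = "Build Guides")'
    )
    return {"filters": filters, "limit": limit, "offset": 0, "q": ""}


async def fetch_build_guides(session, user_class, meta):
    """Return a list of (title, link) pairs.

    `session` is expected to be an aiohttp.ClientSession created by the caller.
    """
    payload = build_payload(user_class, meta)
    async with session.post(SEARCH_URL, json=payload, headers=HEADERS) as resp:
        resp.raise_for_status()  # fail early on a non-2xx response
        json_data = await resp.json()
    return [(hit["post_title"], hit["permalink"]) for hit in json_data["hits"]]
```

Inside a discord.py command this might be called as `await fetch_build_guides(session, "necromancer", "PvP")`, with `session` created once and reused across commands rather than opened for every request.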
Andrej Kesely
2022-06-07