tag:web-scraping

Problem with element.click() in Puppeteer

I ran into the following problem. When I run this code I get this error message, and I can't seem to fix it. Uncaught Error Error: Evaluation failed: TypeError: Cannot read properties of undefined (reading 'click') at _evaluateInternal (c:\Users\Name\node_modules\puppeteer

2022-05-10

Selenium/ChromeDriver keeps crashing with "Chrome failed to start: exited normally" and "DevToolsActivePort file doesn't exist"

I am trying to run a Selenium/ChromeDriver script but keep getting the following error. Selenium ver 4.72 Chrome browser version: Version 108.0.5359.125 (Official Build) (64-bit) ChromeDriver version: ChromeDriver 108.0.5359.71 Message: unknown error: Chrome failed to start: exited no

python selenium web-scraping selenium-chromedriver undetected-chromedriver

2022-12-20

Cannot connect to chrome at 127.0.0.1:37541 when using undetected-chromedriver in Python

After using Selenium, I decided to try undetected-chromedriver, so I installed it with pip install undetected-chromedriver. However, when I run this simple script: import undetected_chromedriver.v2 as uc options = uc.ChromeOptions() options.add_argument('--no-sandbo

python selenium web-scraping selenium-chromedriver undetected-chromedriver

2021-10-01

Scraping URLs with Python and Selenium

I am trying to get a Python Selenium script running that should do the following: Take text file, BookTitle.txt that is a list of Book Titles. Using Python/Selenium then searches the site, GoodReads.com for that title. Takes the URL for the r

python selenium web-scraping selenium-chromedriver

2019-09-08

Puppeteer: selecting inner text correctly

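I want to get the string with a specific class name, say "CL1". This is what I used to do, and it worked (we are in an async function): var counter = await page.evaluate(() => {return document.querySelector('.CL1').innerText;}); Now, a few months later, when I try to run the code, I get this error: Error: Evaluation failed: Typ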

javascript node.js web-scraping puppeteer

2019-06-21

PHP: How to get the src of img tags with a specific class name using preg_match_all?

I am trying to build a scraper for Amazon product search listing pages. Method: function getHTMLcode($url) {$curl = curl_init($url);curl_setopt($curl, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 10.10; labnol;) ctrlq.org");curl_setopt

php html regex web-scraping

2019-05-16

Python-requests: Can't scrape all the HTML code from a page

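I am trying to scrape the content of the Financial Times search page. Using Requests, I can easily scrape the articles' titles and hyperlinks. I would like to get the hyperlink to the next page, but unlike the articles' titles or hyperlinks, I cannot find it in the Requests response. from bs4 import BeautifulSoup import requests url = 'http://search.ft.com/search?q=SABMiller+PLC&t=all&rpp=10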

web-scraping python-requests

2016-04-04

HTML data fetched by Python Requests differs from the browser; JS doesn't seem to matter

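I am trying to scrape weather data from this site: http://www.fastweather.com/yesterday.php?city=St.+Louis_MO The problem I have is with yesterday's precipitation. When viewed in the developer tools, I see the following: <strong>Yesterday's Precipitation</strong> was 0.13 inches But when viewed from Python, whether using Requests or url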

python html python-3.x web-scraping python-requests

2018-02-15

Python requests.get(url) returns JavaScript code instead of the page HTML

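I have a very simple problem. I am trying to get the job description from the HTML of a LinkedIn page, but instead of the page's HTML I get a few lines of what looks like JavaScript code. I am very new to this, so any help would be greatly appreciated! Thanks. Here is my code: import requests url = "https://www.linkedin.com/jobs/view/inside-sales-manager-at-ster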

python-3.x web-scraping beautifulsoup python-requests

2019-01-28

Problem when trying to scrape a JS web page with requests-html (Python 3.6)

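For the past week I have been trying to scrape information from the Epic Games Store web page (https://www.epicgames.com/store/en-US/). I first tried the Requests module, but soon realized I needed a module that supports JavaScript web pages. That is what I am trying now, but there is a problem... When I use "Inspect Element" on the page, everything looks fine, but when I do this: from requests_html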

python html python-3.x web-scraping python-requests-html

2019-11-23

Scraping a JS-rendered page with Requests_HTML doesn't work as expected

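I am scraping a JS-rendered page (https://www.flipkart.com/search?q=Acer+Laptops). The product images on this page are loaded dynamically. The pre-render SRC value of these images is //img1a.flixcart.com/www/linchpin/fk-cp-zion/img/placeholder_9951d0.svg After rendering, the SRC should look like this: https://rukminim1.f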

python web-scraping python-requests python-requests-html

2020-08-23

Incomplete web scraping of an HTML page in Python

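I am trying to scrape two tables from a page, but when I use soup.find('table'), it just can't find them. Also, when I print the soup object, the table part of the HTML code is not printed. Is there a workaround? My code so far: from bs4 import BeautifulSoup import pandas as pd import requests url = 'http://www.b3.com.br/pt_b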

python html web-scraping python-requests

2020-11-21

Python: requests.get fetches the wrong HTML file

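I am trying to scrape data from https://essentials.swissdox.ch, which is only accessible via VPN. What I do is generate a URL with query parameters and try to fetch the corresponding HTML file. The problem is that, although the link works, Python gives me the HTML file of the https://essentials.swissdox.ch start page. I would greatly appreciate any help! Example: I want the HTML file for the following URL: https: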

python html web-scraping python-requests

2021-02-03

requests-html module not responding

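I am new to Python and am trying to use the requests-html module for web scraping. I am using Jupyter on a Mac. When I type pip install requests-html, the module seems to install, because I get the following message: Requirement already satisfied: requests-html in /Users/usr/opt/anaconda3/lib/python3.8/site-pa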

python web-scraping python-requests-html

2021-02-22

Python async requests_html: div not loading JS (?) data

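I am trying to get the links to guides for a given class and its playstyle. Highlighted in yellow in the screenshot here is the div responsible for rendering. I need to use async because this class is used in a discord.py bot; trying HTMLSession() raised an error telling me to use AsyncHTMLSession. Site URL: https://immortal.maxroll.gg/category/build-guides#classes%3D%5B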

python web-scraping python-requests-html

2022-06-07