
Why can't I connect to MongoDB Atlas from GitHub Actions?

2023-03-25

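I want to create a workflow that automatically scrapes app reviews from the Google Play Store at a scheduled time every day and stores them in a collection in MongoDB Atlas. So first, I created a Python script called scraping_daily.py that scrapes 5,000 new reviews and filters out any that were collected previously. When I tested it by running it manually, the script worked fine. Here is the script: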

# Import libraries
import numpy as np
import pandas as pd
from google_play_scraper import Sort, reviews, reviews_all, app
from pymongo import MongoClient

# Create a connection to MongoDB
client = MongoClient("mongodb+srv://<MY_USERNAME>:<MY_PASSWORD>@project1.lpu4kvx.mongodb.net/?retryWrites=true&w=majority")
db = client["vidio"]
collection = db["google_play_store_reviews"]

# Load the data from MongoDB
df = pd.DataFrame(list(collection.find()))
df = df.drop("_id", axis=1)
df = df.sort_values("at", ascending=False)

# Collect 5000 new reviews
result = reviews(
    "com.vidio.android",
    lang="id",
    country="id",
    sort=Sort.NEWEST,
    count=5000
)
new_reviews = pd.DataFrame(result[0])
new_reviews = new_reviews.fillna("empty")

# Filter the scraped reviews to exclude any that were previously collected
common = new_reviews.merge(df, on=["reviewId", "userName"])
new_reviews_sliced = new_reviews[(~new_reviews.reviewId.isin(common.reviewId)) & (~new_reviews.userName.isin(common.userName))]

# Update MongoDB with any new reviews that were not previously scraped
if len(new_reviews_sliced) > 0:
    new_reviews_sliced_dict = new_reviews_sliced.to_dict("records")

    batch_size = 1_000
    num_records = len(new_reviews_sliced_dict)
    num_batches = num_records // batch_size

    if num_records % batch_size != 0:
        num_batches += 1

    for i in range(num_batches):
        start_idx = i * batch_size
        end_idx = min(start_idx + batch_size, num_records)
        batch = new_reviews_sliced_dict[start_idx:end_idx]

        if batch:
            collection.insert_many(batch)

Next, I wanted to schedule my script with GitHub Actions. Following a YouTube tutorial, I created an actions.yml file in the .github/workflows folder. Here is the YAML file:

name: Scraping Google Play Reviews

on:
  schedule:
    - cron: 50 16 * * * # At 16:50 every day

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - name: check out the repository content
        uses: actions/checkout@v2
      
      - name: set up python
        uses: actions/setup-python@v4
        with:
          python-version: '3.10'

      - name: install requirements
        run: |
          python -m pip install --upgrade pip
          pip install -r requirements.txt

      - name: execute the script
        run: python -m scraping_daily.py

However, it always throws an error when it executes my script. The error message is:

Traceback (most recent call last):
  File "/opt/hostedtoolcache/Python/3.10.10/x64/lib/python3.10/runpy.py", line 187, in _run_module_as_main
    mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
  File "/opt/hostedtoolcache/Python/3.10.10/x64/lib/python3.10/runpy.py", line 110, in _get_module_details
    __import__(pkg_name)
  File "/home/runner/work/vidio_google_play_store_reviews/vidio_google_play_store_reviews/scraping_daily.py", line 16, in <module>
    df = pd.DataFrame(list(collection.find()))
  File "/opt/hostedtoolcache/Python/3.10.10/x64/lib/python3.10/site-packages/pymongo/cursor.py", line 1248, in next
    if len(self.__data) or self._refresh():
  File "/opt/hostedtoolcache/Python/3.10.10/x64/lib/python3.10/site-packages/pymongo/cursor.py", line 1139, in _refresh
    self.__session = self.__collection.database.client._ensure_session()
  File "/opt/hostedtoolcache/Python/3.10.10/x64/lib/python3.10/site-packages/pymongo/mongo_client.py", line 1740, in _ensure_session
    return self.__start_session(True, causal_consistency=False)
  File "/opt/hostedtoolcache/Python/3.10.10/x64/lib/python3.10/site-packages/pymongo/mongo_client.py", line 1685, in __start_session
    self._topology._check_implicit_session_support()
  File "/opt/hostedtoolcache/Python/3.10.10/x64/lib/python3.10/site-packages/pymongo/topology.py", line 538, in _check_implicit_session_support
    self._check_session_support()
  File "/opt/hostedtoolcache/Python/3.10.10/x64/lib/python3.10/site-packages/pymongo/topology.py", line 554, in _check_session_support
    self._select_servers_loop(
  File "/opt/hostedtoolcache/Python/3.10.10/x64/lib/python3.10/site-packages/pymongo/topology.py", line 238, in _select_servers_loop
    raise ServerSelectionTimeoutError(
pymongo.errors.ServerSelectionTimeoutError: ac-dc8axn9-shard-00-01.lpu4kvx.mongodb.net:27017: connection closed,ac-dc8axn9-shard-00-02.lpu4kvx.mongodb.net:27017: connection closed,ac-dc8axn9-shard-00-00.lpu4kvx.mongodb.net:27017: connection closed, Timeout: 300.0s, Topology Description: <TopologyDescription id: 641dd5b78e0efba394e00ffc, topology_type: ReplicaSetNoPrimary, servers: [<ServerDescription ('ac-dc8axn9-shard-00-00.lpu4kvx.mongodb.net', 27017) server_type: Unknown, rtt: None, error=AutoReconnect('ac-dc8axn9-shard-00-00.lpu4kvx.mongodb.net:27017: connection closed')>, <ServerDescription ('ac-dc8axn9-shard-00-01.lpu4kvx.mongodb.net', 27017) server_type: Unknown, rtt: None, error=AutoReconnect('ac-dc8axn9-shard-00-01.lpu4kvx.mongodb.net:27017: connection closed')>, <ServerDescription ('ac-dc8axn9-shard-00-02.lpu4kvx.mongodb.net', 27017) server_type: Unknown, rtt: None, error=AutoReconnect('ac-dc8axn9-shard-00-02.lpu4kvx.mongodb.net:27017: connection closed')>]>
Error: Process completed with exit code 1.

I tried increasing the timeout by adding serverSelectionTimeoutMS=300000 to MongoClient(), but I still get the same error. How can I solve this?

By the way, I'm using a Windows machine (though I'm not sure whether that matters).

3 Answers

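To access your MongoDB Atlas database from GitHub Actions, you need to add an IP address to the project's access list and select the Allow Access from Anywhere option (which adds 0.0.0.0/0).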
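
Why this option helps can be illustrated with a toy model of an IP access list. This is only a sketch, not how Atlas implements the check: the function name `is_allowed` and the sample addresses (taken from the documentation-only IP ranges) are hypothetical, and it assumes the list previously contained just the address of the machine where the script was run by hand.

```python
import ipaddress


def is_allowed(client_ip, access_list):
    """Return True if client_ip falls inside any entry of access_list.

    Entries may be single addresses ("203.0.113.7") or CIDR blocks
    ("0.0.0.0/0"); a single address is treated as a /32 network.
    """
    ip = ipaddress.ip_address(client_ip)
    return any(
        ip in ipaddress.ip_network(entry, strict=False)
        for entry in access_list
    )


# Hypothetical addresses for illustration only.
home_ip = "203.0.113.7"       # machine where the script worked manually
runner_ip = "198.51.100.24"   # address of a GitHub-hosted runner

print(is_allowed(home_ip, [home_ip]))        # True
print(is_allowed(runner_ip, [home_ip]))      # False: the runner is not listed
print(is_allowed(runner_ip, ["0.0.0.0/0"]))  # True: matches every IPv4 address
```

Under that assumption, a longer serverSelectionTimeoutMS cannot help, because the connection is refused rather than slow. The trade-off of 0.0.0.0/0 is that every IPv4 address passes the check, so the database credentials become the only barrier.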

Mathew Darren Kusuma
2023-03-26

Just to add to @jonrsharpe's answer, that approach can be simplified by using the MongoDB Atlas CLI GitHub Action.

  # Grant temporary MongoDB access to this Github Action runner ip address
  - name: Get the public IP of this runner
    id: get_gh_runner_ip
    shell: bash
    run: |
      echo "ip_address=$(curl https://checkip.amazonaws.com)" >> "$GITHUB_OUTPUT"
  - name: Setup MongoDB Atlas cli
    uses: mongodb/atlas-github-action@<version>
  - name: Add runner IP to MongoDB access list
    shell: bash
    run: |
      atlas accessLists create ${{ steps.get_gh_runner_ip.outputs.ip_address }} --type ipAddress --projectId ${{ env.MONGODB_ATLAS_PROJECT_ID }} --comment  "Temporary access for GH Action"

At the end of the workflow:

  - name: Remove GH runner IP from MongoDB access list
    shell: bash
    run: |
      atlas accessLists delete ${{ steps.get_gh_runner_ip.outputs.ip_address }} --projectId ${{ env.MONGODB_ATLAS_PROJECT_ID }} --force

lizmstanley
2024-06-07

Instead of allowing any IP to access the cluster, you can add the runner's IP for the duration of the job:

  • Provide API credentials with the Project Owner role as secrets. Note that you must also allow any IP to access the Atlas Administration API itself (docs).

  • Make a request to e.g. https://checkip.amazonaws.com to find out the public IP of the specific runner:

    - name: Get the public IP of the runner
      id: get-ip
      shell: bash
      run: |
        echo "ip-address=$(curl https://checkip.amazonaws.com)" >> "$GITHUB_OUTPUT"
    
  • Make a request to the MongoDB Atlas API, POST /groups/{groupId}/accessList, to allow access from that IP:

    - name: Allow the runner to access MongoDB Atlas
      id: allow-ip
      shell: bash
      run: |
        curl \
          --data '[{"ipAddress": "${{ steps.get-ip.outputs.ip-address }}", "comment": "GitHub Actions Runner"}]' \
          --digest \
          --header 'Accept: application/vnd.atlas.2023-02-01+json' \
          --header 'Content-Type: application/json' \
          --user "$USERNAME:$PASSWORD" \
          "https://cloud.mongodb.com/api/atlas/v2/groups/$GROUP_ID/accessList"
      env:
        GROUP_ID: ${{ secrets.ATLAS_GROUP_ID }}
        PASSWORD: ${{ secrets.ATLAS_PRIVATE_KEY }}
        USERNAME: ${{ secrets.ATLAS_PUBLIC_KEY }}
    
  • Once access is no longer needed, whether the job succeeded or failed, make a request to DELETE /groups/{groupId}/accessList/{entryValue} to revoke access:

    - name: Revoke the runner's access to MongoDB Atlas
      if: always() && steps.allow-ip.outcome == 'success'
      shell: bash
      run: |
        curl \
          --digest \
          --header 'Accept: application/vnd.atlas.2023-02-01+json' \
          --request 'DELETE' \
          --user "$USERNAME:$PASSWORD" \
          "https://cloud.mongodb.com/api/atlas/v2/groups/$GROUP_ID/accessList/${{ steps.get-ip.outputs.ip-address }}"
      env:
        GROUP_ID: ${{ secrets.ATLAS_GROUP_ID }}
        PASSWORD: ${{ secrets.ATLAS_PRIVATE_KEY }}
        USERNAME: ${{ secrets.ATLAS_PUBLIC_KEY }}
    

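Another approach (which includes applying the post-job step automatically) is a custom JavaScript action (using runs.post to make the DELETE request). I have published such an action to the Marketplace.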
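
For readers who would rather keep this logic next to the Python script, the same allow-then-always-revoke flow can be sketched as a context manager using only the standard library. This is a rough sketch under stated assumptions, not the published action: the helper names (`access_list_url`, `access_list_payload`, `digest_opener`, `temporary_access`) are hypothetical, while the endpoints, the Accept header, and the digest authentication are taken from the curl commands above.

```python
import json
import urllib.parse
import urllib.request
from contextlib import contextmanager

ATLAS_BASE = "https://cloud.mongodb.com/api/atlas/v2"
ACCEPT = "application/vnd.atlas.2023-02-01+json"


def access_list_url(group_id, entry=None):
    """URL of the project access list, or of one entry when `entry` is given."""
    url = f"{ATLAS_BASE}/groups/{group_id}/accessList"
    if entry is not None:
        url += "/" + urllib.parse.quote(entry, safe="")
    return url


def access_list_payload(ip, comment="GitHub Actions Runner"):
    """JSON body for the POST request: a list holding one access-list entry."""
    return json.dumps([{"ipAddress": ip, "comment": comment}]).encode()


def digest_opener(public_key, private_key):
    """Opener that answers HTTP digest challenges with the API key pair."""
    manager = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    manager.add_password(None, "https://cloud.mongodb.com/", public_key, private_key)
    return urllib.request.build_opener(urllib.request.HTTPDigestAuthHandler(manager))


@contextmanager
def temporary_access(opener, group_id, ip):
    """Allow `ip` while the with-block runs, then revoke it even on failure."""
    post = urllib.request.Request(
        access_list_url(group_id),
        data=access_list_payload(ip),
        headers={"Accept": ACCEPT, "Content-Type": "application/json"},
        method="POST",
    )
    opener.open(post)
    try:
        yield
    finally:
        # Mirrors `if: always()`: runs whether the body succeeded or raised.
        delete = urllib.request.Request(
            access_list_url(group_id, ip),
            headers={"Accept": ACCEPT},
            method="DELETE",
        )
        opener.open(delete)
```

A script could then wrap its database work in `with temporary_access(digest_opener(public_key, private_key), group_id, runner_ip):`, reading the three values from environment variables populated by the workflow's secrets. Because the DELETE is issued only after the POST has succeeded, a failed POST leaves nothing to clean up, which matches the `steps.allow-ip.outcome == 'success'` condition in the YAML.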

jonrsharpe
2024-03-06