
TypeError: __init__() got an unexpected keyword argument 'wait_on_rate_limit'

2016-06-29
9679

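I am trying to collect Twitter data filtered by latitude and longitude, but I ran into an error.

I am also trying to avoid the limit on the number of tweets and the time limit on scraping.

Code: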

import tweepy
from tweepy import Stream
from tweepy import OAuthHandler
from tweepy.streaming import StreamListener
import pandas as pd
import json
import csv
import sys
import time

reload(sys)
sys.setdefaultencoding('utf8')

ckey = 'XYZ'
csecret = 'XYZ'
atoken = 'XYZ'
asecret = 'XYZ'

OAUTH_KEYS = {'consumer_key':ckey, 'consumer_secret':csecret, 'access_token_key':atoken, 'access_token_secret':asecret}
auth = tweepy.OAuthHandler(OAUTH_KEYS['consumer_key'], OAUTH_KEYS['consumer_secret'])

api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)
if (not api):
    print ("Can't Authenticate")
    sys.exit(-1)
else:
    print " Scraping data now" # Enter lat and long and radius in Kms  q='hello'
    cursor = tweepy.Cursor(api.search,geocode="55.0000,4.0000,1000km",since= '2016-06-27',until='2016-06-28',lang='en',count=100)
    results=[]
    for item in cursor.items(1000): # Remove the limit to 1000
            results.append(item)

def toDataFrame(tweets):
    # Convert to data frame
    DataSet = pd.DataFrame()

    DataSet['tweetID'] = [tweet.id for tweet in tweets]
    DataSet['tweetText'] = [tweet.text.encode('utf-8') for tweet in tweets]
    DataSet['tweetRetweetCt'] = [tweet.retweet_count for tweet in tweets]
    DataSet['tweetFavoriteCt'] = [tweet.favorite_count for tweet in tweets]
    DataSet['tweetSource'] = [tweet.source for tweet in tweets]
    DataSet['tweetCreated'] = [tweet.created_at for tweet in tweets]
    DataSet['userID'] = [tweet.user.id for tweet in tweets]
    DataSet['userScreen'] = [tweet.user.screen_name for tweet in tweets]
    DataSet['userName'] = [tweet.user.name for tweet in tweets]
    DataSet['userCreateDt'] = [tweet.user.created_at for tweet in tweets]
    DataSet['userDesc'] = [tweet.user.description for tweet in tweets]
    DataSet['userFollowerCt'] = [tweet.user.followers_count for tweet in tweets]
    DataSet['userFriendsCt'] = [tweet.user.friends_count for tweet in tweets]
    DataSet['userLocation'] = [tweet.user.location for tweet in tweets]
    DataSet['userTimezone'] = [tweet.user.time_zone for tweet in tweets]
    DataSet['Coordinates'] = [tweet.coordinates for tweet in tweets]
    DataSet['GeoEnabled'] = [tweet.user.geo_enabled for tweet in tweets]
    DataSet['Language'] = [tweet.user.lang for tweet in tweets]
    tweets_place= []
    #users_retweeted = []
    for tweet in tweets:
        if tweet.place:
            tweets_place.append(tweet.place.full_name)
        else:
            tweets_place.append('null')
    DataSet['TweetPlace'] = [i for i in tweets_place]
    #DataSet['UserWhoRetweeted'] = [i for i in users_retweeted]

    return DataSet

DataSet = toDataFrame(results)
DataSet.to_csv('Belgium_27.csv',index=False)

Error:

Traceback (most recent call last):
  File "CS.py", line 23, in <module>
    api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)
TypeError: __init__() got an unexpected keyword argument 'wait_on_rate_limit'

What changes are needed to resolve the error and collect the tweets?

Edit 1: After upgrading tweepy, I get the following warnings and the program terminates on its own:

Scraping data now
/usr/local/lib/python2.7/dist-packages/requests/packages/urllib3/util/ssl_.py:318: SNIMissingWarning: An HTTPS request has been made, but the SNI (Subject Name Indication) extension to TLS is not available on this platform. This may cause the server to present an incorrect TLS certificate, which can cause validation failures. You can upgrade to a newer version of Python to solve this. For more information, see https://urllib3.readthedocs.org/en/latest/security.html#snimissingwarning.
  SNIMissingWarning
/usr/local/lib/python2.7/dist-packages/requests/packages/urllib3/util/ssl_.py:122: InsecurePlatformWarning: A true SSLContext object is not available. This prevents urllib3 from configuring SSL appropriately and may cause certain SSL connections to fail. You can upgrade to a newer version of Python to solve this. For more information, see https://urllib3.readthedocs.org/en/latest/security.html#insecureplatformwarning.
  InsecurePlatformWarning

Edit 2: After fixing the indentation of the write statements in the code, the program runs but only produces an empty CSV.

2 Answers

After removing wait_on_rate_limit_notify=True, it works fine for me.

The documentation does not list any such parameter:

class tweepy.API(auth=None, *, cache=None, host='api.twitter.com', parser=None, proxy=None, retry_count=0, retry_delay=0, retry_errors=None, timeout=60, upload_host='upload.twitter.com', user_agent=None, wait_on_rate_limit=False)

See the documentation.
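
If the same script has to run against more than one tweepy version, one option is to drop keyword arguments that the installed constructor does not accept before calling it. The sketch below shows the idea with the standard-library inspect module. FakeAPI is a hypothetical stand-in that copies only part of the signature quoted above, and supported_kwargs is an illustrative helper name, not part of tweepy.

```python
import inspect


def supported_kwargs(factory, **kwargs):
    """Return only the keyword arguments that `factory` accepts."""
    params = inspect.signature(factory).parameters
    # If the callable itself takes **kwargs, nothing can be ruled out.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(kwargs)
    return {name: value for name, value in kwargs.items() if name in params}


class FakeAPI:
    """Hypothetical stand-in mirroring part of the signature quoted above."""

    def __init__(self, auth=None, *, retry_count=0, timeout=60,
                 wait_on_rate_limit=False):
        self.auth = auth
        self.wait_on_rate_limit = wait_on_rate_limit


# The two arguments from the question; only one is in the signature.
wanted = {"wait_on_rate_limit": True, "wait_on_rate_limit_notify": True}
accepted = supported_kwargs(FakeAPI, **wanted)
api = FakeAPI("auth-placeholder", **accepted)
print(accepted)  # {'wait_on_rate_limit': True}
```

The same filtering should in principle apply to tweepy.API itself, although that is not tested here. Simply deleting the unsupported argument, as described above, is the more direct fix.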

Kritika Ranjan
2021-11-05
import sys

import pandas as pd
import tweepy


ckey = '***'
csecret = '***'
atoken = '***'
asecret = '***'

OAUTH_KEYS = {'consumer_key':ckey, 'consumer_secret':csecret, 'access_token_key':atoken, 'access_token_secret':asecret}
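auth = tweepy.OAuthHandler(OAUTH_KEYS['consumer_key'], OAUTH_KEYS['consumer_secret'])
# Attach the user access token as well; without it the handler only carries the consumer credentials
auth.set_access_token(OAUTH_KEYS['access_token_key'], OAUTH_KEYS['access_token_secret'])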

api = tweepy.API(auth)
if (not api):
    print ("Can't Authenticate")
    sys.exit(-1)
else:
    print ("Scraping data now") # Enter lat and long and radius in Kms  q='hello'
    # result_type must be one of "mixed", "recent" or "popular"
    cursor = tweepy.Cursor(api.search_tweets, q="truffles", result_type="recent", geocode="55.0000,4.0000,1000km",lang='en',count=100)
    results=[]
    for item in cursor.items(1000): # Remove the limit to 1000
            results.append(item)

def toDataFrame(tweets):
    # Convert to data frame
    DataSet = pd.DataFrame()

    DataSet['tweetID'] = [tweet.id for tweet in tweets]
    # Keep the text as str; on Python 3, encode() would store bytes literals in the CSV
    DataSet['tweetText'] = [tweet.text for tweet in tweets]
    DataSet['tweetRetweetCt'] = [tweet.retweet_count for tweet in tweets]
    DataSet['tweetFavoriteCt'] = [tweet.favorite_count for tweet in tweets]
    DataSet['tweetSource'] = [tweet.source for tweet in tweets]
    DataSet['tweetCreated'] = [tweet.created_at for tweet in tweets]
    DataSet['userID'] = [tweet.user.id for tweet in tweets]
    DataSet['userScreen'] = [tweet.user.screen_name for tweet in tweets]
    DataSet['userName'] = [tweet.user.name for tweet in tweets]
    DataSet['userCreateDt'] = [tweet.user.created_at for tweet in tweets]
    DataSet['userDesc'] = [tweet.user.description for tweet in tweets]
    DataSet['userFollowerCt'] = [tweet.user.followers_count for tweet in tweets]
    DataSet['userFriendsCt'] = [tweet.user.friends_count for tweet in tweets]
    DataSet['userLocation'] = [tweet.user.location for tweet in tweets]
    DataSet['userTimezone'] = [tweet.user.time_zone for tweet in tweets]
    DataSet['Coordinates'] = [tweet.coordinates for tweet in tweets]
    DataSet['GeoEnabled'] = [tweet.user.geo_enabled for tweet in tweets]
    DataSet['Language'] = [tweet.user.lang for tweet in tweets]
    tweets_place= []
    #users_retweeted = []
    for tweet in tweets:
        if tweet.place:
            tweets_place.append(tweet.place.full_name)
        else:
            tweets_place.append('null')
    DataSet['TweetPlace'] = [i for i in tweets_place]
    #DataSet['UserWhoRetweeted'] = [i for i in users_retweeted]

    return DataSet

DataSet = toDataFrame(results)
DataSet.to_csv('Belgium_27.csv',index=False)

The time period you specified no longer seems to work with tweepy.

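The relevant parameters, from the documentation:

result_type: specifies the type of search results you would prefer to receive. The current default is "mixed". Valid values include:

  • mixed: include both popular and real-time results in the response

  • recent: return only the most recent results in the response

  • popular: return only the most popular results in the response

until: returns tweets created before the given date. The date should be formatted as YYYY-MM-DD. Keep in mind that the search index has a 7-day limit; in other words, no tweets will be found for a date older than one week.

since_id: there are limits to the number of tweets that can be accessed through the API. If the limit of tweets has been reached since the since_id, the since_id will be forced to the oldest ID available.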
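
The 7-day limit can be checked before a query is sent, which avoids waiting on a search that can only come back empty. The sketch below is a rough illustration: until_is_searchable and SEARCH_WINDOW_DAYS are hypothetical names, and the exact boundary the search index applies is an assumption.

```python
from datetime import date, datetime, timedelta

SEARCH_WINDOW_DAYS = 7  # limit quoted above; the exact boundary is an assumption


def until_is_searchable(until, today=None):
    """Return True if an `until` date (YYYY-MM-DD) can still match indexed tweets."""
    today = today or date.today()
    until_date = datetime.strptime(until, "%Y-%m-%d").date()
    oldest_indexed = today - timedelta(days=SEARCH_WINDOW_DAYS)
    # `until` only matches tweets created before it, so it must fall
    # after the oldest day that is still in the index.
    return until_date > oldest_indexed


# The date range from the question, checked on the day of this answer.
print(until_is_searchable("2016-06-28", today=date(2021, 12, 19)))  # False
print(until_is_searchable("2021-12-18", today=date(2021, 12, 19)))  # True
```

For the dates in the question, this check would be False on any day more than about a week after 2016-06-28.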

Liam Fagan
2021-12-19