Correct way to try/except using Python requests module?

Community

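Have a look at the Requests exception docs. In short:

In the event of a network problem (e.g. DNS failure, refused connection, etc.), Requests will raise a ConnectionError exception. In the event of the rare invalid HTTP response, Requests will raise an HTTPError exception. If a request times out, a Timeout exception is raised. If a request exceeds the configured number of maximum redirections, a TooManyRedirects exception is raised. All exceptions that Requests explicitly raises inherit from requests.exceptions.RequestException.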
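
The inheritance claim is easy to check for yourself. This is a minimal sketch that assumes only that the requests package is installed; it makes no network calls.

```python
import requests

# Each exception named above should be a subclass of the common base class,
# which is why a single `except RequestException` clause can catch them all.
named = [
    requests.exceptions.ConnectionError,
    requests.exceptions.HTTPError,
    requests.exceptions.Timeout,
    requests.exceptions.TooManyRedirects,
]
all_inherit = all(
    issubclass(exc, requests.exceptions.RequestException) for exc in named
)
print(all_inherit)
```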

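To answer your question, what you show will not cover all of your bases. You'll only catch connection-related errors, not ones that time out.

What to do when you catch the exception is really up to the design of your script/program. Is it acceptable to exit? Can you go on and try again? If the error is catastrophic and you can't go on, then yes, you may abort your program by raising SystemExit (a nice way to both print an error and call sys.exit).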
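
To illustrate that last point, here is a small sketch of what raising SystemExit with a message does. It runs a one-line script in a child interpreter so the exit can be observed from outside; the message text is made up for the example.

```python
import subprocess
import sys

# Run a one-line script in a child interpreter so that we can observe
# how the process ends when SystemExit is raised with a message.
child = subprocess.run(
    [sys.executable, "-c", "raise SystemExit('fatal: could not fetch URL')"],
    capture_output=True,
    text=True,
)

# The message is written to stderr and the exit status is non-zero.
print(child.returncode)
print(child.stderr.strip())
```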

You can either catch the base-class exception, which will handle all cases:

try:
    r = requests.get(url, params={'s': thing})
except requests.exceptions.RequestException as e:  # This is the correct syntax
    raise SystemExit(e)

Or you can catch them separately and do different things.

try:
    r = requests.get(url, params={'s': thing})
except requests.exceptions.Timeout:
    # Maybe set up for a retry, or continue in a retry loop
    pass
except requests.exceptions.TooManyRedirects:
    # Tell the user their URL was bad and try a different one
    pass
except requests.exceptions.RequestException as e:
    # catastrophic error. bail.
    raise SystemExit(e)
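
The "set up for a retry" comment could be fleshed out along these lines. This is only a sketch: fetch_with_retries, max_attempts and delay are names invented for the example, and the request is passed in as a callable so that the demonstration can run offline against a stand-in.

```python
import time

import requests


def fetch_with_retries(fetch, max_attempts=3, delay=0.0):
    """Call fetch() until it succeeds or has timed out max_attempts times.

    `fetch` is any zero-argument callable that performs the request,
    e.g. lambda: requests.get(url, timeout=3).
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch()
        except requests.exceptions.Timeout:
            if attempt == max_attempts:
                raise  # out of attempts: let the caller decide what to do
            time.sleep(delay)  # wait before the next attempt


# Offline demonstration: a stand-in that times out twice, then succeeds.
calls = []


def flaky():
    calls.append(1)
    if len(calls) < 3:
        raise requests.exceptions.Timeout("simulated timeout")
    return "ok"


result = fetch_with_retries(flaky)
print(result, len(calls))
```

Only Timeout is retried here; any other exception propagates immediately, which matches the idea of treating the remaining errors as catastrophic.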

As Christian pointed out:

If you want http errors (e.g. 401 Unauthorized) to raise exceptions, you can call Response.raise_for_status. That will raise an HTTPError if the response was an http error.

An example:

try:
    r = requests.get('http://www.google.com/nothere')
    r.raise_for_status()
except requests.exceptions.HTTPError as err:
    raise SystemExit(err)

Will print:

404 Client Error: Not Found for url: http://www.google.com/nothere

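Very good answer for dealing with the specifics of the requests library, and also for general exception catching.

Note that due to a bug in the underlying urllib3 library you will also need to catch socket.timeout exceptions if you are using a timeout: github.com/kennethreitz/requests/issues/1236

Future comment readers: this was fixed in Requests 2.9 (which bundles urllib3 1.13)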

The exception list on the Requests website isn't complete. You can read the full list here.

Sam

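One additional suggestion is to be explicit. It seems best to order the except clauses from specific to general, so that the specific errors you want to catch are not masked by the general one.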

url='http://www.google.com/blahblah'

try:
    r = requests.get(url,timeout=3)
    r.raise_for_status()
except requests.exceptions.HTTPError as errh:
    print ("Http Error:",errh)
except requests.exceptions.ConnectionError as errc:
    print ("Error Connecting:",errc)
except requests.exceptions.Timeout as errt:
    print ("Timeout Error:",errt)
except requests.exceptions.RequestException as err:
    print ("OOps: Something Else",err)

Http Error: 404 Client Error: Not Found for url: http://www.google.com/blahblah

vs.

url='http://www.google.com/blahblah'

try:
    r = requests.get(url,timeout=3)
    r.raise_for_status()
except requests.exceptions.RequestException as err:
    print ("OOps: Something Else",err)
except requests.exceptions.HTTPError as errh:
    print ("Http Error:",errh)
except requests.exceptions.ConnectionError as errc:
    print ("Error Connecting:",errc)
except requests.exceptions.Timeout as errt:
    print ("Timeout Error:",errt)     

OOps: Something Else 404 Client Error: Not Found for url: http://www.google.com/blahblah
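
The same masking effect can be reproduced without any network access by raising the exception by hand. This is a minimal sketch; the function names specific_first and general_first are made up for the comparison.

```python
import requests


def specific_first():
    try:
        raise requests.exceptions.HTTPError("404 Client Error")
    except requests.exceptions.HTTPError:
        return "HTTPError handler"
    except requests.exceptions.RequestException:
        return "RequestException handler"


def general_first():
    try:
        raise requests.exceptions.HTTPError("404 Client Error")
    except requests.exceptions.RequestException:
        return "RequestException handler"
    except requests.exceptions.HTTPError:  # never reached: the base class matched first
        return "HTTPError handler"


print(specific_first())
print(general_first())
```

Python tries the except clauses in order and runs the first one that matches, so a base-class clause placed first swallows every subclass.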

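Is this valid syntax for post as well?

@ScipioAfricanus yes.

What would be the exception for "Max retries exceeded with url:"? I've added all the exceptions to the exception list but it is still not handled.

@theking2 Try urllib3.exceptions.MaxRetryError or requests.exceptions.RetryError

@theking2 try requests.ConnectionError, it will work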

tsh

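The exception object also contains the original response as e.response, which can be useful if you need to see the error body in the server's response. For example: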

try:
    r = requests.post('http://somerestapi.com/post-here', data={'birthday': '9/9/3999'})
    r.raise_for_status()
except requests.exceptions.HTTPError as e:
    print (e.response.text)
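
To see what e.response carries without calling a real server, a response can be built by hand. This is a sketch under assumptions: the URL, status and JSON body are invented, and setting the private _content attribute is a testing shortcut rather than something to do in production code.

```python
import requests

# Build a response by hand so the example runs offline. Setting the private
# `_content` attribute is a testing shortcut, not a supported way to use Response.
fake = requests.Response()
fake.status_code = 422
fake.reason = "Unprocessable Entity"
fake.url = "http://example.invalid/post-here"
fake.encoding = "utf-8"
fake._content = b'{"error": "birthday is in the future"}'

try:
    fake.raise_for_status()
except requests.exceptions.HTTPError as e:
    # e.response is the very Response object that raised the error
    status = e.response.status_code
    body = e.response.json()

print(status, body)
```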

mike rodent

Here's a generic way to do things which at least means that you don't have to surround each and every requests call with try ... except:

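import logging
import traceback

import requests

logger = logging.getLogger(__name__)  # assumes logging is configured elsewhere

# see the docs: if you set no timeout the call never times out! A tuple means "max
# connect time" and "max read time"
DEFAULT_REQUESTS_TIMEOUT = (5, 15) # for example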

def log_exception(e, verb, url, kwargs):
    # the reason for making this a separate function will become apparent
    raw_tb = traceback.extract_stack()
    if 'data' in kwargs and len(kwargs['data']) > 500: # anticipate giant data string
        kwargs['data'] = f'{kwargs["data"][:500]}...'  
    msg = f'BaseException raised: {e.__class__.__module__}.{e.__class__.__qualname__}: {e}\n' \
        + f'verb {verb}, url {url}, kwargs {kwargs}\n\n' \
        + 'Stack trace:\n' + ''.join(traceback.format_list(raw_tb[:-2]))
    logger.error(msg) 

def requests_call(verb, url, **kwargs):
    response = None
    exception = None
    try:
        if 'timeout' not in kwargs:
            kwargs['timeout'] = DEFAULT_REQUESTS_TIMEOUT
        response = requests.request(verb, url, **kwargs)
    except BaseException as e:
        log_exception(e, verb, url, kwargs)
        exception = e
    return (response, exception)

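NB

Be aware of the builtin ConnectionError, which has nothing to do with the class requests.ConnectionError*. I assume the latter is the more common one in this context, but I don't really know...

When examining a non-None returned exception, note that according to the docs the superclass of all the requests exceptions (including requests.ConnectionError) is requests.RequestException, not "requests.exceptions.RequestException". This may have changed since the accepted answer was written.**

Obviously this assumes a logger has been configured. Calling logger.exception in the except block might seem like a good idea, but that would only give the stack within this method! Instead, get the trace leading up to the call to this method, then log it (with details of the exception and of the call which caused the problem).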
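
The difference between the traceback attached to an exception and the call stack that led to the handler can be shown with the standard library alone. This is a minimal sketch; failing_call, wrapper and caller are made-up names standing in for the request, requests_call and your own code respectively.

```python
import traceback


def failing_call():
    raise ValueError("boom")


def wrapper():
    try:
        failing_call()
    except ValueError as e:
        # Frames recorded on the exception: from the try block down to the raise.
        exc_frames = [f.name for f in traceback.extract_tb(e.__traceback__)]
        # Frames of the current call stack: everything that led to wrapper().
        stack_frames = [f.name for f in traceback.extract_stack()]
        return exc_frames, stack_frames


def caller():
    return wrapper()


exc_frames, stack_frames = caller()
print(exc_frames)         # frames from wrapper() down to the raise
print(stack_frames[-2:])  # the two innermost frames of the call stack
```

The exception's own traceback never mentions caller(), which is why log_exception uses traceback.extract_stack() to record where the failing call came from.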

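* I looked at the source code: requests.ConnectionError subclasses the single class requests.RequestException, which subclasses the single class IOError (builtin)

** However, at the time of writing (February 2022), you find "requests.exceptions.RequestException" at the bottom of this page... but it links to the above page: confusing.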
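
Both footnotes can be checked from an interpreter. This is a small sketch, assuming a reasonably recent release of requests running on Python 3 (where IOError is an alias of OSError).

```python
import builtins

import requests

# The top-level name and the one in requests.exceptions refer to the same class object.
same_class = requests.RequestException is requests.exceptions.RequestException

# requests.ConnectionError derives from RequestException, which in turn derives
# from the builtin OSError (IOError is an alias of OSError in Python 3).
derives_from_oserror = issubclass(requests.ConnectionError, OSError)

# It is not a subclass of the builtin ConnectionError, so an
# `except ConnectionError` clause will not catch the requests one.
related_to_builtin = issubclass(requests.ConnectionError, builtins.ConnectionError)

print(same_class, derives_from_oserror, related_to_builtin)
```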

Usage is simple:

search_response, exception = utilities.requests_call('get',
    f'http://localhost:9200/my_index/_search?q={search_string}')

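First, you check the response: if it is None, something funny has happened and you will have an exception which has to be acted on in some way, depending on context (and on the exception). In GUI applications (PyQt5) I usually implement a "visual log" to give some output to the user (while also logging to the log file), but messages added there should be non-technical. So something like this might typically follow: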

if search_response is None:
    # you might check here for (e.g.) a requests.Timeout, tailoring the message
    # accordingly, as the kind of error anyone might be expected to understand
    msg = f'No response searching on |{search_string}|. See log'
    MainWindow.the().visual_log(msg, log_level=logging.ERROR)
    return
response_json = search_response.json()
if search_response.status_code != 200: # NB 201 ("created") may be acceptable sometimes... 
    msg = f'Bad response searching on |{search_string}|. See log'
    MainWindow.the().visual_log(msg, log_level=logging.ERROR)
    # usually response_json will give full details about the problem
    log_msg = f'search on |{search_string}| bad response\n{json.dumps(response_json, indent=4)}'
    logger.error(log_msg)
    return

# now examine the keys and values in response_json: these may of course 
# indicate an error of some kind even though the response returned OK (status 200)...

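Given that the stack trace is logged automatically, you often need no more than that...

However, to cross the Ts:

If, as above, an exception gives the message "No response" and a non-200 status gives "Bad response", I suggest that

a missing expected key in the response's JSON structure should give rise to the message "Anomalous response"

an out-of-range or strange value to the message "Unexpected response"

and the presence of a key such as "error" or "errors", with value True or whatever, to the message "Error response"

These may or may not prevent the code from continuing.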
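
The naming scheme above can be condensed into one small function. This is only a sketch of the idea, not the code used later in this answer: classify_response and its parameters are invented for illustration, and the "Unexpected response" case is left out because what counts as an out-of-range value is specific to each application.

```python
def classify_response(status_code, body, required_keys=(), ok_statuses=(200,)):
    """Return the message prefix suggested above for a decoded JSON body.

    `body` is None when no response was received at all.
    """
    if body is None:
        return "No response"
    if status_code not in ok_statuses:
        return "Bad response"
    if any(key not in body for key in required_keys):
        return "Anomalous response"
    if body.get("error") or body.get("errors"):
        return "Error response"
    return "OK"


labels = [
    classify_response(None, None),
    classify_response(500, {}),
    classify_response(200, {}, required_keys=("status_text",)),
    classify_response(200, {"errors": True}),
    classify_response(200, {"status_text": "done"}, required_keys=("status_text",)),
]
print(labels)
```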

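... and in fact, to my mind, it is worth making the process even more generic. For me, these next functions typically cut 20 lines of code using the above requests_call down to about 3, and standardise most of your handling and log messages. With more than a handful of requests calls in your project, the code gets a lot nicer and less bloated: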

def log_response_error(response_type, call_name, deliverable, verb, url, **kwargs):
    # NB this function can also be used independently
    if response_type == 'No': # exception was raised (and logged)
        if isinstance(deliverable, requests.Timeout):
            MainWindow.the().visual_log(f'Time out of {call_name} before response received!', logging.ERROR)
            return    
    else:
        if isinstance(deliverable, BaseException):
            # NB if response.json() raises an exception we end up here
            log_exception(deliverable, verb, url, kwargs)
        else:
            # if we get here no exception has been raised, so no stack trace has yet been logged.  
            # a response has been returned, but is either "Bad" or "Anomalous"
            response_json = deliverable.json()

            raw_tb = traceback.extract_stack()
            if 'data' in kwargs and len(kwargs['data']) > 500: # anticipate giant data string
                kwargs['data'] = f'{kwargs["data"][:500]}...'
            added_message = ''     
            if hasattr(deliverable, 'added_message'):
                added_message = deliverable.added_message + '\n'
                del deliverable.added_message
            call_and_response_details = f'{response_type} response\n{added_message}' \
                + f'verb {verb}, url {url}, kwargs {kwargs}\nresponse:\n{json.dumps(response_json, indent=4)}'
            logger.error(f'{call_and_response_details}\nStack trace: {"".join(traceback.format_list(raw_tb[:-1]))}')
    MainWindow.the().visual_log(f'{response_type} response {call_name}. See log.', logging.ERROR)
    
def check_keys(req_dict_structure, response_dict_structure, response):
    # both structures MUST be dict
    if not isinstance(req_dict_structure, dict):
        response.added_message = f'req_dict_structure not dict: {type(req_dict_structure)}\n'
        return False
    if not isinstance(response_dict_structure, dict):
        response.added_message = f'response_dict_structure not dict: {type(response_dict_structure)}\n'
        return False
    for dict_key in req_dict_structure.keys():
        if dict_key not in response_dict_structure:
            response.added_message = f'key |{dict_key}| missing\n'
            return False
        req_value = req_dict_structure[dict_key]
        response_value = response_dict_structure[dict_key]
        if isinstance(req_value, dict):
            # if the response at this point is a list apply the req_value dict to each element:
            # failure in just one such element leads to "Anomalous response"... 
            if isinstance(response_value, list):
                for resp_list_element in response_value:
                    if not check_keys(req_value, resp_list_element, response):
                        return False
            elif not check_keys(req_value, response_value, response): # any other response value must be a dict (tested in next level of recursion)
                return False
        elif isinstance(req_value, list):
            if not isinstance(response_value, list): # if the req_value is a list the response must be one
                response.added_message = f'key |{dict_key}| not list: {type(response_value)}\n'
                return False
            # it is OK for the value to be a list, but these must be strings (keys) or dicts
            for req_list_element, resp_list_element in zip(req_value, response_value):
                if isinstance(req_list_element, dict):
                    if not check_keys(req_list_element, resp_list_element, response):
                        return False
                # elif: a dict element that passed its check must not fall through to the string test
                elif not isinstance(req_list_element, str):
                    response.added_message = f'req_list_element not string: {type(req_list_element)}\n'
                    return False
                elif req_list_element not in response_value:
                    response.added_message = f'key |{req_list_element}| missing from response list\n'
                    return False
        # put None as a dummy value (otherwise something like {'my_key'} will be seen as a set, not a dict)
        elif req_value is not None:
            response.added_message = f'required value of key |{dict_key}| must be None (dummy), dict or list: {type(req_value)}\n'
            return False
    return True

def process_json_requests_call(verb, url, **kwargs):
    # "call_name" is a mandatory kwarg
    if 'call_name' not in kwargs:
        raise Exception('kwarg "call_name" not supplied!')
    call_name = kwargs['call_name']
    del kwargs['call_name']

    required_keys = {}    
    if 'required_keys' in kwargs:
        required_keys = kwargs['required_keys']
        del kwargs['required_keys']

    acceptable_statuses = [200]
    if 'acceptable_statuses' in kwargs:
        acceptable_statuses = kwargs['acceptable_statuses']
        del kwargs['acceptable_statuses']

    exception_handler = log_response_error
    if 'exception_handler' in kwargs:
        exception_handler = kwargs['exception_handler']
        del kwargs['exception_handler']
        
    response, exception = requests_call(verb, url, **kwargs)

    if response is None:
        exception_handler('No', call_name, exception, verb, url, **kwargs)
        return (False, exception)
    try:
        response_json = response.json()
    except BaseException as e:
        logger.error(f'response.status_code {response.status_code} but calling json() raised exception')
        # an exception raised at this point can't truthfully lead to a "No response" message... so say "bad"
        exception_handler('Bad', call_name, e, verb, url, **kwargs)
        return (False, response)
        
    status_ok = response.status_code in acceptable_statuses
    if not status_ok:
        response.added_message = f'status code was {response.status_code}'
        log_response_error('Bad', call_name, response, verb, url, **kwargs)
        return (False, response)
    check_result = check_keys(required_keys, response_json, response)
    if not check_result:
        log_response_error('Anomalous', call_name, response, verb, url, **kwargs)
    return (check_result, response)

Example call:

success, deliverable = utilities.process_json_requests_call('get', 
    f'{ES_URL}{INDEX_NAME}/_doc/1', 
    call_name=f'checking index {INDEX_NAME}',
    required_keys={'_source':{'status_text': None}})
if not success: return False
# here, we know the deliverable is a response, not an exception
# we also don't need to check for the keys being present
index_status = deliverable.json()['_source']['status_text']
if index_status != 'successfully completed':
    # ... i.e. an example of a 200 response, but an error nonetheless
    msg = f'Error response: ES index {INDEX_NAME} does not seem to have been built OK: cannot search'
    MainWindow.the().visual_log(msg)
    logger.error(f'index |{INDEX_NAME}|: deliverable.json() {json.dumps(deliverable.json(), indent=4)}')
    return False

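So, for example, if the key "status_text" is missing, the "visual log" message seen by the user would be "Anomalous response checking index XYZ. See log." (and the log would show the key in question).

NB

mandatory kwarg: call_name; optional kwargs: required_keys, acceptable_statuses, exception_handler.

the required_keys dict can be nested to any depth

finer-grained exception handling can be accomplished by including a function exception_handler in kwargs (though don't forget that requests_call will have logged the call details, the exception type and __str__, and the stack trace).

In the above I also implement a check on the key "data" in any kwargs which may be logged. This is because a bulk operation (e.g. to populate an index in the case of Elasticsearch) can consist of enormous strings. So curtail it to the first 500 characters, for example.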
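
As a standalone illustration of that truncation, here is a variant that works on a copy, so the caller's kwargs are left untouched. The helper name truncated_for_log and its limit parameter are invented for this sketch, and it only handles string data.

```python
def truncated_for_log(kwargs, limit=500):
    """Return a copy of kwargs whose 'data' string is cut to `limit` characters."""
    safe = dict(kwargs)  # shallow copy: the caller's dict is not modified
    data = safe.get("data")
    if isinstance(data, str) and len(data) > limit:
        safe["data"] = f"{data[:limit]}..."
    return safe


bulk = {"data": "x" * 2000, "timeout": 5}
logged = truncated_for_log(bulk)
print(len(logged["data"]), len(bulk["data"]))
```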

PS Yes, I do know about the elasticsearch Python module (a "thin wrapper" around requests). All the above is for illustration purposes.
