ChatGPT解决这个技术问题 Extra ChatGPT

Get raw POST body in Python Flask regardless of Content-Type header

Previously, I asked How to get data received in Flask request because request.data was empty. The answer explained that request.data is the raw post body, but will be empty if form data is parsed. How can I get the raw post body unconditionally?

@app.route('/', methods=['POST'])
def parse_request():
    data = request.data  # empty in some cases
    # always need raw data here, not parsed form data

d
davidism

Use request.get_data() to get the raw data, regardless of content type. The data is cached and you can subsequently access request.data, request.json, request.form at will.

If you access request.data first, it will call get_data with an argument to parse form data first. If the request has a form content type (multipart/form-data, application/x-www-form-urlencoded, or application/x-url-encoded) then the raw data will be consumed. request.data and request.json will appear empty in this case.


This seems to break when using raven-python (Sentry), bug and workarounds here: github.com/getsentry/raven-python/issues/457
Thanks. This really saved the day. It's necessary when you need to parse raw request data manually. Especially when the request is multipart/form-data.
d
davidism

request.stream is the stream of raw data passed to the application by the WSGI server. No parsing is done when reading it, although you usually want request.get_data() instead.

data = request.stream.read()

The stream will be empty if it was previously read by request.data or another attribute.


Thanks for pointing out that the stream will be empty if read through request.data before! Almost got me during debugging
d
davidism

I created a WSGI middleware that stores the raw body from the environ['wsgi.input'] stream. I saved the value in the WSGI environ so I could access it from request.environ['body_copy'] within my app.

This isn't necessary in Werkzeug or Flask, as request.get_data() will get the raw data regardless of content type, but with better handling of HTTP and WSGI behavior.

This reads the entire body into memory, which will be an issue if for example a large file is posted. This won't read anything if the Content-Length header is missing, so it won't handle streaming requests.

from io import BytesIO

class WSGICopyBody(object):
    def __init__(self, application):
        self.application = application

    def __call__(self, environ, start_response):
        length = int(environ.get('CONTENT_LENGTH') or 0)
        body = environ['wsgi.input'].read(length)
        environ['body_copy'] = body
        # replace the stream since it was exhausted by read()
        environ['wsgi.input'] = BytesIO(body)
        return self.application(environ, start_response)

app.wsgi_app = WSGICopyBody(app.wsgi_app)
request.environ['body_copy']

d
davidism

request.data will be empty if request.headers["Content-Type"] is recognized as form data, which will be parsed into request.form. To get the raw data regardless of content type, use request.get_data().

request.data calls request.get_data(parse_form_data=True), which results in the different behavior for form data.