
Is there any benefit to adding accept-charset="UTF-8" to HTML forms, if the page is already in UTF-8?

For pages whose Content-Type is already declared with a UTF-8 charset (either by an HTTP header or by a meta tag), is there any benefit to adding accept-charset="UTF-8" to HTML forms?

(I understand the accept-charset attribute is broken in IE for ISO-8859-1, but I haven't heard of a problem with IE and UTF-8. I'm just asking if there's a benefit to adding it with UTF-8, to help prevent invalid byte sequences from being entered.)

My question is more specific... but related: stackoverflow.com/questions/3715264/… and stackoverflow.com/questions/1317152/…
Related W3C reference: w3.org/TR/html401/interact/forms.html#adef-accept-charset (note the "may" in "User agents may interpret this value as the character encoding that was used to transmit the document". Does this mean it's safer to specify it explicitly? Not sure. From my experience, I agree with what @elusive says.)

Darryl Hein

If the page is already interpreted by the browser as being UTF-8, setting accept-charset="utf-8" does nothing.

If you set the encoding of the page to UTF-8 in a <meta> and/or HTTP header, it will be interpreted as UTF-8, unless the user deliberately goes to the View->Encoding menu and selects a different encoding, overriding the one you specified.

In that case, accept-charset would have the effect of setting the submission encoding back to UTF-8 in the face of the user messing about with the page encoding. However, this still won't work in IE, due to the previously discussed problems with accept-charset in that browser.
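To make the difference concrete, the following Python sketch (an illustration, not part of the original answer) serializes the same form field the way a form submission would under UTF-8 and under ISO-8859-1, using the standard library's `urllib.parse.urlencode`. The field name `q` and the value `café` are arbitrary examples.

```python
from urllib.parse import urlencode

# The same form field, serialized as application/x-www-form-urlencoded
# under two different submission encodings.
fields = {"q": "café"}

utf8_body = urlencode(fields, encoding="utf-8")         # é becomes two bytes, C3 A9
latin1_body = urlencode(fields, encoding="iso-8859-1")  # é becomes one byte, E9

print(utf8_body)    # q=caf%C3%A9
print(latin1_body)  # q=caf%E9
```

A server expecting UTF-8 that receives the second body sees the lone byte E9, which is not a valid UTF-8 sequence. That is the kind of input accept-charset="UTF-8" would avoid when a (non-IE) user has overridden the page encoding.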

So, IMO, it's doubtful whether it's worth including accept-charset just to fix the case where a non-IE user has deliberately sabotaged the page encoding (which may well break more on your page than just the form).

Personally, I don't bother.


Are you sure? That makes sense, but the doc says "may interpret" and that the default is UNKNOWN.
On all browsers (now and historically), UNKNOWN/unset always means the current page encoding, whether that was the server's page encoding set in a header/meta, or the encoding explicitly set by the user as an override. One exception that probably doesn't affect you: most browsers will not send form submissions in a non-ASCII-superset encoding like UTF-16, even if the page was served as that. It doesn't really make sense to do so.
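The behaviour described in this comment can be modelled roughly as follows. This is a simplified Python sketch under stated assumptions, not any browser's actual code or the HTML specification's exact algorithm: `submission_encoding` and `_ascii_compatible` are hypothetical helpers, Python's `codecs.lookup` stands in for a browser's table of encoding labels (the two tables differ in details), and only the UTF-16 exception mentioned above is modelled.

```python
import codecs
import re
from typing import Optional


def _ascii_compatible(name: str) -> str:
    # Browsers do not submit forms in UTF-16; model that by substituting UTF-8.
    return "utf-8" if name.startswith("utf-16") else name


def submission_encoding(accept_charset: Optional[str], page_encoding: str) -> str:
    """Hypothetical, simplified model of how a form's submission encoding is picked."""
    if accept_charset:
        # accept-charset holds a list of encoding labels; use the first recognised one.
        for label in re.split(r"[\s,]+", accept_charset.strip()):
            if not label:
                continue
            try:
                return _ascii_compatible(codecs.lookup(label).name)
            except LookupError:
                continue  # unknown label: try the next one
    # Unset (UNKNOWN) or nothing recognised: fall back to the current page
    # encoding, whether it came from the server or from a user override.
    return _ascii_compatible(codecs.lookup(page_encoding).name)
```

In this model, `submission_encoding(None, "utf-8")` gives `"utf-8"`, and `submission_encoding("UTF-8", "iso-8859-1")` also gives `"utf-8"`: the attribute only changes the outcome when the page encoding is something other than UTF-8, such as after a user override.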
jwueller

I did not encounter any problems using UTF-8 with IE (6+) or any other major browser out there. You need to make sure that a UTF-8 meta tag is set (IE needs this), that all your files are UTF-8 encoded, and that the web server sends UTF-8 headers for them. Then there should not be any problem if you omit accept-charset.
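A minimal sketch of that setup, assuming a Python WSGI application (PEP 3333); the page markup and the `/submit` form action are hypothetical. The page bytes are encoded as UTF-8, the `Content-Type` header declares UTF-8, and the `<meta>` tag repeats it, with no `accept-charset` on the form.

```python
# Hypothetical page: the form posts to /submit and omits accept-charset,
# because the page itself is declared as UTF-8 in both places.
PAGE = """<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>Form</title>
</head>
<body>
<form method="post" action="/submit">
  <input name="q">
  <button type="submit">Send</button>
</form>
</body>
</html>
"""


def app(environ, start_response):
    body = PAGE.encode("utf-8")  # the bytes on the wire really are UTF-8 ...
    start_response("200 OK", [
        # ... and the HTTP header says so, matching the meta tag.
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

For local testing, the standard library can serve it with `wsgiref.simple_server.make_server("", 8000, app).serve_forever()`.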


I'm doing those things, minus that form attribute, and I'm still getting some cases of invalid UTF-8 being input (stackoverflow.com/questions/3715264/…), so I'm trying to find out conclusively whether adding this attribute to all my forms would help or is unnecessary.
@philfreo: I never used it once and had no problems at all. Can you hand us a link to your page?
If your page is really being properly served as UTF-8, you shouldn't get non-UTF-8 submissions from that form. Of course, if you've got other sites embedding a form that points to your site, or automated agents submitting content in general, all bets are off.
Our server serves all pages as UTF-8, and we aren't (intentionally) receiving data from other sources. We don't get a lot of invalid UTF-8, but we do get some every once in a while. As my other question indicates, I'm looking for an overall approach to solving that. With this question, I was hoping to hear conclusively whether the accept-charset attribute is necessary (makes any difference) given a UTF-8 HTTP header.
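Whatever the answer on accept-charset, a server that must guarantee valid UTF-8 still has to check what actually arrives, since other sites or automated agents can post arbitrary bytes. A minimal Python sketch, assuming the raw `application/x-www-form-urlencoded` request body is available as bytes; `parse_utf8_form` and `is_valid_utf8_form` are hypothetical helper names.

```python
from urllib.parse import parse_qs


def parse_utf8_form(raw_body: bytes) -> dict:
    """Parse a urlencoded form body, rejecting anything that is not valid UTF-8.

    Raises UnicodeDecodeError on malformed input.
    """
    # A urlencoded body should be pure ASCII; a raw non-ASCII byte is already malformed.
    text = raw_body.decode("ascii")
    # errors="strict" makes percent-decoded bytes that are not valid UTF-8 raise
    # UnicodeDecodeError instead of being silently replaced with U+FFFD.
    return parse_qs(text, encoding="utf-8", errors="strict")


def is_valid_utf8_form(raw_body: bytes) -> bool:
    try:
        parse_utf8_form(raw_body)
    except UnicodeDecodeError:
        return False
    return True
```

For example, `parse_utf8_form(b"q=caf%C3%A9")` returns `{"q": ["café"]}`, while `is_valid_utf8_form(b"q=caf%E9")` (the ISO-8859-1 form of the same value) returns `False`, so such submissions can be rejected or logged rather than stored.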