
Encoding URL query parameters in Java

How does one encode query parameters to go on a url in Java? I know, this seems like an obvious and already asked question.

There are two subtleties I'm not sure of:

Should spaces be encoded in the URL as "+" or as "%20"? In Chrome, if I type in "http://google.com/foo=?bar me", Chrome changes it to be encoded with %20.

Is it necessary/correct to encode colons ":" as %3B? Chrome doesn't.

Notes:

java.net.URLEncoder.encode doesn't seem to work; it seems to be for encoding data that is submitted via a form. For example, it encodes a space as + instead of %20, and it encodes the colon, which isn't necessary.

java.net.URI doesn't encode query parameters

This question looks useful: stackoverflow.com/questions/444112/…
The structure of the query part is server-dependent, though most servers expect application/x-www-form-urlencoded key/value pairs. See here for more: illegalargumentexception.blogspot.com/2009/12/…

Buhake Sindi

java.net.URLEncoder.encode(String s, String encoding) can help too. It follows the HTML form encoding application/x-www-form-urlencoded.

URLEncoder.encode(query, "UTF-8");

On the other hand, percent-encoding (also known as URL encoding) encodes a space as %20. A colon is a reserved character, so ":" remains a colon after encoding.
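To make the difference concrete, here is a minimal sketch of what URLEncoder produces for a space and for a colon. The class name FormEncodingDemo and the helper formEncode are made up for illustration, and the Charset overload of URLEncoder.encode assumes Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of any library.
final class FormEncodingDemo {

    // Encodes one value using application/x-www-form-urlencoded rules.
    static String formEncode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(formEncode("bar me")); // space becomes '+': bar+me
        System.out.println(formEncode("a:b"));    // colon is escaped: a%3Ab
    }
}
```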


I mentioned that I don't think that does URL encoding; instead, it encodes data to be submitted via a form. Comments?
I ended up using URLEncoder.encode and replacing "+" with "%20"
It encodes slashes to "%2F", shouldn't it leave the URL slashes as they are?
@golimar No, it shouldn't. You are supposed to give it parameter value only and not the whole URL. Consider example http://example.com/?url=http://example.com/?q=c&sort=name. Should it encode &sort=name or not? There is no way to distinguish value from the URL. That is the exact reason why you need value encoding in the first place.
But actually, slash is a legal character in querystring parameter values.
Community

Unfortunately, URLEncoder.encode() does not produce valid percent-encoding (as specified in RFC 3986).

URLEncoder.encode() encodes everything just fine, except space is encoded to "+". All the Java URI encoders that I could find only expose public methods to encode the query, fragment, path parts etc. - but don't expose the "raw" encoding. This is unfortunate as fragment and query are allowed to encode space to +, so we don't want to use them. Path is encoded properly but is "normalized" first so we can't use it for 'generic' encoding either.

Best solution I could come up with:

return URLEncoder.encode(raw, "UTF-8").replaceAll("\\+", "%20");

If replaceAll() is too slow for you, I guess the alternative is to roll your own encoder...
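A minimal sketch of that workaround as a reusable helper follows. The names QueryValueEncoder and encodeValue are hypothetical. It uses String.replace, which matches a literal "+" and so avoids the regex in replaceAll; a "+" that was in the input has already become %2B by that point, so only the pluses that stand for spaces are rewritten. The Charset overload assumes Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of any library.
final class QueryValueEncoder {

    // Form-encodes a single parameter value, then turns '+' (space) into %20.
    static String encodeValue(String raw) {
        return URLEncoder.encode(raw, StandardCharsets.UTF_8)
                .replace("+", "%20"); // literal replacement, no regex involved
    }

    public static void main(String[] args) {
        System.out.println(encodeValue("bar me")); // bar%20me
        System.out.println(encodeValue("a+b c"));  // a%2Bb%20c
    }
}
```

This is meant for individual parameter names and values only, not for a whole URL.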

EDIT: I had this code in here first which doesn't encode "?", "&", "=" properly:

//don't use - doesn't properly encode "?", "&", "="
new URI(null, null, null, raw, null).toString().substring(1);

+ is a perfectly valid encoding of a space.
@LawrenceDol it's true but sometimes + may be interpreted incorrectly - take a look at C# blogs.msdn.microsoft.com/yangxind/2006/11/08/…
This. I compared various alternatives against Javascript's encodeURIComponent method output, and this was the only exact match for the ones I tried (queries with spaces, Turkish and German special characters).
For example, "Ahmet+Mehmet Demir" becomes "Ahmet%2BMehmet+Demir". As I understand it, the only issue here is the MIME type application/x-www-form-urlencoded. Under that type a space is encoded as "+", for instance when searching for two terms in a web form, like a Google search sent as a GET request. The URI RFC allows "+" as a valid character, so it doesn't normally need to be escaped.
Community

EDIT: URIUtil is no longer available in more recent versions, better answer at Java - encode URL or by Mr. Sindi in this thread.

URIUtil of Apache httpclient is really useful, although there are some alternatives

URIUtil.encodeQuery(url);

For example, it encodes space as "+" instead of "%20"

Both are perfectly valid in the right context. Although if you really preferred you could issue a string replace.


I would have to agree. Use HttpClient, you will be much happier.
That looks promising, got a link by chance? I'm googling but finding many.
This method doesn't seem to be present in HttpClient 4.1? hc.apache.org/httpcomponents-client-ga/httpclient/apidocs/org/…
@Alex, hmm that's annoying, I've always used that routine with good results. One idea is to grab the source code from the 3 release since they now obviously didn't want to maintain it anymore.
URIUtil.encodeWithinQuery is what you would use to encode an individual query parameter, which is what the original question seemed to be asking.
Community

It is not necessary to encode a colon as %3B in the query, although doing so is not illegal.

URI         = scheme ":" hier-part [ "?" query ] [ "#" fragment ]
query       = *( pchar / "/" / "?" )
pchar         = unreserved / pct-encoded / sub-delims / ":" / "@"
unreserved    = ALPHA / DIGIT / "-" / "." / "_" / "~"
pct-encoded   = "%" HEXDIG HEXDIG
sub-delims    = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="

It also seems that only percent-encoded spaces are valid, since a space is neither an ALPHA nor a DIGIT, nor any other character the grammar permits.

Look to the URI specification for more details.
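The grammar above can be turned into a small character check. This is only a sketch: the class name Rfc3986Query and the method isAllowedUnescaped are made up for illustration, and the check covers single characters only, so a "%" is reported as not allowed because it is valid solely as the start of a pct-encoded triplet.

```java
// Hypothetical helper that mirrors the quoted grammar for the query component.
final class Rfc3986Query {

    private static final String UNRESERVED_PUNCT = "-._~";
    private static final String SUB_DELIMS = "!$&'()*+,;=";
    private static final String QUERY_EXTRAS = ":@/?"; // ":" and "@" from pchar, "/" and "?" from query

    // True if c may appear literally (unescaped) in a query per the grammar.
    static boolean isAllowedUnescaped(char c) {
        boolean alpha = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
        boolean digit = c >= '0' && c <= '9';
        return alpha || digit
                || UNRESERVED_PUNCT.indexOf(c) >= 0
                || SUB_DELIMS.indexOf(c) >= 0
                || QUERY_EXTRAS.indexOf(c) >= 0;
    }

    public static void main(String[] args) {
        System.out.println(isAllowedUnescaped(':')); // true
        System.out.println(isAllowedUnescaped(' ')); // false
        System.out.println(isAllowedUnescaped('+')); // true
    }
}
```

Being allowed by the grammar does not mean a character is safe inside a key/value pair: "&", "=" and "+" are syntactically legal but carry meaning for servers that parse application/x-www-form-urlencoded queries.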


But doing so can change the meaning of the URI, since the interpretation of the query string is up to the server. If you are producing an application/x-www-form-urlencoded query string, either is fine. If you are fixing up a URL that the user typed/pasted in, : should be left alone.
@tc. You are right, if colon is being used as a general delimiter (page 12 of the RFC); however, if it is not being used as a general delimiter, then both encodings should resolve identically.
You also have to be careful as URLs are not really a subset of URI: adamgent.com/post/25161273526/urls-are-not-a-subset-of-uris
A colon is %3A, not %3B (that's a semicolon), for anybody who is encoding manually.
rfeak

The built in Java URLEncoder is doing what it's supposed to, and you should use it.

A "+" or "%20" are both valid replacements for a space character in a URL. Either one will work.

A ":" should be encoded, as it's a separator character. i.e. http://foo or ftp://bar. The fact that a particular browser can handle it when it's not encoded doesn't make it correct. You should encode them.

As a matter of good practice, be sure to use the method that takes a character encoding parameter. UTF-8 is generally used there, but you should supply it explicitly.

URLEncoder.encode(yourUrl, "UTF-8");
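In practice the encoder is applied to each parameter name and value separately rather than to a complete URL. The following is a sketch under that assumption; QueryStringBuilder and build are hypothetical names, and the Charset overload of URLEncoder.encode assumes Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper, not part of any library.
final class QueryStringBuilder {

    // Joins form-encoded name=value pairs with '&', in map iteration order.
    static String build(Map<String, String> params) {
        StringJoiner joiner = new StringJoiner("&");
        for (Map.Entry<String, String> entry : params.entrySet()) {
            String name = URLEncoder.encode(entry.getKey(), StandardCharsets.UTF_8);
            String value = URLEncoder.encode(entry.getValue(), StandardCharsets.UTF_8);
            joiner.add(name + "=" + value);
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("url", "http://example.com/?q=c&sort=name");
        params.put("sort", "name");
        // Prints: url=http%3A%2F%2Fexample.com%2F%3Fq%3Dc%26sort%3Dname&sort=name
        System.out.println("http://example.com/?" + build(params));
    }
}
```

Because the nested URL is encoded as a value, its own "&sort=name" can no longer be confused with the outer "sort" parameter.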

+ is only a representation of space in application/x-www-form-urlencoded; it is not guaranteed to work even when restricted to HTTP. Similarly, : is valid in a query string and should not be converted to %3B; a server can choose to interpret them differently.
This method also encodes the slashes and other characters that are part of the URL itself, e.g. http:// becomes http%3A%2F%2F, which is not correct.
@ToKra you are not supposed to encode the http:// part. The method is for query parameters and encoded form data. If, however, you wanted to pass the URL of another website as a query parameter, THEN you would want to encode it to avoid confusing the URL parser.
@tc My reading of w3.org/TR/html4/interact/forms.html#h-17.13.3.3 is that all GET form data is encoded as the application/x-www-form-urlencoded content type. Doesn't that mean it must work for HTTP?
aristotll

I just want to add another way to resolve this problem.

If your project depends on spring web, you can use their utils.

import org.springframework.web.util.UriUtils;

import java.nio.charset.StandardCharsets;

UriUtils.encode("vip:104534049:5", StandardCharsets.UTF_8);

Output:

vip%3A104534049%3A5


ICL Sales EXIMON
String param="2019-07-18 19:29:37";
param="%27"+param.trim().replace(" ", "%20")+"%27";

I observed that for a datetime (timestamp) value, URLEncoder.encode(param, "UTF-8") does not work.


Janisito

The space character " " is converted into a + sign when using URLEncoder.encode. This differs from other programming languages such as JavaScript, whose encodeURIComponent encodes the space character as %20. It is still valid, because spaces in query string parameters are represented by + under application/x-www-form-urlencoded, whereas %20 is generally used to represent spaces in the URI itself (the part of the URL before the ?).
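On the decoding side, a form-style decoder accepts both spellings. A small sketch using java.net.URLDecoder follows; SpaceDecodingDemo and formDecode are made-up names, and the Charset overload assumes Java 10 or later.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of any library.
final class SpaceDecodingDemo {

    // Decodes one application/x-www-form-urlencoded value.
    static String formDecode(String encoded) {
        return URLDecoder.decode(encoded, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(formDecode("hello+sir"));   // hello sir
        System.out.println(formDecode("hello%20sir")); // hello sir
        System.out.println(formDecode("a%2Bb"));       // a+b
    }
}
```

Whether a given server decodes "+" this way depends on how it parses the query, so this only shows the behavior of Java's own decoder.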


Jignesh Patel

If spaces are the only problem in your URL, the code below has worked fine for me:

String url = "http://www.xyz.com?para=hello sir";
URL myUrl = new URL(url.replace(" ", "%20"));

Example: if url is

http://www.xyz.com?para=hello sir

then myUrl is

http://www.xyz.com?para=hello%20sir
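A related option, sketched here as an alternative to manual string replacement, is the multi-argument java.net.URI constructor, which quotes characters that are illegal in a URI such as the space. The class UriConstructorDemo and the method withQuery are hypothetical names. As noted elsewhere in this thread, this approach does not escape "?", "&" or "=" inside a value, so it only suits cases where spaces and similar illegal characters are the sole concern.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical demo class, not part of any library.
final class UriConstructorDemo {

    // Builds an http URI; illegal characters in the query (e.g. spaces) are quoted.
    static String withQuery(String host, String path, String query) {
        try {
            return new URI("http", host, path, query, null).toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Cannot build URI", e);
        }
    }

    public static void main(String[] args) {
        // Prints: http://www.xyz.com/?para=hello%20sir
        System.out.println(withQuery("www.xyz.com", "/", "para=hello sir"));
    }
}
```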