
How does an HTTP proxy work?

I searched the web for information about HTTP proxies and read the Wikipedia articles about proxy servers, but I still don't understand how an HTTP proxy works, stupid me.

Here is my assumption about how an HTTP proxy works: if I set the HTTP proxy to a specific one, say Proxy_A, and then start Chrome/IE and type in a specific URL, say URL_A, does Chrome/IE send the request directly to Proxy_A, which then sends the request to the real server of URL_A?

Yes, mostly. Then Proxy_A sends the response back to you. Pretty simple, eh?
This question is barely relevant to stackoverflow.
If your question is still unanswered, maybe provide more details about what you need to know?
Even if the client is only doing an HTTP GET, if the server/backend is secured (SSL), the client must first send an HTTP CONNECT request to establish the connection with the backend, because the traffic is encrypted. Only then do the other HTTP methods, such as GET, follow.

James Wierzba

An HTTP proxy speaks the HTTP protocol. It is made especially for HTTP connections but can be abused for other protocols as well (which is already fairly standard).

The browser (CLIENT) sends GET http://SERVER/path HTTP/1.1 to the PROXY.
The PROXY then forwards the actual request to the SERVER.
The SERVER sees only the PROXY as the connecting party and answers the PROXY just as it would answer a CLIENT.
The PROXY receives the response and forwards it back to the CLIENT.
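The difference between the request the CLIENT sends and the one the PROXY forwards can be sketched in a few lines of Python. This is a minimal illustration, not how any particular proxy is implemented: the function names are hypothetical, and only a GET without a request body is handled. The client puts the full URL in the request line (the "absolute form"); the proxy pulls the host and port out of it and forwards an ordinary request containing only the path and query.

```python
from urllib.parse import urlsplit


def client_request_to_proxy(url: str) -> bytes:
    """What the CLIENT sends to the PROXY: the full URL in the request line."""
    parts = urlsplit(url)
    return (f"GET {url} HTTP/1.1\r\n"
            f"Host: {parts.netloc}\r\n"
            "\r\n").encode("ascii")


def proxy_rewrite(request: bytes) -> tuple[str, int, bytes]:
    """What the PROXY does: find the real SERVER and build the request it forwards."""
    head, _, body = request.partition(b"\r\n\r\n")
    lines = head.decode("ascii").split("\r\n")
    method, target, version = lines[0].split(" ")
    parts = urlsplit(target)
    # Host and port come from the URL the client supplied, not from the proxy's own address.
    host, port = parts.hostname, parts.port or 80
    path = (parts.path or "/") + (f"?{parts.query}" if parts.query else "")
    # Towards the SERVER, the request line carries only path and query, like a direct request.
    forwarded = "\r\n".join([f"{method} {path} {version}", *lines[1:]]) + "\r\n\r\n"
    return host, port, forwarded.encode("ascii") + body
```

For `http://example.com/path?x=1`, the proxy would open a TCP connection to `example.com` port 80 and send `GET /path?x=1 HTTP/1.1` with the original `Host` header.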

It is a transparent process, nearly like communicating directly with the server, so supporting an HTTP proxy adds only a tiny overhead for the browser. Some additional headers can be sent that identify the client or reveal that a proxy is being used. Proxies sometimes change or add content within the data stream for various purposes. For example, some proxies include your real IP address in a special HTTP header, which can be logged server-side or read by the site's scripts.
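As an illustration of such identifying headers, here is a sketch that assumes the proxy uses the common `Via` and `X-Forwarded-For` headers. Real proxies differ: some send nothing, others use the standardized `Forwarded` header instead. The function names are hypothetical, and headers are kept in a plain dict with exact-case keys for brevity, although real HTTP header names are case-insensitive.

```python
def add_proxy_headers(headers: dict[str, str], client_ip: str,
                      proxy_name: str) -> dict[str, str]:
    """Headers a non-anonymous proxy might add before forwarding (hypothetical helper)."""
    out = dict(headers)
    # X-Forwarded-For: de-facto header carrying the client IP; each proxy appends its peer.
    prior = out.get("X-Forwarded-For")
    out["X-Forwarded-For"] = f"{prior}, {client_ip}" if prior else client_ip
    # Via: standard header announcing that a proxy handled the message.
    hop = f"1.1 {proxy_name}"
    prior_via = out.get("Via")
    out["Via"] = f"{prior_via}, {hop}" if prior_via else hop
    return out


def apparent_client_ip(peer_ip: str, headers: dict[str, str]) -> str:
    """Server side: the TCP peer is the proxy, but the header may reveal the client."""
    xff = headers.get("X-Forwarded-For")
    # The leftmost entry names the original client, if the header is present and honest.
    return xff.split(",")[0].strip() if xff else peer_ip
```

The header is only as trustworthy as whoever set it: a client can send a forged `X-Forwarded-For` itself, so a server should only rely on it when it comes from a proxy the server operator controls.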

CLIENT <---> PROXY <---> SERVER
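To see the three parties of the diagram talking to each other, here is a runnable sketch using only Python's standard library, with CLIENT, PROXY and SERVER all on localhost. It is a toy under stated assumptions: GET only, no request bodies, no CONNECT support, and a simplified hop-by-hop header list. The class and function names are hypothetical.

```python
import http.client
import http.server
import threading
from urllib.parse import urlsplit

# Hop-by-hop headers concern a single connection and are not forwarded (simplified list).
HOP_BY_HOP = {"connection", "keep-alive", "proxy-authenticate", "proxy-authorization",
              "proxy-connection", "te", "trailer", "transfer-encoding", "upgrade"}


class OriginHandler(http.server.BaseHTTPRequestHandler):
    """The SERVER: replies with the request target it received."""

    def do_GET(self):
        body = f"origin saw path={self.path}".encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # silence per-request logging
        pass


class ForwardingProxyHandler(http.server.BaseHTTPRequestHandler):
    """The PROXY: self.path holds the absolute URL the CLIENT asked for."""

    def do_GET(self):
        url = urlsplit(self.path)
        path = (url.path or "/") + (f"?{url.query}" if url.query else "")
        headers = {k: v for k, v in self.headers.items() if k.lower() not in HOP_BY_HOP}
        # The PROXY now acts as an ordinary client of the SERVER.
        upstream = http.client.HTTPConnection(url.hostname, url.port or 80, timeout=5)
        try:
            upstream.request("GET", path, headers=headers)
            resp = upstream.getresponse()
            body = resp.read()
        finally:
            upstream.close()
        # Relay the SERVER's answer (status, headers, body) back to the CLIENT.
        self.send_response_only(resp.status, resp.reason)
        for k, v in resp.getheaders():
            if k.lower() not in HOP_BY_HOP and k.lower() != "content-length":
                self.send_header(k, v)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass


def start(handler_cls) -> http.server.ThreadingHTTPServer:
    """Run a server on a free localhost port in a background thread."""
    server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), handler_cls)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch_via_proxy(proxy_port: int, url: str) -> tuple[int, str]:
    """The CLIENT: connects only to the PROXY and puts the full URL in the request line."""
    conn = http.client.HTTPConnection("127.0.0.1", proxy_port, timeout=5)
    try:
        conn.request("GET", url)
        resp = conn.getresponse()
        return resp.status, resp.read().decode()
    finally:
        conn.close()


if __name__ == "__main__":
    origin, proxy = start(OriginHandler), start(ForwardingProxyHandler)
    url = f"http://127.0.0.1:{origin.server_address[1]}/hello?x=1"
    print(fetch_via_proxy(proxy.server_address[1], url))
    # prints (200, 'origin saw path=/hello?x=1')
```

The SERVER only ever sees a connection from the PROXY and a plain `/hello?x=1` request, which is exactly the "no direct communication" point made in the update below.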

Update: using proxies as a security/privacy feature. As you can see in the ASCII diagram above, there is no direct communication between CLIENT and SERVER; both parties talk only to the PROXY between them. Nowadays the CLIENT is often a browser and the SERVER is often a web server (Apache, for example).

In such an environment, users often trust the PROXY to be secure and not leak their identity. However, there are many ways this security model can be broken by the complex software frameworks running in the browser. Flash and Java applets are a perfect example of how a proxy connection can be bypassed: both might not care much about the proxy settings of their parent application (the browser). Another example is DNS requests, which, depending on the PROXY and the application settings, can reach the destination's nameserver without going through the PROXY. Yet another example is cookies or your browser's fingerprint (screen resolution, response times, user agent, etc.), which might identify you if the web server already knows you from the past (or meets you again without the proxy).

And in the end, the proxy itself needs to be trusted, as it can read all the data that goes through it and might even be able to break your SSL security (read up on man-in-the-middle attacks).

Where to get proxies from
Proxies can be bought as a service, scanned for, or simply run by yourself.

Public proxies
These are the most commonly used proxies, and the usual term "public" is quite misleading.
The better term would be "open proxies": if you run a proxy server without a firewall or authentication, anyone in the world can find it and abuse it.
The large majority of companies selling proxies simply scan the internet for such proxies or use hacked Windows computers (botnets), and sell them mostly for illegal or spam activity.
In most modern countries, using an open proxy without authorization can be considered abuse; it is a very common practice, but it can actually lead to prison time.
It's possible to find proxies by scanning the internet for open ports; a typical free program for this is https://nmap.org
As a word of caution: large-scale scanning will almost certainly get your internet connection banned by your ISP.

Paid proxies
There are four types of paid proxies:
1) Paid public (open) proxies
These sellers sell or resell huge lists of proxies that are regularly refreshed to remove dead ones. The proxies are abused on a massive scale and are usually blacklisted on most sites, including Google. Additionally, these proxies are usually very unstable and very slow. The large majority of them simply abuse misconfigured servers. It's a very competitive "market"; a Google search will turn up many examples.

2) Paid hacked (botnet) proxies
These abuse computers, mostly Internet-of-Things devices or Windows desktops, as proxy hosts. The attackers use them on a large scale for various illegal purposes. Sellers usually call them "residential proxies" to hide their illegal nature. Using such a proxy is without doubt illegal, and the abused user can easily log "your" IP if you connect through it, and could even hijack your connection to the destination. Depending on the source, these IPs may not be blacklisted, so their "quality" is much better than that of public proxies.

3) Paid shared proxies
These are datacenter proxies, usually legal and potentially with a fast uplink. Because there is so much e-commerce spam going on, these IPs are massively abused and usually found on blacklists. A typical use would be circumventing Craigslist restrictions or geo-restrictions.

4) Paid private/dedicated proxies
"private" means dedicated. If the operator is professional it means your proxy is not shared among other people.
These are often used for more professional and legal activity, especially when the proxy IP is rented for a longer period.
A well known operator would be https://us-proxies.com

Own proxies
Running your own proxy is possible as well; there are various open-source projects available.
The most widely used proxy server is Squid: https://squid-cache.org


So how is the original URL sent to the proxy?
@John, if the SERVER sees the proxy as the "client", then how are things like cookies handled? In other words, what prevents the cookies from being dropped on the proxy instead of on the actual CLIENT?
The proxy just transmits the data. Cookies are part of the HTTP headers, so they are transmitted just like any other metadata. Everything the browser sends to the proxy is passed on to the destination, and everything from the destination is passed back through the proxy to the browser. (Let's ignore the fact that some proxies can modify data.)
Hi John (or any other future visitor), I have a question: so IIUC, a proxy server hides the client's true identity from the recipient server (i.e. the server of host.tld), but it cannot hide it from the ISP, right? If so, is there a way to hide from the ISP?
Had the same question as @edwin. I had to re-read the answer several times. Maybe what confused me was that in the example I thought "host.tld" was the PROXY when in fact it was the SERVER. The answer is accurate, but for the uninitiated it could use some refinement so that a single read-through is sufficient. Possibly as simple as removing the host.tld reference, e.g. "sends something like GET http://SERVER/path HTTP/1.1 to the PROXY". Or something similar.
Community

To add to John's great answer above, one important step is the initial CONNECT handshake between CLIENT and PROXY. From the WebSocket RFC (RFC 6455):

CONNECT example.com:80 HTTP/1.1
Host: example.com

This is the same request that a CLIENT uses to open an SSL tunnel through a proxy.
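The client side of this handshake can be sketched in Python, assuming a proxy that accepts CONNECT to the requested port (many proxies only allow port 443). `connect_request`, `tunnel_established` and `open_https_via_proxy` are hypothetical names; the last one needs a real proxy and network access, so only the two pure helpers can be checked offline.

```python
import socket
import ssl


def connect_request(host: str, port: int) -> bytes:
    """The plain-text request asking the proxy to open a raw TCP tunnel to host:port."""
    return f"CONNECT {host}:{port} HTTP/1.1\r\nHost: {host}:{port}\r\n\r\n".encode("ascii")


def tunnel_established(reply_head: bytes) -> bool:
    """Any 2xx status in reply to CONNECT means the tunnel is open."""
    parts = reply_head.split(b"\r\n", 1)[0].split(b" ", 2)
    return len(parts) >= 2 and len(parts[1]) == 3 and parts[1].startswith(b"2")


def open_https_via_proxy(proxy_host: str, proxy_port: int, host: str, port: int = 443):
    """CONNECT through the proxy, then start TLS inside the tunnel (needs network)."""
    sock = socket.create_connection((proxy_host, proxy_port), timeout=10)
    sock.sendall(connect_request(host, port))
    reply = b""
    while b"\r\n\r\n" not in reply:
        chunk = sock.recv(4096)
        if not chunk:
            sock.close()
            raise ConnectionError("proxy closed the connection")
        reply += chunk
    if not tunnel_established(reply):
        sock.close()
        raise ConnectionError(reply.split(b"\r\n", 1)[0].decode(errors="replace"))
    # From here on the proxy only relays bytes; TLS is negotiated end-to-end with `host`.
    return ssl.create_default_context().wrap_socket(sock, server_hostname=host)
```

Only the CONNECT line and the proxy's status reply travel in plain text; the destination host name is therefore visible to the proxy, but the URL path, headers and body sent afterwards are encrypted inside the TLS session.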


I feel like this is the complex part of the answer, and the accepted answer is rather trivial without it. You've answered the important bit: how the proxying happens. Or at least you've given a lead to more information.
No, actually that's not true. Ivan trik just described the CONNECT method, but the question was about HTTP proxies. CONNECT is used to open a universal TCP/IP tunnel, while GET, as you'll find in my answer, is used for a specific kind of tunnel: HTTP. For SSL you need a raw tunnel because of the encryption; that's why CONNECT is used.
Thanks. Without this part explained, we would ask: with HTTPS, how can the proxy server know the desired destination if everything is encrypted? That is why a CONNECT request is first sent to the proxy in plain text before the browser starts speaking to the website; after that, the proxy just forwards encrypted data between them.
Is there an example of a request to a website? I can't find one. I tested many times, but the proxy server only responds with "HTTP/1.1 200 OK".
@John Is it possible to make use of HTTP CONNECT for plain HTTP requests as well?
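The comments above note that CONNECT opens a generic TCP tunnel and that the proxy itself only answers with a 200 status; after that, the client has to speak the target protocol itself (normally TLS for HTTPS) through the tunnel. A minimal CONNECT-only relay on localhost makes this visible. Assumptions: all names are hypothetical, the target is a toy TCP service that upper-cases whatever it receives (so no TLS is involved), and the proxy ignores any bytes the client sends before the 200 reply, which a real proxy would have to forward.

```python
import socket
import threading


def relay(src: socket.socket, dst: socket.socket) -> None:
    """Copy bytes from src to dst until src closes, then half-close dst."""
    try:
        while chunk := src.recv(4096):
            dst.sendall(chunk)
    except OSError:
        pass
    finally:
        try:
            dst.shutdown(socket.SHUT_WR)
        except OSError:
            pass


def read_head(sock: socket.socket) -> bytes:
    """Read until the blank line that ends an HTTP head (trailing bytes are kept)."""
    data = b""
    while b"\r\n\r\n" not in data:
        chunk = sock.recv(4096)
        if not chunk:
            raise ConnectionError("peer closed before end of head")
        data += chunk
    return data


def handle_connect(client: socket.socket) -> None:
    """Toy proxy: accept only CONNECT host:port, dial it, then blindly relay bytes."""
    with client:
        try:
            head = read_head(client)
        except ConnectionError:
            return
        method, target, _ = head.split(b"\r\n", 1)[0].decode("ascii").split(" ", 2)
        if method != "CONNECT":
            client.sendall(b"HTTP/1.1 405 Method Not Allowed\r\n\r\n")
            return
        host, port = target.rsplit(":", 1)
        try:
            upstream = socket.create_connection((host, int(port)), timeout=5)
        except OSError:
            client.sendall(b"HTTP/1.1 502 Bad Gateway\r\n\r\n")
            return
        with upstream:
            # This 200 is all the proxy says; from now on it never parses the traffic.
            client.sendall(b"HTTP/1.1 200 Connection established\r\n\r\n")
            back = threading.Thread(target=relay, args=(upstream, client), daemon=True)
            back.start()
            relay(client, upstream)
            back.join()


def serve(listener: socket.socket, handler) -> None:
    """Accept connections in a background thread, one handler thread per connection."""
    def loop() -> None:
        while True:
            conn, _ = listener.accept()
            threading.Thread(target=handler, args=(conn,), daemon=True).start()
    threading.Thread(target=loop, daemon=True).start()


def upper_service(conn: socket.socket) -> None:
    """Stand-in for the real target server: echoes data back in upper case."""
    with conn:
        while data := conn.recv(4096):
            conn.sendall(data.upper())


def talk_through_tunnel(proxy_port: int, target_port: int, message: bytes):
    """Client: CONNECT first, then speak the target's own protocol over the tunnel."""
    with socket.create_connection(("127.0.0.1", proxy_port), timeout=5) as s:
        s.sendall(f"CONNECT 127.0.0.1:{target_port} HTTP/1.1\r\n"
                  f"Host: 127.0.0.1:{target_port}\r\n\r\n".encode("ascii"))
        head, _, echoed = read_head(s).partition(b"\r\n\r\n")
        status = head.split(b"\r\n", 1)[0].decode("ascii")
        s.sendall(message)          # with HTTPS, the TLS handshake would start here
        s.shutdown(socket.SHUT_WR)  # signal that we are done sending
        while chunk := s.recv(4096):
            echoed += chunk
    return status, echoed


if __name__ == "__main__":
    proxy = socket.create_server(("127.0.0.1", 0))
    target = socket.create_server(("127.0.0.1", 0))
    serve(proxy, handle_connect)
    serve(target, upper_service)
    print(talk_through_tunnel(proxy.getsockname()[1], target.getsockname()[1],
                              b"hello through the tunnel"))
    # prints ('HTTP/1.1 200 Connection established', b'HELLO THROUGH THE TUNNEL')
```

The "200" line is the proxy's entire contribution; everything after it comes from the target service, relayed byte for byte.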