1. Several key concepts of the Internet
You can think of the Internet as a super complex multi-layer routing network. The key players are:
1. Terminal: the mobile phone or computer in your hand.
2. LAN/Router: home Wi-Fi, company intranet, private address allocation and translation (NAT), a simple firewall.
3. Operator Network (ISP): China Mobile, China Telecom, China Unicom... They are large "road operators" that help you send data to the larger Internet.
4. Autonomous System (AS) & BGP:
* Each operator/large organization has its own "autonomous system number" (ASN);
* ASes use the **BGP routing protocol** to tell each other "which IP prefixes can be reached through me".
5. Backbone Network: Many ASes are linked together to form a national/regional backbone network; once those backbones are interconnected, you get the global Internet.
6. DNS: Responsible for translating names like www.example.com into IP addresses.
7. Application-layer protocols: HTTP(S), SMTP, WebSocket, etc. (plus newer transports such as QUIC, which carries HTTP/3) run specific services on top of the established "pipe".
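To make the "which IP prefixes can be reached through me" idea from item 4 a bit more concrete, here is a minimal sketch of how a router might pick among announcements: the most specific matching prefix wins. The table is made up (AS numbers and prefixes come from ranges reserved for documentation), and real BGP path selection weighs many more attributes than this.

```python
import ipaddress

# Hypothetical announcements: prefix -> AS that says "reachable via me".
# AS numbers and prefixes are taken from documentation-only ranges.
ANNOUNCEMENTS = {
    "192.0.2.0/24": 64496,
    "198.51.100.0/24": 64497,
    "198.51.100.128/25": 64498,  # a more specific prefix inside the /24 above
}

def next_as(destination: str):
    """Return the AS announcing the most specific prefix covering destination, or None."""
    addr = ipaddress.ip_address(destination)
    best = None
    for prefix, asn in ANNOUNCEMENTS.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, asn)
    return best[1] if best else None

print(next_as("198.51.100.200"))  # covered by both the /24 and the /25
print(next_as("198.51.100.5"))    # covered only by the /24
print(next_as("203.0.113.9"))     # nobody announced it: no route
```

The last case is worth noticing: if no one announces a route, the destination is simply unreachable from this router's point of view.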
2. Roughly where is the "wall"?
In academic work it is usually called the Great Firewall (GFW). It is not a single machine but a complete system deployed at the border of China's backbone network:
* It sits roughly on the path "domestic operator backbone → international egress";
* Some traffic is also processed inside the domestic operator networks. (net.in.tum.de)
You can first think of it as:
Between the "domestic network" and the "foreign network", there is a circle of super powerful firewall + monitoring system. All traffic "in and out of the country" must go through these "border lines".
3. What is the "wall" doing?
There are actually just a few core goals: filtering, blocking, and monitoring some content. The main technical methods are roughly as follows:
1. IP Blocking / Route Blocking
* For some IP ranges, the border routers apply "black hole" handling directly:
* Either the route is simply blocked in the routing table;
* Or the packet is accepted and never answered, as if it fell into a black hole.
* Effect: accessing these IPs (for example, some foreign websites) looks just like "the server is down".
2. DNS Pollution / DNS Hijacking
For many blocked websites, the most typical symptom is:
`ping facebook.com` returns a strange IP, or the name cannot be resolved at all.
What is actually happening:
* When you send a DNS query from inside the country:
1. As the packet crosses the backbone network, GFW equipment **watches UDP port 53** (DNS traffic);
2. If the query is for a "sensitive domain name", it **forges a fake DNS response and sends it back first**;
3. The real DNS response is still on its way, but you have already accepted the fake result, and your system has cached the fake IP.
That is why it is called DNS pollution/DNS injection: your local DNS server is not deliberately lying to you; a forged answer was injected somewhere along the path.
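The race in steps 1 to 3 can be sketched as a toy simulation. This is not real resolver code: `DnsResponse`, `first_answer_wins`, the domain, the addresses, and the timings are all made up for illustration. The only point is that a simple client accepts whichever matching response arrives first.

```python
from dataclasses import dataclass

@dataclass
class DnsResponse:
    txid: int          # transaction ID copied from the query
    qname: str         # the name that was asked about
    address: str       # the answer carried in the response
    arrival_ms: float  # when it reaches the client (made-up timing)

def first_answer_wins(query_txid, qname, responses):
    """Mimic a simple stub resolver: accept the earliest response matching the query."""
    matching = [r for r in responses if r.txid == query_txid and r.qname == qname]
    if not matching:
        return None
    return min(matching, key=lambda r: r.arrival_ms).address

responses = [
    # The genuine answer has to travel all the way from the real DNS server.
    DnsResponse(0x1A2B, "blocked.example", "203.0.113.10", arrival_ms=180.0),
    # The injected answer is sent from a point much closer to the user.
    DnsResponse(0x1A2B, "blocked.example", "192.0.2.66", arrival_ms=25.0),
]

cache = {"blocked.example": first_answer_wins(0x1A2B, "blocked.example", responses)}
print(cache)  # the forged address is what ends up cached
```

Because the injector sits on the path and can copy the transaction ID from the query, its forged reply passes the same matching check as the real one.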
3. URL / keyword filtering (HTTP and SNI)
For HTTP/HTTPS traffic, GFW does something like DPI (Deep Packet Inspection):
1. Cleartext HTTP:
* It directly checks whether your URL, headers, or content contain sensitive keywords;
* When it finds a match, it can:
* Cut the connection immediately (TCP RST);
* Or return a fake page/error page;
* Or subject your connection to heavy packet loss for a period of time.
2. HTTPS (encrypted content):
* Although it cannot see the specific path you request (`/xxx`), it can see:
* Which IP you are connecting to;
* **The SNI (Server Name Indication) in the TLS handshake = the domain name**;
* So it filters on the domain name in the SNI.
* Since newer protocols such as **encrypted SNI / ECH / QUIC** appeared, the GFW has kept upgrading in recent years and trying to block them.
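To see why the SNI is visible even though the page content is encrypted, here is a rough sketch that generates a real ClientHello in memory with Python's `ssl` module (no network connection is made) and then reads the hostname straight out of the bytes. The helper names `client_hello_bytes` and `extract_sni` are my own, and the parser assumes the whole ClientHello fits in a single TLS record, which is the usual case.

```python
import ssl

def client_hello_bytes(hostname: str) -> bytes:
    """Produce a real ClientHello in memory, without opening any connection."""
    ctx = ssl.create_default_context()
    incoming, outgoing = ssl.MemoryBIO(), ssl.MemoryBIO()
    tls = ctx.wrap_bio(incoming, outgoing, server_hostname=hostname)
    try:
        tls.do_handshake()
    except ssl.SSLWantReadError:
        pass  # expected: the handshake is waiting for a server reply
    return outgoing.read()

def extract_sni(record: bytes):
    """Walk the ClientHello layout and return the server_name value, if any."""
    if len(record) < 6 or record[0] != 0x16:  # 0x16 = handshake record
        return None
    pos = 5                                   # skip the 5-byte record header
    if record[pos] != 0x01:                   # 0x01 = ClientHello
        return None
    pos += 4                                  # handshake type + 3-byte length
    pos += 2 + 32                             # client version + random
    pos += 1 + record[pos]                    # session ID
    pos += 2 + int.from_bytes(record[pos:pos + 2], "big")  # cipher suites
    pos += 1 + record[pos]                    # compression methods
    end = pos + 2 + int.from_bytes(record[pos:pos + 2], "big")
    pos += 2                                  # now at the first extension
    while pos + 4 <= end:
        ext_type = int.from_bytes(record[pos:pos + 2], "big")
        ext_len = int.from_bytes(record[pos + 2:pos + 4], "big")
        body = record[pos + 4:pos + 4 + ext_len]
        if ext_type == 0:                     # extension 0 = server_name
            # body layout: list length (2), name type (1), name length (2), name
            name_len = int.from_bytes(body[3:5], "big")
            return body[5:5 + name_len].decode("ascii")
        pos += 4 + ext_len
    return None

print(extract_sni(client_hello_bytes("www.example.com")))
```

No keys are involved anywhere in `extract_sni`: the hostname is sent before encryption starts, which is exactly the gap that ECH is meant to close.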
4. Behavioral fingerprinting / QoS throttling
The GFW does one more thing: based on connection behavior, it slows down or drops packets for traffic it suspects comes from circumvention tools.
The general logic is roughly:
* It copies your traffic to an "analysis system" and uses various feature models to judge:
* Does this look like a VPN/proxy/tunneling protocol?
* Does it look like a known circumvention-tool pattern?
* If the traffic looks "suspicious", it scores the destination IP/port and applies the following to subsequent connections:
* Higher packet loss;
* Higher latency;
* You end up with "timeouts" and "disconnects", so the poor experience itself does the "blocking".
5. Active Probing
There is another technique called active probing:
* When an overseas server is suspected of running a "circumvention tool",
* The GFW poses as an ordinary user and connects to that server to test it:
* If it confirms that the server is running a blacklisted protocol/service,
* It may block that IP/port on a more permanent basis.
4. Why can it detect all this?
Because almost all inbound and outbound traffic passes through a small number of "exit gateways", and the routers/switches/fiber optic equipment at these gateways can be tapped into the monitoring system.
The path from a domestic user to a foreign website is:
Your computer → Home router → Local operator → Provincial/regional backbone → National backbone / International egress router → Foreign operator → Foreign website
The key point is this step:
Domestic operator backbone network ↔ Foreign operator backbone network
The interconnection point in the middle is called the "international egress", which can be:
* International gateway exchange facilities / border gateway routers on land
* Landing station equipment for submarine optical cables
* Border optical cable nodes interconnecting with neighboring countries
Because most outbound traffic goes through these points, doing "copy + analysis" at these points is enough to monitor/filter inbound and outbound traffic.
The Internet itself is a bunch of operators/autonomous systems interconnected with each other, so for domestic users to reach anything abroad, their traffic eventually has to cross into someone else's network.
5. What about the "ladder" (the circumvention tool)?
The complete chain is actually this:
1. Without a ladder
* You connect out directly from the domestic network to various overseas websites;
* Every hop in between is an "official operator ↔ foreign operator" connection;
* The border devices in the middle have enough information for fine-grained control.
2. With a ladder
* You still reach the international egress through the domestic operator first;
* The only difference is that the direct target becomes one specific "transit server" (the network this machine sits in is connected to foreign operators, and the route to it may take many detours);
* It is the relay that actually visits the various websites, not a connection you make directly from the domestic side;
* What the egress device can see is the encrypted traffic of "you ↔ that transit server", not the plaintext details of "you ↔ each website".
5. Then why are there good ladders and trash ladders?
Distance and the roads taken.
If you are in southern China, nearby places like Hong Kong and Japan have relatively direct routes from China Telecom/China Unicom/China Mobile, and with the speed of light as the only hard limit, latency is naturally low. If you connect to Europe or the central United States instead, the traffic may travel halfway around the world and pass through a bunch of congested operators and submarine cables on the way, so latency and jitter go up.
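A back-of-the-envelope calculation shows how much of that latency is pure distance. The sketch below assumes light in fiber covers roughly 200 km per millisecond (about two thirds of its speed in vacuum), uses approximate city coordinates, and pretends the cable follows a perfect great circle, so the results are lower bounds rather than what you would actually measure. Guangzhou stands in for "southern China".

```python
import math

FIBER_KM_PER_MS = 200.0  # assumption: light in fiber travels about 200 km per ms

def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance on a sphere of radius 6371 km."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))

def min_rtt_ms(src, dst):
    """Lower bound on round-trip time over a perfectly straight fiber."""
    return 2 * great_circle_km(*src, *dst) / FIBER_KM_PER_MS

# Approximate (latitude, longitude) pairs.
GUANGZHOU = (23.13, 113.26)
CITIES = {
    "Hong Kong": (22.32, 114.17),
    "Tokyo": (35.68, 139.69),
    "Frankfurt": (50.11, 8.68),
    "Chicago": (41.88, -87.63),
}

for name, coords in CITIES.items():
    print(f"{name}: at least {min_rtt_ms(GUANGZHOU, coords):.0f} ms round trip")
```

Real routes are longer than the great circle and add queuing at every hop, so measured latency sits well above these floors, but the ordering between near and far destinations comes out the same.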
On top of that, two nodes that are both labeled "Japanese node" or "American node" may have completely different operators and egress quality behind them. Some data centers have paid for better international bandwidth and have a large operator upstream; others use the cheapest lines that "just work". At peak hours everyone crowds in, and watching video turns into a slideshow.
There's another issue: allocated bandwidth.
For example, say a server has only 100 Mbps of actually usable bandwidth behind it. If only a dozen people use it, everyone is happy; if hundreds of people are crammed onto it, all playing videos and downloading large files, the experience will inevitably collapse. Many so-called "junk nodes" are in fact either technically poor or simply have too many users online.
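The arithmetic behind that is simple enough to write down. The sketch below assumes everyone is active at once and the link is split evenly, which is a simplification; "a dozen" is taken as 12 and "hundreds" as 300 purely for illustration.

```python
def fair_share_mbps(link_mbps: float, active_users: int) -> float:
    """Even split of a link among users who are all active at the same time."""
    return link_mbps / active_users

for users in (12, 300):
    share = fair_share_mbps(100, users)
    print(f"{users} users -> {share:.2f} Mbps each")
```

At 12 users each person gets a little over 8 Mbps; at 300 users it drops to about a third of a megabit, which is nowhere near enough for video.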
In addition, some routes and certain countries/protocols may receive "special care" when crossing the border: they are prone to rate limiting and queuing. So you will see two nodes both labeled "American node": one sits at a rock-steady 200 ms, while the other jumps between 200 ms and 2000 ms, jittering like an electrocardiogram, which means the two take completely different paths.
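One simple way to put a number on "stable" versus "jittering like an electrocardiogram" is to compare the spread of round-trip samples. The two sample lists below are invented to match the description above, not measurements, and `summarize` is just a hypothetical helper name.

```python
import statistics

def summarize(rtts_ms):
    """Mean and population standard deviation of round-trip samples, in ms."""
    return {
        "mean": statistics.mean(rtts_ms),
        "stdev": statistics.pstdev(rtts_ms),
    }

# Made-up samples for two nodes that both advertise "200 ms".
stable_node = [200, 202, 199, 201, 200, 198]
jittery_node = [200, 2000, 210, 1800, 205, 1500]

for label, samples in (("stable", stable_node), ("jittery", jittery_node)):
    stats = summarize(samples)
    print(f"{label}: mean {stats['mean']:.0f} ms, stdev {stats['stdev']:.0f} ms")
```

The best-case reading is the same for both nodes, so the advertised latency alone tells you very little; the spread is what you feel when using it.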
6. Detouring the line
Earlier I mentioned routes that take many detours. What this really means is: instead of taking the default cross-border route, you make a turn first and find a road that is easier to travel or less likely to be interfered with.
The most typical approach is multi-hop relaying. For example, the original path was: you → domestic operator → direct cross-border → Japanese node. With a "detour" it becomes: you → a domestic relay → a better-quality cross-border route → Japanese exit server → target website.
It looks like an extra step, but in fact the most critical stretch, the "leaving the country" part, is handed to a more reliable highway instead of the default national road.
In practice, though, the purpose of a "detour" is often not to be faster but to be more stable and less conspicuous: the original traffic looks too much like "a certain signature" and is easily singled out for throttling or interference, so people try to tuck it inside more ordinary-looking traffic. That already gets into protocol obfuscation and disguise, so I won't go into details.