Wednesday, November 19, 2014

Testing C10M on a Host with Tornado

Here's the code I used; I'm happy to share it :)
https://github.com/kenial/tornado-test-c10m

Recently, I tried to test more than 1M concurrent connections on a single host using Tornado (FYI, Tornado is a Python web framework that supports both raw TCP and WebSocket connections). As you know, the C10K problem has been discussed a lot, so articles about network and performance tuning are easy to find (just google 'linux c10k tuning'). Here are a few links that helped me understand C10K tuning:

- The C10K problem
http://www.kegel.com/c10k.html
- Performance Tuning the Network Stack on Mac OS X Part 2
https://rolande.wordpress.com/2014/05/17/performance-tuning-the-network-stack-on-mac-os-x-part-2/ 
- Linux TCP/IP tuning for scalability
http://www.lognormal.com/blog/2012/09/27/linux-tcpip-tuning/
- Red Hat Enterprise Linux 7 Performance Tuning Guide (CentOS compatible)
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/Performance_Tuning_Guide/index.html 
- The C10M(!) problem
http://c10m.robertgraham.com/p/manifesto.html
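One step that most of these guides cover is raising the per-process limit on open file descriptors, because every socket consumes one and the default soft limit is often just 1024. The following is a minimal sketch of doing that from inside a Python process with the standard `resource` module (Unix only). The helper name `raise_nofile_limit` and the target value are illustrative assumptions, not taken from the original test code. An unprivileged process can only raise its soft limit up to the hard limit; raising the hard limit itself needs root (on Ubuntu, for example, via /etc/security/limits.conf).

```python
import resource

def raise_nofile_limit(wanted):
    """Hypothetical helper: try to raise the soft RLIMIT_NOFILE to `wanted`.

    Never exceeds the hard limit and never lowers the current soft limit.
    Returns the soft limit in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard != resource.RLIM_INFINITY:
        wanted = min(wanted, hard)  # cannot go past the hard limit without root
    if soft != resource.RLIM_INFINITY and wanted > soft:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (wanted, hard))
        except (ValueError, OSError):
            pass  # e.g. macOS rejects values above its per-process maximum
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]

if __name__ == "__main__":
    # 1,100,000 is an illustrative target: a bit above 1M sockets.
    print(raise_nofile_limit(1_100_000))
```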


One caveat: this requires a really large amount of memory. There is also a trick for getting more than 65K connections over the loopback network. The OS identifies a socket by its local address and its remote address (IP plus port on each side). If multiple clients connect to a server listening on port 8080, the client-side sockets will look like this:


local address         remote address
192.168.0.10:20001    192.168.0.10:8080
192.168.0.10:20002    192.168.0.10:8080
192.168.0.10:20003    192.168.0.10:8080
192.168.0.10:20004    192.168.0.10:8080
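You can observe this layout directly. The sketch below uses only the standard `socket` module (the helper name `open_client_sockets` is hypothetical): it opens a few connections to one listening socket and collects each client socket's local and remote address. It uses 127.0.0.1 and an OS-chosen server port instead of the 192.168.0.10:8080 shown in the table, so the numbers will differ, but the pattern is the same: every connection shares one remote address and gets its own local port.

```python
import socket

def open_client_sockets(n=4, host="127.0.0.1"):
    """Open n client connections to one listening socket and return the
    (local address, remote address) pair of each client socket."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind((host, 0))  # port 0: let the OS pick a free server port
    server.listen(n)
    clients, accepted, pairs = [], [], []
    try:
        for _ in range(n):
            c = socket.create_connection(server.getsockname())
            clients.append(c)
            accepted.append(server.accept()[0])  # server-side end of the connection
            pairs.append((c.getsockname(), c.getpeername()))
    finally:
        for s in clients + accepted + [server]:
            s.close()
    return pairs

if __name__ == "__main__":
    for local, remote in open_client_sockets():
        print(f"{local[0]}:{local[1]:<6} -> {remote[0]}:{remote[1]}")
```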

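But local port numbers only range from 0 to 65,535, so even in an ideal case you can make at most about 65K connections to a single server port from a single client IP. In practice the number is lower, because the OS assigns outgoing connections a port only from its ephemeral port range.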
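On Linux, the range the kernel picks outgoing ports from is the sysctl net.ipv4.ip_local_port_range (readable at /proc/sys/net/ipv4/ip_local_port_range), and widening it is a common C10K tuning step. Its default is roughly 32768 to 61000, only about 28K ports. Here is a small sketch for checking how many local ports you actually have; the helper names are hypothetical.

```python
from pathlib import Path

# Linux-only file holding the "low high" ephemeral port range for connect().
PORT_RANGE_FILE = Path("/proc/sys/net/ipv4/ip_local_port_range")

def ephemeral_port_count(range_text):
    """Count the ports in a 'low high' range string (both ends inclusive)."""
    low, high = (int(x) for x in range_text.split())
    return high - low + 1

def local_port_budget():
    """Ports the kernel can auto-assign to connect() calls from one client IP,
    i.e. roughly how many connections one client IP can open to ONE server
    IP:port. Returns None where the /proc file does not exist (e.g. macOS)."""
    if PORT_RANGE_FILE.exists():
        return ephemeral_port_count(PORT_RANGE_FILE.read_text())
    return None

if __name__ == "__main__":
    print(local_port_budget())
```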

This can be worked around by having the server listen on multiple ports, like this:



local address         remote address
192.168.0.10:20001    192.168.0.10:8080
192.168.0.10:20001    192.168.0.10:8081
192.168.0.10:20002    192.168.0.10:8080
192.168.0.10:20002    192.168.0.10:8081

As shown above, if the server listens on multiple ports, several sockets can share the same local address as long as their remote addresses differ. With 200 listening ports and 50K local ports, that comes to 200 * 50,000 = 10,000,000 (10M) connections! (FYI, you can assign multiple IP addresses instead of using multiple listening ports, but it's a little cumbersome: you have to add them manually and remove them again later.)
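In Tornado, TCPServer.listen() may be called more than once, so a single server can listen on several ports. To keep the sketch below dependency-free, it swaps in the standard library's asyncio for Tornado: it starts one listening server per port and connects one client to each. The echo handler and the choice of 3 OS-assigned ports are illustrative assumptions, not the original test code.

```python
import asyncio

async def handle_echo(reader, writer):
    # Hypothetical handler: echo one line back, then close the connection.
    line = await reader.readline()
    writer.write(line)
    await writer.drain()
    writer.close()
    await writer.wait_closed()

async def listen_on_ports(host, ports):
    # One listening server per port; port 0 asks the OS for any free port.
    return [await asyncio.start_server(handle_echo, host, port) for port in ports]

async def demo(host="127.0.0.1", n_ports=3):
    servers = await listen_on_ports(host, [0] * n_ports)
    ports = [s.sockets[0].getsockname()[1] for s in servers]
    replies = []
    for port in ports:
        # Same client IP, different server port: each connection is a distinct 4-tuple.
        reader, writer = await asyncio.open_connection(host, port)
        writer.write(b"ping\n")
        await writer.drain()
        replies.append(await reader.readline())
        writer.close()
        await writer.wait_closed()
    for s in servers:
        s.close()
        await s.wait_closed()
    return ports, replies

if __name__ == "__main__":
    print(asyncio.run(demo()))
```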

One more caveat: this definitely works on Ubuntu Server 14.10 (my development environment), but not on OS X Yosemite. I'm not sure why, but judging from netstat output, Yosemite seems to identify sockets by local address alone: as far as I looked, there were no duplicate local addresses.


Okay, last but not least, here are the results:

- The test host ran Ubuntu Server 14.10 on an Amazon EC2 r3.4xlarge instance with 122GiB of memory.

- Memory usage was about 10.5KB per TCP connection and 15.5KB per WebSocket connection.

- At 1M connections, that comes to 21GiB for TCP and 31GiB for WebSocket.
  (211GiB for 10M!)

- PyPy is helpful for performance, especially in establishing connections.
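Here is a quick sketch for redoing the capacity and memory arithmetic with other numbers. The names are hypothetical, and two assumptions are worth flagging: the post's per-connection "KB" is treated as KiB, and each connection is counted twice (`sockets_per_conn=2`), presumably because the client end and the server end both live on the test host, which would explain why 10.5KB × 1M becomes 21GiB. In exact binary units the totals come out a few percent below the post's rounded figures.

```python
KIB_PER_GIB = 1024 ** 2

def max_connections(listen_ports, local_ports, client_ips=1):
    """Distinct (client IP, client port, server port) combinations on one server IP."""
    return listen_ports * local_ports * client_ips

def memory_gib(connections, kib_per_conn, sockets_per_conn=2):
    """Estimated memory in GiB. ASSUMPTION: sockets_per_conn=2 because the client
    end and the server end of every connection live on the same test host."""
    return connections * kib_per_conn * sockets_per_conn / KIB_PER_GIB

if __name__ == "__main__":
    print(max_connections(200, 50_000))            # 10000000
    print(round(memory_gib(1_000_000, 10.5), 1))   # 20.0 (TCP, 1M)
    print(round(memory_gib(1_000_000, 15.5), 1))   # 29.6 (WebSocket, 1M)
    print(round(memory_gib(10_000_000, 10.5), 1))  # 200.3 (TCP, 10M)
```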


