Computing Reviews
Performance scalability of a multi-core Web server
Veal B., Foong A. In Proceedings of the 3rd ACM/IEEE Symposium on Architecture for Networking and Communications Systems (Orlando, Florida, USA, Dec 3-4, 2007), 57-66. Type: Proceedings
Date Reviewed: Feb 1 2008

The Internet provides a computing scenario in which clients communicate with Web servers through mutually independent connections. If Internet server application processing and the associated protocol processing of a connection (flow) are done exclusively on a single central processing unit (CPU) core, minimal data sharing and synchronization between flows are expected. The computational capacity of future servers will depend on increasing the number of cores. This interesting paper “identifies the major bottlenecks to scalability for a reference server workload on a commercial server platform.”
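
The flow-level parallelism described above amounts to steering every packet and request of a given connection to one fixed core. The following C sketch is the reviewer's own illustration of that idea, not code from the paper; the flow structure, the FNV-1a hash, and the eight-core count are illustrative assumptions.

    /* Sketch: pin each connection (flow) to one core by hashing its 4-tuple.
     * All protocol and application processing for the flow then runs on that
     * core, so flows share almost no data and need little synchronization. */
    #include <stdint.h>
    #include <stdio.h>

    struct flow {
        uint32_t src_ip, dst_ip;     /* IPv4 addresses */
        uint16_t src_port, dst_port; /* TCP ports */
    };

    /* FNV-1a hash over the 4-tuple; any stable hash works for steering. */
    static uint32_t flow_hash(const struct flow *f)
    {
        uint32_t words[3] = { f->src_ip, f->dst_ip,
                              ((uint32_t)f->src_port << 16) | f->dst_port };
        const uint8_t *p = (const uint8_t *)words;
        uint32_t h = 2166136261u;
        for (size_t i = 0; i < sizeof words; i++) {
            h ^= p[i];
            h *= 16777619u;
        }
        return h;
    }

    int main(void)
    {
        const unsigned ncores = 8;  /* eight cores, as in the test server */
        struct flow f = { 0x0a000001u, 0x0a000002u, 49152, 80 };
        unsigned core = flow_hash(&f) % ncores;
        printf("flow steered to core %u of %u\n", core, ncores);
        return 0;
    }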

To test their hypothesis, the authors “set up [a] test server running a well-tuned Apache HTTP server and [the] Linux operating system. The server had eight cores with pairs of cores sharing L2 cache.” The experiments show that the test server, running a modified SPECweb2005 Support workload, achieved only a 4.8 times speedup in throughput on eight cores, compared to the ideal eight times (a parallel efficiency of 60 percent); official SPECweb2005 results show similar scaling problems.

This work provides “insights on the key causes of poor scalability of a Web server,” and also provides “the analysis methodology leading to these insights.” The latter feature makes the paper more interesting than the findings themselves, since the main bottleneck of the multicore server turns out to be the bus and the snoopy coherence protocol used to share it. The authors determined that the main cause of poor scaling is address bus capacity: on eight cores, address bus utilization reached 77 percent, a level considered fully saturated.

Other results showed that the number of cache misses per byte remained nearly constant as the number of cores increased, and that sharing a cache between cores on the same bus had little effect on performance. However, profiling revealed some scalability obstacles in software. “Increasing hash table capacities and reducing dependence on linked lists” as the workload increases should fix these scalability problems. “In the kernel, flow-level parallelism broke down in the file-system directory cache,” which was widely shared. The authors propose that “a possible workaround would be to maintain alternate directory trees for each core,” as sketched below.
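
The per-core directory tree workaround can be pictured as replicating a read-mostly lookup structure so that each core queries only its own copy. The following C sketch is the reviewer's illustration of that general idea (table sizes, hash function, and the path-to-inode mapping are assumptions), not the authors' kernel change.

    /* Sketch: one replica of a read-mostly path lookup table per core.
     * Lookups touch only the calling core's replica; (rare) inserts update
     * every replica, trading memory and write cost for contention-free reads. */
    #include <stdio.h>
    #include <string.h>

    #define NCORES   8
    #define NBUCKETS 1024

    struct entry {
        char path[64];
        int  inode;
        int  used;
    };

    static struct entry table[NCORES][NBUCKETS]; /* per-core replicas */

    static unsigned bucket(const char *path)
    {
        unsigned h = 5381;                 /* djb2 string hash */
        while (*path)
            h = h * 33 + (unsigned char)*path++;
        return h % NBUCKETS;
    }

    /* Slow path: write through to all replicas. */
    static void insert(const char *path, int inode)
    {
        unsigned b = bucket(path);
        for (int c = 0; c < NCORES; c++) {
            snprintf(table[c][b].path, sizeof table[c][b].path, "%s", path);
            table[c][b].inode = inode;
            table[c][b].used  = 1;
        }
    }

    /* Fast path: read only this core's replica, no shared state touched. */
    static int lookup(int core, const char *path)
    {
        unsigned b = bucket(path);
        if (table[core][b].used && strcmp(table[core][b].path, path) == 0)
            return table[core][b].inode;
        return -1;
    }

    int main(void)
    {
        insert("/htdocs/index.html", 42);
        printf("core 3 lookup: inode %d\n", lookup(3, "/htdocs/index.html"));
        return 0;
    }

In a real kernel, such replication raises consistency and memory-cost questions, which is presumably why the authors phrase it only as a possible workaround.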

In conclusion, the remaining obstacle to scaling performance with the number of cores is address bus capacity. As the authors state, “directories (and directory caches) can be used to replace snoopy cache coherence,” at the price of additional cost and latency. Further studies should verify this last hypothesis on real workloads.

Reviewer: Carlos Juiz
Review #: CR135213 (0812-1191)
Performance Attributes (C.4 ...)
Servers (C.5.5)

Other reviews under "Performance Attributes":

Attributes of the performance of central processing units: a relative performance prediction model
Ein-Dor P., Feldmesser J. Communications of the ACM 30(4): 308-317, 1987. Type: Article
Date Reviewed: Jul 1 1988

Performance estimation of computer communication networks: a structured approach
Verma P., Computer Science Press, Inc., New York, NY, 1989. Type: Book (9780716781837)
Date Reviewed: Jun 1 1990

Computer hardware performance: production and cost function analyses
Kang Y. Communications of the ACM 32(5): 586-593, 1989. Type: Article
Date Reviewed: Feb 1 1990
