The Internet provides a computing scenario in which clients communicate with Web servers through mutually independent connections. If Internet server application processing and the associated protocol processing of a connection (flow) are done exclusively on a single central processing unit (CPU) core, minimal data sharing and synchronization between flows is expected. The computational capability of future servers will depend on increasing core counts. This interesting paper “identifies the major bottlenecks to scalability for a reference server workload on a commercial server platform.”
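The flow-per-core model described above can be illustrated with a small sketch (my own, hypothetical code, not the paper's Apache/Linux setup): every packet of a flow is dispatched to one fixed worker, so per-flow state lives in one core's structures and needs no cross-core locking.

```python
# Sketch of flow-level parallelism: each connection (flow) maps to one
# worker, so per-flow state never needs cross-core synchronization.
# The hash-based dispatch and all names here are illustrative.

NUM_CORES = 8

def dispatch(flow_id: int) -> int:
    """Map a flow to a fixed worker; all packets of a flow land there."""
    return hash(flow_id) % NUM_CORES

# Per-worker state: no locks needed, since each flow touches one dict only.
per_worker_state = [dict() for _ in range(NUM_CORES)]

def handle_packet(flow_id: int, payload: bytes) -> None:
    worker = dispatch(flow_id)
    state = per_worker_state[worker]
    state[flow_id] = state.get(flow_id, 0) + len(payload)

# 100 flows, 10 bytes each.
for fid in range(100):
    handle_packet(fid, b"x" * 10)

total = sum(sum(s.values()) for s in per_worker_state)
print(total)  # 1000
```

The point of the sketch is the absence of shared mutable state: as long as `dispatch` is stable, no two workers ever touch the same flow's entry.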
To test their hypothesis, the authors “set up [a] test server running a well-tuned Apache HTTP server and [the] Linux operating system. The server had eight cores with pairs of cores sharing L2 cache.” The experiments show that the test server, running a modified SPECweb2005 Support workload, achieved only a 4.8-times speedup in throughput, compared to the ideal eight times; official SPECweb2005 results show similar scaling problems.
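The quoted numbers translate into a parallel efficiency figure (my own calculation from the speedups above, not a statistic reported verbatim in the paper):

```python
# Parallel efficiency = measured speedup / ideal (linear) speedup.
measured_speedup = 4.8   # throughput gain observed on eight cores
ideal_speedup = 8        # perfect linear scaling
efficiency = measured_speedup / ideal_speedup
print(f"{efficiency:.0%}")  # 60%
```

In other words, at eight cores the server delivers only about 60 percent of the throughput that linear scaling would predict.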
This work provides “insights on the key causes of poor scalability of a Web server,” and also provides “the analysis methodology leading to these insights.” This latter contribution is arguably more interesting than the findings themselves. The authors determined that the main cause of poor scaling is the capacity of the shared bus and the snoopy protocol that arbitrates it: on eight cores, address bus utilization reached 77 percent, a level considered fully saturated.
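A back-of-the-envelope model (my own sketch, with illustrative numbers, not the authors' analysis) shows how a saturated shared bus caps speedup: each core adds a fixed amount of bus traffic, and once aggregate demand reaches bus capacity, additional cores contribute nothing.

```python
# Toy model of a shared-bus scaling ceiling. Each active core generates one
# "unit" of address bus traffic; the bus can carry roughly six units before
# saturating. The capacity value is illustrative, not a measurement.

BUS_CAPACITY_UNITS = 6.0

def speedup(n_cores: int) -> float:
    """Speedup scales linearly until bus demand hits capacity, then flattens."""
    return min(float(n_cores), BUS_CAPACITY_UNITS)

print(speedup(4))  # 4.0 -> below saturation, linear scaling
print(speedup(8))  # 6.0 -> capped by the bus, not by core count
```

The model is deliberately crude, but it captures the paper's central observation: past the saturation point, throughput is a property of the bus, not of the core count.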
Other results showed that the number of cache misses per byte remained nearly constant as the number of cores increased, and that cache shared between cores on the same bus had little effect on performance. However, profiling revealed some scalability obstacles in software. “Increasing hash table capacities and reducing dependence on linked lists,” as workload increases, should fix these scalability problems. “In the kernel, flow-level parallelism broke down in the file-system directory cache,” which was widely shared. The authors propose that “a possible workaround would be to maintain alternate directory trees for each core.”
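The proposed workaround amounts to partitioning a shared lookup structure so each core queries only its own copy, trading memory for the elimination of cross-core sharing. A minimal sketch, with hypothetical class and method names (the kernel's actual dentry cache is far more involved):

```python
# Sketch of the "alternate directory tree per core" idea: each core keeps an
# independent path-lookup cache, so lookups never touch another core's data.
# All names here are hypothetical.

class PerCoreDcache:
    def __init__(self, num_cores: int):
        # One private cache per core; no shared, contended structure.
        self.caches = [dict() for _ in range(num_cores)]

    def lookup(self, core: int, path: str):
        """Return the cached inode number for a path, or None on a miss."""
        return self.caches[core].get(path)

    def insert(self, core: int, path: str, inode: int) -> None:
        self.caches[core][path] = inode

dcache = PerCoreDcache(8)
dcache.insert(0, "/var/www/index.html", 42)
print(dcache.lookup(0, "/var/www/index.html"))  # 42
print(dcache.lookup(1, "/var/www/index.html"))  # None: core 1's tree is separate
```

The cost of this design is visible in the last line: each core must populate its own tree, so memory use and cold-miss counts grow with the number of cores.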
In conclusion, the remaining obstacle to scaling performance with the number of cores is address bus capacity. As the authors state, “directories (and directory caches) can be used to replace snoopy cache coherence,” at the price of additional hardware cost and latency. Further studies should verify this last hypothesis on real workloads.