CSC 724: Advanced Distributed Systems
Spring 2026
Credits: 3
Meeting Times: Tuesday/Thursday, 11:45am – 1pm
Meeting Location: Fitts-Woolard Hall 02121
Moodle
Piazza
Instructor Information
- Xiaohui (Helen) Gu
- Office Hours: Tuesday/Thursday 3:30pm-4pm
- Email: xgu AT ncsu.edu
Teaching Assistant
- Tural Mehtiyev
- Office Hours: Friday 2:00pm-3:00pm
- Email: tmehtiy AT ncsu.edu
Course Objectives
This course explores design and implementation principles in modern distributed systems. In particular, the course emphasizes recent techniques used by real-world distributed systems such as cloud systems, enterprise data centers, and peer-to-peer file sharing (e.g., BitTorrent). Students will learn the state of the art in distributed system architectures, algorithms, and performance evaluation methodologies. Topics include canonical distributed concepts such as remote procedure calls, distributed objects, replication, distributed system security, and consensus protocols, as well as recent distributed system technologies such as peer-to-peer systems, grid computing, autonomic computing, distributed massive data processing (e.g., Google MapReduce), system machine learning, distributed system debugging, multi-core systems, and distributed virtualization. On completing this course, the student should be able to do the following:
- Identify research problems and challenges in distributed systems (assessed by review and presentation);
- List the state-of-the-art tools and techniques for addressing research problems and challenges in distributed systems (assessed by review and presentation);
- Develop and implement new ideas to solve open problems in distributed systems (assessed by project);
- Conduct technical reviews, technical writing, and technical presentations (assessed by review, project, paper, presentation).
Text Books
There are no assigned textbooks for this course. Topics will be covered during in-class lectures, and through course notes made available on this web page.
Links to supplementary material, in the form of research papers related to each topic, are included in this syllabus. PDFs of most papers are available through the NCSU library web site, which has full-text access to most recent ACM and IEEE journals and conferences. A number of supplemental distributed system textbooks are also available:
Distributed Systems: Concepts and Design (4th Edition), G. Coulouris, J. Dollimore, and T. Kindberg
Distributed Systems (2nd Edition), Sape Mullender
Distributed Systems: Principles and Paradigms, Andrew S. Tanenbaum, Maarten van Steen
Course Description
Distributed systems have become the fundamental computing infrastructure for many important real-world applications such as Internet search engines, media streaming servers, online file sharing, information analytics, and scientific exploration. This course explores design and implementation principles in modern distributed systems. In particular, the course emphasizes recent techniques used by real-world distributed systems such as peer-to-peer file sharing (e.g., BitTorrent), enterprise data centers, and Internet search engines (e.g., Google). Students will learn the state of the art in distributed system architectures, algorithms, and performance evaluation methodologies. Topics include i) traditional distributed computing concepts (e.g., distributed objects, middleware, replication, distributed system security, and consensus protocols); and ii) recently emerged distributed system techniques such as peer-to-peer systems, massive data processing, grid computing, and autonomic computing.
Students will have opportunities not only to learn the common design methodology of many important distributed systems, but also to gain hands-on experience through project implementations. The majority of course materials will be drawn from classic papers and current state-of-the-art work. The instructor will lecture for the first half of the semester, and students will present papers and projects in the second half.
Students will read and review papers ahead of time, participate in class discussions, present at least one research topic during the course, and do a term project individually or in a two-member team. Students will also write a paper describing their project (as well as review other students’ papers) and present their work at the end of the course in a “conference” format designed to give students an experience similar to that of participating in a professional conference.
Prerequisites
CSC 501 or equivalent. Programming experience in C++ or Java in a Unix environment. If you are not sure whether you are prepared to take this course, please consult the instructor.
Tentative Grading Policy
Written reviews 20%, class participation 20% (presentation 10%, discussion 10%), project 60% (proposal write-up 5%, proposal presentation 5%, Project MidReview presentation 5%, demo 15%, final presentation 10%, final write-up 20%).
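As a rough illustration of how these weights combine, the following Python sketch computes a weighted course grade. The weights come from the tentative policy above; the component names, the 0-100 score scale, and the `course_grade` helper are assumptions made for this example and are not part of the official grading procedure.

```python
# Weights (in percent) taken from the tentative grading policy.
# The dictionary keys are illustrative names, not official labels.
WEIGHTS = {
    "written_reviews": 20,
    "presentation": 10,            # class participation: presentation
    "discussion": 10,              # class participation: discussion
    "proposal_writeup": 5,
    "proposal_presentation": 5,
    "midreview_presentation": 5,
    "demo": 15,
    "final_presentation": 10,
    "final_writeup": 20,
}


def course_grade(scores):
    """Return the weighted grade (0-100), assuming each score is on a 0-100 scale."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100
```

The six project components sum to 60 and all weights sum to 100, so a student who scores 100 on every component receives 100 overall.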
Late policy
Lateness is calculated from the time recorded on the assignment email received by the instructor. Students will lose 25% for each 24-hour period they are late on reviews, the project, or the paper.
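The late penalty can be sketched in Python as follows. The policy leaves several details open, so this example makes assumptions that should be confirmed with the instructor: any started 24-hour period counts as a full period, the 25% is deducted as percentage points from a 0-100 score, and the score never drops below zero. The function names and the deadline used in the usage note are hypothetical.

```python
import math
from datetime import datetime

PENALTY_PER_PERIOD = 25  # percentage points per 24-hour period, from the late policy
SECONDS_PER_PERIOD = 24 * 60 * 60


def late_penalty(due, received):
    """Return the penalty in percentage points for a submission received at `received`."""
    late_seconds = (received - due).total_seconds()
    if late_seconds <= 0:
        return 0  # on time
    # Assumption: a partially elapsed 24-hour period counts as a full period.
    periods = math.ceil(late_seconds / SECONDS_PER_PERIOD)
    return min(100, periods * PENALTY_PER_PERIOD)


def apply_penalty(score, due, received):
    """Return the score after the late penalty, never below zero (assumption)."""
    return max(0.0, score - late_penalty(due, received))
```

Under these assumptions, with a hypothetical deadline of `datetime(2026, 2, 15, 23, 59)`, a review received 31 minutes late loses 25 points, and one received four or more periods late receives no credit.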
Paper Review
Review guidelines: Provide a one-paragraph summary of the paper, a paragraph on 2-3 strong points of the paper (i.e., why the paper should be accepted), a paragraph on 2-3 weak points of the paper (i.e., why the paper should be rejected), and, optionally, brainstormed ideas for new research related to the work described in the paper.
- How to read an engineering research paper by Bill Griswold
- Suggested guidelines for finding “related work” for conference papers by Gail Kaiser
- Reviewing a technical paper by Mike Ernst
Project
- Suggested Term Project Topics (NCSU unity ID required).
- Course project development environment: Amazon AWS, Google Cloud, VCL
Both the project proposal and the final report should follow typical paper requirements and use the ACM double-column paper format. The project proposal should include an abstract, introduction, proposed approaches, and related work. The final project report should be a full paper, including an abstract, introduction, design and algorithms, experimental evaluation, related work, and conclusion. We will organize a mini-conference for students to present their project work. The three best papers will be selected during the mini-conference.
Class Schedule (Tentative)
| W | Date | Topic | Assigned Readings | Assignments |
|---|---|---|---|---|
| 1 | 1/13 | Introduction | | Investigate your term project idea and prepare for it. A list of candidate project topics will also be provided in class. Talk to the instructor about your project idea, and talk to other students about forming a group of two or three members. Email the instructor to set up an appointment. 1/18 midnight: review due for |
| | 1/15 | Replication | | |
| 2 | 1/20 | Project Testbed | Investigate your term project idea and prepare for it. Talk to the instructor about your project idea, and talk to other students about forming a group if you would like to work in one. 1/25 midnight: review due for | |
| | 1/22 | Project Testbed | | |
| 3 | 1/27 | Project Testbed | | Sunday midnight: paper presentation signup due. Please email the TA to bid on three papers from the list below, listing your choices in decreasing order of preference. You will be allocated one paper to present based on the FCFS policy and paper availability. 2/1 midnight: review due for |
| | 1/29 | Consensus Protocol | | |
| 4 | 2/3 | Consensus Protocol | | 2/8 midnight: reviews due |
| | 2/5 | Consensus Protocol | | |
| 5 | 2/10 | Autonomic Computing | | 2/15 midnight: project proposal due. |
| | 2/12 | Peer-to-Peer Systems | | |
| 6 | 2/17 | Wellness Day | | 2/22 midnight: reviews due |
| | 2/19 | Overlay Networks | | |
| 7 | 2/24 | Big Data | | 3/1 midnight: reviews due |
| | 2/26 | Project Proposal Presentation | | |
| 8 | 3/3 | System Research Methodology | | No paper reading assigned. You should spend time on your term projects. |
| | 3/5 | Student presentation | | |
| 9 | 3/10 | Project MidReview | No paper reading assigned. You should spend time on your term projects. | |
| | 3/12 | Project MidReview | | |
| 10 | 3/17 | Spring Break | | No paper reading assigned. You should spend time on your term projects. |
| | 3/19 | Spring Break | | |
| 11 | 3/24 | Project MidReview | No paper reading assigned. You should spend time on your term projects. | |
| | 3/26 | Student presentation | | |
| 12 | 3/31 | Student presentation | No paper reading assigned. You should spend time on your term projects. | |
| | 4/2 | Student presentation | | |
| 13 | 4/7 | Student presentation | No paper reading assigned. You should spend time on your term projects. | |
| | 4/9 | Student presentation | | |
| 14 | 4/14 | Student presentation | No paper reading assigned. You should spend time on your term projects. | |
| | 4/16 | Student presentation | | |
| 15 | 4/21 | Student presentation | | |
| | 4/23 | Project Demo | | |
| 16 | 4/28 | Project Demo | | |
Suggested Topics for Student Presentations
(You may propose to the instructor papers that are not on this list but that you would like to present):
AI-Driven Distributed System Management
- Chenyuan Yang et al., KNighter: Transforming Static Analysis with LLM-Synthesized Checkers, Proc. of SOSP 2025. --> Aum Yagneshkumar Pandya
- Olufogorehan Tunde-Onadele, Feiran Qin, Xiaohui Gu, Yuhang Lin, “ClearCausal: Cross Layer Causal Analysis for Automatic Microservice Performance Debugging”, 5th IEEE International Conference on Autonomic Computing and Self-Organizing Systems, 2025.
- Jingzhu He, Yuhang Lin, Xiaohui Gu, Chin-Chia Michael Yeh, and Zhongfang Zhuang, “PerfSig: Extracting Performance Bug Signatures via Multi-modality Causal Analysis”, Proc. of the 44th International Conference on Software Engineering (ICSE), Pittsburgh, PA, May, 2022, pp. 1669-1680.
- Jingzhu He, Ting Dai, Xiaohui Gu, and Guoliang Jin, “HangFix: Automatically Fixing Software Hang Bugs for Production Cloud Systems”, Proc. of ACM Symposium on Cloud Computing (SOCC), Renton, WA, October, 2020, pp. 344-357. --> Om Kumar Singh
- Ting Dai, Jingzhu He, Xiaohui Gu, Shan Lu, and Peipei Wang, “DScope: Detecting Real-World Data Corruption Hang Bugs in Cloud Server Systems”, Proc. of ACM Symposium on Cloud Computing (SOCC), Carlsbad, CA, October, 2018.
- Jingzhu He, Ting Dai, and Xiaohui Gu, “TScope: Automatic Timeout Bug Identification for Server Systems”, Proc. of IEEE International Conference on Autonomic Computing (ICAC), Trento, Italy, September, 2018.
- Daniel Dean, Hiep Nguyen, Xiaohui Gu, Hui Zhang, Junghwan Rhee, Nipun Arora, Geoff Jiang, “PerfScope: Practical Online Server Performance Bug Inference in Production Cloud Computing Infrastructures”, Proc. of SOCC 2014. --> Ashwattha Phatak
- Hiep Nguyen, Zhiming Shen, Yongmin Tan, Xiaohui Gu, “FChain: Toward Black-box Online Fault Localization for Cloud Systems”, Proc. of ICDCS 2013. --> Manav Shah
- Daniel Dean, Hiep Nguyen, Xiaohui Gu, “UBL: Unsupervised Behavior Learning for Predicting Performance Anomalies in Virtualized Cloud Systems”, Proc. of ACM International Conference on Autonomic Computing (ICAC), San Jose, CA, September, 2012.
AI Infrastructure & Cloud Computing
- Siddhant Ray et al., METIS: Fast Quality-Aware RAG Systems with Configuration Adaptation, Proc. of SOSP 2025.
- Jinkun Lin et al., Understanding Stragglers in Large Model Training Using What-if Analysis, Proc. of OSDI 2025. --> Ayush Gala
- Y. Sheng et al., Fairness in Serving Large Language Models, Proc. of OSDI 2024. --> Darsh Rank
- Philipp Moritz et al., Ray: A Distributed Framework for Emerging AI Applications, Proc. of OSDI 2018. --> Aryan Inguva
- Martín Abadi et al., TensorFlow: A System for Large-Scale Machine Learning, Proc. of OSDI 2016. --> Smeet Nagda
- Tom Kuchler et al., Unlocking True Elasticity for the Cloud-Native Era with Dandelion, Proc. of SOSP 2025. --> Narasimhareddy Dilip Kumar Irala
- Hiep Nguyen, Zhiming Shen, Xiaohui Gu, Sethuraman Subbiah, John Wilkes, “AGILE: elastic distributed resource scaling for Infrastructure-as-a-Service”, Proc. of USENIX International Conference on Autonomic Computing (ICAC), San Jose, CA, June, 2013. --> Youbin Kim
- Zhiming Shen, Sethuraman Subbiah, Xiaohui Gu, and John Wilkes, CloudScale: Elastic Resource Scaling for Multi-Tenant Cloud Systems, Proc. of ACM SOCC 2011. --> Sandhiya Shunmugavel
- Xiaohui Gu, Klara Nahrstedt, A Scalable QoS-Aware Service Aggregation Model for Peer-to-Peer Computing Grids, Proc. of IEEE International Symposium on High Performance Distributed Computing (HPDC 2002).
Distributed Systems Security
- Zihao Zhang, Ti Zhou, Christa Jenkins, Omar Chowdhury, Shuai Mu, AutoMan: Facilitating Verified Distributed Systems Development Through Automatic Code Generation and Manual Optimizations, Proc. of SOSP 2025.
- Olufogorehan Tunde-Onadele, Yuhang Lin, Xiaohui Gu, and Jingzhu He, “Understanding Software Security Vulnerabilities in Cloud Server Systems”, Proc. of the 10th IEEE International Conference on Cloud Engineering (IC2E), Pacific Grove, CA, September, 2022. --> Niharika Maruvanahalli Suresh
- Yuhang Lin, Olufogorehan Tunde-Onadele, Xiaohui Gu, Jingzhu He, and Hugo Latapie, “SHIL: Self-Supervised Hybrid Learning for Security Attack Detection in Containerized Applications”, Proc. of the 3rd IEEE International Conference on Autonomic Computing and Self-Organizing Systems (ACSOS), Los Angeles, CA, September, 2022. --> Bhavishya Tarun
- Yuhang Lin, Olufogorehan Tunde-Onadele, and Xiaohui Gu, “CDL: Classified Distributed Learning for Detecting Security Attacks in Containerized Applications”, Proc. of Annual Computer Security Applications Conference (ACSAC), Austin, TX, December, 2020. --> Yash Mor
- Olufogorehan Tunde-Onadele, Yuhang Lin, Jingzhu He, and Xiaohui Gu, “Self-Patch: Beyond Patch Tuesday for Containerized Applications”, Proc. of IEEE International Conference on Autonomic Computing and Self-Organizing Systems (ACSOS), Washington, DC, August, 2020, pp. 21-27. --> Sai Tarun Yellamraju
- Rui Shu et al., A Study of Security Vulnerabilities on Docker Hub, Proc. of CODASPY 2017. --> Bestin Lalu
Academic Integrity
The university provides a detailed policy on academic integrity. This policy can be found in the Code of Student Conduct. It is understood that when you submit your homework, you are implicitly agreeing to the university honor pledge: “I have neither given nor received unauthorized aid on this test or assignment.”
Academic dishonesty (e.g., cheating or plagiarism) will not be tolerated under any circumstances. If you are having difficulty with any part of the course material, please see me as soon as possible. I will do everything I can to help you with any course-related problems you may be having. If you are found to be guilty of academic dishonesty, however, I will then do everything I can to see that you are punished as forcefully as possible. This may include asking to have you suspended or expelled from the course, the program, and/or the university. At a minimum, you will receive -50% for the assignment in question, and your name will be placed on record with the university as having committed an academic offence. Multiple offences during your academic career will result in suspension or expulsion from the university. I take absolutely no pleasure in pursuing cases of academic misconduct, and would ask that you please not put me in this position.
Students With Disabilities
All effort will be made to ensure that no students with disabilities are denied any opportunity to successfully complete this course. If you have specific requirements that need to be addressed, please contact me immediately. Possible changes can include (but are not necessarily limited to) rescheduling classes from inaccessible to accessible buildings, or providing access to auxiliary aids such as tape recorders, special lab equipment, or other services such as readers, note takers, or interpreters. This may also include oral or taped tests, readers, scribes, separate testing rooms, or extension of time limits.
Lab Safety Issues
None.
Pass-Through Costs
None.