@@ -117,57 +117,56 @@ <h2>What We Build</h2>
       <h2 class="section-intro__title">Recent Projects</h2>
       <p class="section-intro__subtitle">Representative systems and tooling from recent lab work.</p>
     </div>
-    <div class="project-showcase__grid mt-4">
-      <article class="project-card project-card--feature">
-        <div class="project-card__media">
-          <img class="project-logo-fit" src="{{ '/assets/img/project/traincheck_logo.png' | relative_url }}" alt="TrainCheck" />
-        </div>
-        <div class="project-card__content">
-          <h3>Catching Silent Errors in Deep Learning Training</h3>
-          <p>
-            TrainCheck learns semantic invariants from healthy runs and enforces checks at runtime,
-            catching silent training errors before they consume GPU time and degrade model quality.
-          </p>
-          <p class="project-badge">OSDI 2025</p>
-          <a class="button button-light" href="{{ '/paper/traincheck-osdi25-preprint.pdf' | relative_url }}" target="_blank">Read Paper</a>
-        </div>
-      </article>
+    <div class="project-carousel-shell mt-4">
+      <div class="owl-theme owl-carousel active_course project-carousel">
+        <article class="project-card">
+          <div class="project-card__media">
+            <img class="project-logo-fit" src="{{ '/assets/img/project/traincheck_logo.png' | relative_url }}" alt="TrainCheck" />
+          </div>
+          <div class="project-card__content">
+            <h3>Catching Silent Errors in Deep Learning Training</h3>
+            <p>TrainCheck learns semantic invariants from healthy runs and enforces checks at runtime to catch silent training errors early.</p>
+            <p class="project-badge">OSDI 2025</p>
+            <a class="button button-light" href="{{ '/paper/traincheck-osdi25-preprint.pdf' | relative_url }}" target="_blank">Read Paper</a>
+          </div>
+        </article>

-      <article class="project-card">
-        <div class="project-card__media">
-          <img src="{{ '/assets/img/project/trainverify.png' | relative_url }}" alt="TrainVerify figure" />
-        </div>
-        <div class="project-card__content">
-          <h3>TrainVerify: Equivalence-Based Verification for Distributed LLM Training</h3>
-          <p>TrainVerify verifies semantic equivalence across distributed LLM training executions to catch subtle correctness issues.</p>
-          <p class="project-badge">SOSP 2025</p>
-          <a class="button button-light" href="{{ '/paper/trainverify-sosp25.pdf' | relative_url }}" target="_blank">Read Paper</a>
-        </div>
-      </article>
+        <article class="project-card">
+          <div class="project-card__media">
+            <img src="{{ '/assets/img/project/trainverify.png' | relative_url }}" alt="TrainVerify figure" />
+          </div>
+          <div class="project-card__content">
+            <h3>TrainVerify: Equivalence-Based Verification for Distributed LLM Training</h3>
+            <p>TrainVerify verifies semantic equivalence across distributed LLM training executions to catch subtle correctness issues.</p>
+            <p class="project-badge">SOSP 2025</p>
+            <a class="button button-light" href="{{ '/paper/trainverify-sosp25.pdf' | relative_url }}" target="_blank">Read Paper</a>
+          </div>
+        </article>

-      <article class="project-card">
-        <div class="project-card__media">
-          <img src="{{ '/assets/img/project/phoenix.png' | relative_url }}" alt="Phoenix figure" />
-        </div>
-        <div class="project-card__content">
-          <h3>Phoenix: Optimistic Recovery via Partial Process State Preservation</h3>
-          <p>Phoenix improves software availability by preserving partial process state and enabling low-overhead optimistic recovery.</p>
-          <p class="project-badge">SOSP 2025</p>
-          <a class="button button-light" href="{{ '/paper/phoenix-sosp25.pdf' | relative_url }}" target="_blank">Read Paper</a>
-        </div>
-      </article>
+        <article class="project-card">
+          <div class="project-card__media">
+            <img src="{{ '/assets/img/project/phoenix.png' | relative_url }}" alt="Phoenix figure" />
+          </div>
+          <div class="project-card__content">
+            <h3>Phoenix: Optimistic Recovery via Partial Process State Preservation</h3>
+            <p>Phoenix improves availability by preserving partial process state and enabling low-overhead optimistic recovery.</p>
+            <p class="project-badge">SOSP 2025</p>
+            <a class="button button-light" href="{{ '/paper/phoenix-sosp25.pdf' | relative_url }}" target="_blank">Read Paper</a>
+          </div>
+        </article>

-      <article class="project-card">
-        <div class="project-card__media">
-          <img class="project-xinda-fit" src="{{ '/assets/img/project/xinda.png' | relative_url }}" alt="Xinda figure" />
-        </div>
-        <div class="project-card__content">
-          <h3>Enhancing Slow-Fault Tolerance in Distributed Systems</h3>
-          <p>Xinda diagnoses and mitigates slow faults with adaptive mechanisms tailored to modern distributed system behavior.</p>
-          <p class="project-badge">NSDI 2025</p>
-          <a class="button button-light" href="{{ '/paper/xinda-nsdi25-preprint.pdf' | relative_url }}" target="_blank">Read Paper</a>
-        </div>
-      </article>
+        <article class="project-card">
+          <div class="project-card__media">
+            <img class="project-xinda-fit" src="{{ '/assets/img/project/xinda.png' | relative_url }}" alt="Xinda figure" />
+          </div>
+          <div class="project-card__content">
+            <h3>Enhancing Slow-Fault Tolerance in Distributed Systems</h3>
+            <p>Xinda diagnoses and mitigates slow faults with adaptive mechanisms tailored to modern distributed system behavior.</p>
+            <p class="project-badge">NSDI 2025</p>
+            <a class="button button-light" href="{{ '/paper/xinda-nsdi25-preprint.pdf' | relative_url }}" target="_blank">Read Paper</a>
+          </div>
+        </article>
+      </div>
     </div>
   </div>
 </section>