| layout | default |
|---|---|
| title | Publication |
Please feel free to email me at c.shi7@lse.ac.uk if you have any comments.
* indicates equal contribution
Liu, P., Shi, C. and Sun, W. Reinforcement Learning from Human Feedback: A Statistical Perspective.
Zhou, H*., Ye, K*., Xu, E., Zhu, J., Yang, Y., Gong, S. and Shi, C. Demystifying Group Relative Policy Optimization: Its Policy Gradient is a U-Statistic. Demystifying-GRPO
StatDetectLLM
Liu, Z., Guo, X., Yang, Z., Lou, F., Zeng, L., Li, M., Qi, Q., Liu, Z., Han, Y., Cheng, D., Feng, X., Wang, H., Shi, C. and Zhang, L. Fin-R1: A Large Language Model for Financial Reasoning through Reinforcement Learning
Zhang, J., Wang, J., Shi, C., Piette, J., Zeng, D. and Wu, Z. PyCFRL: A Python library for counterfactually fair offline reinforcement learning via sequential data preprocessing
Wang, J., Wen, Q., Zhang, Y., Yan, X. and Shi, C. A Two-armed Bandit Framework for A/B Testing.
Ye, K*., Zhou, H*., Zhu, J*., Quinzan, F. and Shi, C. Robust Reinforcement Learning from Human Feedback for Large Language Models Fine-Tuning. VRPO
Zhu, J., Zhou, X., Yao, J., Aminian, G., Rivasplata, O., Little, S., Li, L. and Shi, C. Semi-pessimistic Reinforcement Learning.
Shen, G., Dai, R., Wu, G., Luo, S., Shi, C. and Zhu, H. Deep Distributional Learning with Non-crossing Quantile Network
Shi, C. Statistical Inference in Reinforcement Learning: A Selective Survey
Wang, J., Shi, C., Piette, J., Loftus, J., Zeng, D. and Wu, Z. Counterfactually Fair Reinforcement Learning via Sequential Data Preprocessing
Liu, P., Shi, C. and Sun, W. Dual Active Learning for Reinforcement Learning from Human Feedback
Sun, K., Kong, L., Zhu, H. and Shi, C. ARMA-Design: Optimal Treatment Allocation Strategies for A/B Testing in Partially Observable Time Series Experiments. ARMAdesign
slides
Hao, M*., Su, P*., Hu, L., Szabó, Z., Zhao, Q. and Shi, C. Off-policy Evaluation with Deeply-abstracted States. state-abstraction
Dai, R*., Wang, J*., Zhou, F*., Luo, S., Qin, Q., Shi, C. and Zhu, H. Causal Deepsets for Off-policy Evaluation under Spatial or Spatio-temporal Interferences
Wang, D., Shi, C., Luo, S. and Sun, W. Pessimistic Causal Reinforcement Learning with Mediators for Confounded Offline Data
Hu, L*., Li, M*., Shi, C., Wu, Z. and Fryzlewicz, P. Doubly Inhomogeneous Reinforcement Learning. DIRL
slides presented at CMStatistics 2022.
Discussion of "Analysis of the ICML 2023 Ranking Data: Can Authors’ Opinions of Their Own Papers Assist Peer Review in Machine Learning?"
slides and video presented at Tsinghua Statistics + AI Frontier Summit
video presented at 狗熊会
slides and video presented at JSM 2022.
slides presented at JSM 2021
slides and video presented at Online Causal Inference Seminar
2023 ICSA Student Paper Award
<a href="https://cran.r-project.org/web/packages/JQL/index.html" style="text-decoration: none; display: inline-flex; align-items: center; margin-left: 8px; vertical-align: middle;">
<img src="https://www.r-project.org/logo/Rlogo.svg" width="22" height="17" style="vertical-align: middle; border: none; display: inline-block; margin-right: 4px;">
<span style="color: #000000 !important; font-weight: 600; font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Helvetica, Arial, sans-serif; font-size: 16px;">JQL</span>
</a>
slides presented at JSM 2021.
slides presented at ICSA 2019.
video presented at ICML.
slides presented at JSM 2021.
DJL
video presented at NeurIPS.
video presented at the workshop.
video presented at ICML.
slides and video presented at CMStatistics 2020, ICML 2020, JSM 2020 and EYSM 2021.
R code for linear/logistic/Poisson regression
slides presented at JSM 2017
slides presented at JSM 2016, poster presented at 2018 NCSU research symposium.
slides presented at ENAR 2016