layout	default
title	Publication

Please feel free to email me at c.shi7@lse.ac.uk if you have any comments.

Some Preprints

* indicates equal contribution

Liu, P., Shi, C. and Sun, W. Reinforcement Learning from Human Feedback: A Statistical Perspective.

Zhou, H*., Ye, K*., Xu, E., Zhu, J., Yang, Y., Gong, S. and Shi, C. Demystifying Group Relative Policy Optimization: Its Policy Gradient is a U-Statistic Demystifying-GRPO

Zhou, H*., Zhu, J*., Yang, Y. and Shi, C. Detecting LLM-Generated Text with Performance Guarantees. StatDetectLLM

Liu, Z., Guo, X., Yang, Z., Lou, F., Zeng, L., Li, M., Qi, Q., Liu, Z., Han, Y., Cheng, D., Feng, X., Wang, H., Shi, C. and Zhang, L. Fin-R1: A Large Language Model for Financial Reasoning through Reinforcement Learning

Zhang, J., Wang, J., Shi, C., Piette, J., Zeng, D., Wu, Z. PyCFRL: A Python library for counterfactually fair offline reinforcement learning via sequential data preprocessing

Wang, J., Wen, Q., Zhang, Y., Yan, X. and Shi, C. A Two-armed Bandit Framework for A/B Testing.

Ye, K*., Zhou, H*., Zhu, J*., Quinzan, F. and Shi, C. Robust Reinforcement Learning from Human Feedback for Large Language Models Fine-Tuning VRPO

Zhu, J., Zhou, X., Yao, J., Aminian, G., Rivasplata, O., Little, S., Li, L. and Shi, C. Semi-pessimistic Reinforcement Learning.

Shen, G., Dai, R., Wu, G., Luo, S., Shi, C. and Zhu, H. Deep Distributional Learning with Non-crossing Quantile Network

Shi, C. Statistical Inference in Reinforcement Learning: A Selective Survey

Wang, J., Shi, C., Piette, J., Loftus, J., Zeng, D. and Wu, Z. Counterfactually Fair Reinforcement Learning via Sequential Data Preprocessing

Liu, P., Shi, C. and Sun, W. Dual Active Learning for Reinforcement Learning from Human Feedback

Sun, K., Kong, L., Zhu, H. and Shi, C. ARMA-Design: Optimal Treatment Allocation Strategies for A/B Testing in Partially Observable Time Series Experiments ARMAdesign
slides

Hao, M*., Su, P*., Hu, L., Szabó, Z., Zhao, Q. and Shi, C. Off-policy Evaluation with Deeply-abstracted States. state-abstraction

Dai, R*., Wang, J*., Zhou, F*., Luo, S., Qin, Q., Shi, C., and Zhu, H. Causal Deepsets for Off-policy Evaluation under Spatial or Spatio-temporal Interferences

Wang, D., Shi, C., Luo, S. and Sun, W. Pessimistic Causal Reinforcement Learning with Mediators for Confounded Offline Data

Hu, L*., Li, M*., Shi, C., Wu, Z. and Fryzlewicz, P. Doubly Inhomogeneous Reinforcement Learning. DIRL
slides presented at CMStatistics 2022.

Publications/accepted manuscripts

AOAS

Yang, Y., Shi, C., Yao, F., Wang, S. and Zhu, H. (2026+). Spatially Randomized Designs Can Enhance Policy Evaluation

CVPR

Qi, X*., Ye, K*., Shi, C., Yang, Y., Zhou, H. and Zhu, J. (2026). A Difference-in-Difference Approach to Detecting AI-Generated Images

ICLR

Zhou, H*., Zhu, J*., Ye, K., Yang, Y., Xu, E. and Shi, C. (2026). Learn-to-Distance: Distance Learning for Detecting LLM-Generated Text L2D

ICLR

Wu, X*., Wen, Q*., Zhang, Y., Zhu, H., Li, T. and Shi, C. (2026). Designing Time Series Experiments in A/B Testing with Transformer Reinforcement Learning

Brain

Lawrence, D., Avraham, G., Yao, J., Li, L., Shi, C., Starr, P.A. and Little, S. (2025) Cortico-basal oscillations index naturalistic movements during deep brain stimulation.

JASA

Ma, T*., Zhu, J*., Cai, H., Qi, Z., Chen, Y., Shi, C. and Laber, E. (2025+) Sequential Knockoffs for Variable Selection in Reinforcement Learning SEEK

STAT

Hu, L., Wang, J., Wu, Z. and Shi, C. (2025) Generalized Fitted Q-Iteration with Clustered Data.

JASA

Wang, W. and Shi, C. (2025+) From Authors to Reviewers: Leveraging Rankings to Improve Peer Review
Discussion of "Analysis of the ICML 2023 Ranking Data: Can Authors’ Opinions of Their Own Papers Assist Peer Review in Machine Learning?"

JASA

Wang, J., Qi, Z. and Shi, C. (2025+) Blessing from Human-AI Interaction: Super Policy Learning in Confounded Environments

NeurIPS

Xu, E*., Ye, K*., Zhou, H*., Zhu, L., Quinzan, F. and Shi, C. (2025). Doubly Robust Alignment for Large Language Models DRPO4LLM
slides video presented at Tsinghua Statistics + AI Frontier Summit

NeurIPS

Zhou, H*., Zhu, J*., Su, P., Ye, K., Yang, Y., Gavioli-Akilagun, S.A. and Shi, C. (2025). AdaDetectGPT: Adaptive Detection of LLM-Generated Text with Statistical Guarantees AdaDetectGPT
video presented at 狗熊会

NeurIPS

Wu, X*., Li, T*., Aminian, G., Behnamnia, A., Rabiee, H. and Shi, C. (2025). Pessimistic Data Integration for Policy Evaluation.

NeurIPS

Feng, J., Shi, C., Wu, Z., Yan, X. and Zhao, W. (2025). Beyond Average Value Function in Precision Medicine: Maximum Probability-Driven Reinforcement Learning for Survival Analysis.

TMLR

Xu, Y., Shi, C., Luo, S., Wang, L. and Song, R. (2025). Doubly Robust Uncertainty Quantification for Quantile Treatment Effects in Sequential Decision Making. 2023 JSM Student Paper Award

ICML

Zhu, J*., Li, J*., Zhou, H., Lin, Y., Lin, Z., Shi, C. (2025). Balancing Interference and Correlation in Spatial Experimental Designs: A Causal Graph Cut Approach CausalGraphCut

ICML

Wen, Q*., Shi, C*., Yang, Y., Tang, N., Zhu, H. (2025). Unraveling the Interplay between Carryover Effects and Reward Autocorrelations in Switchback Experiments SwitchMDP

ICML

Zhou, H., Hanna, J., Zhu, J., Yang, Y., Shi, C. (2025). Demystifying the Paradox of Importance Sampling with an Estimated History-Dependent Behavior Policy in Off-Policy Evaluation

ICML

Behnamnia, A., Aminian, G., Aghaei, A., Shi, C., Tan, V.Y., Rabiee, H. (2025). Log-Sum-Exponential Estimator for Off-Policy Evaluation and Learning (spotlight, top 2.6% of submissions).

HDSR

StatsUpAI Interest Group (2025). Statistics and AI: A Fireside Conversation.

Stat Sci

Uehara, M., Shi, C. and Kallus, N. (2025+). A Review of Off-Policy Evaluation in Reinforcement Learning.

AOS

Li, M., Shi, C., Wu, Z. and Fryzlewicz, P. (2025). Testing Stationarity and Change Point Detection in Reinforcement Learning CUSUM-RL
slides video presented at JSM 2022.

AOS

Luo, L*., Shi, C*., Wang, J*., Wu, Z. and Li, L. (2025). Multivariate Dynamic Mediation Analysis under a Reinforcement Learning Framework MedtimeRL

JASA

Bian, Z., Shi, C., Qi, Z. and Wang, L. (2025). Off-policy Evaluation in Doubly Inhomogeneous Environments 2FEOPE

NeurIPS

Yu, S., Fang, S., Peng, R., Qi, Z., Zhou, F. and Shi, C. (2024). Two-way Deconfounder for Off-policy Evaluation in Causal Reinforcement Learning Two-way-deconfounder

ICML

Li, T*., Shi, C*., Wen, Q., Sui, Y., Qin, Y., Lai, C. and Zhu, H. (2024). Combining Experimental and Historical Data for Policy Evaluation Data_Combination

J Math Psychol

Li, J., Shi, C., Li, L. and Collins, A. (2024). Dynamic noise estimation: A generalized method for modeling noise fluctuations in decision-making dynamic_noise_estimation

JASA

Shi, C*., Qi, Z*., Wang, J. and Zhou, F. (2024). Value Enhancement of Reinforcement Learning via Efficient and Robust Trust Region Optimization VEPO

JASA

Shi, C., Zhou, Y. and Li, L. (2024). Testing Directed Acyclic Graph via Structural, Supervised and Generative Adversarial Learning SUGAR
slides presented at JSM 2021

JASA

Li, T*., Shi, C*., Lu, Z., Li, Y. and Zhu, H. (2024). Evaluating Dynamic Conditional Quantile Treatment Effects with Applications in Ridesharing CQSTVCM

JASA

Shi, C., Zhu, J., Shen, Y., Luo, S., Zhu, H. and Song, R. (2024). Off-Policy Confidence Interval Estimation with Confounded Markov Decision Process COPE

JASA

Shi, C., Luo, S., Le, Y., Zhu, H. and Song, R. (2024). Statistically Efficient Advantage Learning for Offline Reinforcement Learning in Infinite Horizons SEAL

JRSS-B

Luo, S*., Yang, Y*., Shi, C*., Yao, F., Ye, J. and Zhu, H. (2024). Policy Evaluation for Temporal and/or Spatial Dependent Experiments STVCM

AISTATS

Zhu, J*., Wan, R*., Qi, Z., Luo, S. and Shi, C. (2024). Robust Offline Reinforcement Learning with Heavy-Tailed Rewards ROOM

NeurIPS

Uehara, M., Kiyohara, H., Bennett, A., Chernozhukov, V., Jiang, N., Kallus, N., Shi, C. and Sun, W. (2023) Future-Dependent Value-Based Off-Policy Evaluation in POMDPs (spotlight) future-dependent-ope

NeurIPS

Li, T*., Shi, C*., Wang, J., Zhou, F. and Zhu, H. (2023). Optimal Treatment Allocation for Efficient Policy Evaluation in Sequential Decision Making MDPdesign

JRSS-B

Zhou, Y., Shi, C., Li, L. and Yao, Q. (2023). Testing for the Markov Property in Time Series via Deep Conditional Generative Learning markov_test

AOAS

Shi, C., Wan, R., Song, G., Luo, S., Zhu, H. and Song, R. (2023). A Multi-Agent Reinforcement Learning Framework for Off-Policy Evaluation in Two-sided Markets CausalMARL

JASA

Shi, C*., Wang, X*., Luo, S., Zhu, H., Ye, J. and Song, R. (2023). Dynamic Causal Effects Evaluation in A/B Testing with a Reinforcement Learning Framework CausalRL
slides video presented at Online Causal Inference Seminar

KDD

Wu, G., Song, G., Lv, X., Luo, S., Shi, C. and Zhu, H. (2023). DNet: Distributional Network for Distributional Individualized Treatment Effects.

ICML

Ge, L., Wang, J., Shi, C., Wu, Z. and Song, R. (2023). A Reinforcement Learning Framework for Dynamic Mediation Analysis MediationRL
2023 ICSA Student Paper Award

ICML

Xu, Y., Zhu, J., Shi, C., Luo, S. and Song, R. (2023). An Instrumental Variable Approach to Confounded Off-Policy Evaluation IVMDP

ICML

Wang, J., Shi, C. and Wu, Z. (2023). A Robust Test for the Stationarity Assumption in Sequential Decision Making. Double-CUSUM-RL

CogSci

Li, J., Shi, C., Li, L. and Collins, A. (2023). Dynamic noise estimation: A generalized method for modeling noise fluctuations in decision-making.

HDSR

Shi, C. (2023). The Impact of David Cox’s Work and Leadership on My Research.

STAT

Gao, Y., Shi, C. and Song, R. (2023). Deep Spectral Q-learning with Application to Mobile Health. 2022 JSM Student Paper Award

JMLR

Cai, H*., Shi, C*., Song, R. and Lu, W. (2023). Jump Interval-Learning for Individualized Decision Making with Continuous Treatments.

<a href="https://cran.r-project.org/web/packages/JQL/index.html" style="text-decoration: none; display: inline-flex; align-items: center; margin-left: 8px; vertical-align: middle;">
  <img src="https://www.r-project.org/logo/Rlogo.svg" width="22" height="17" style="vertical-align: middle; border: none; display: inline-block; margin-right: 4px;">
  <span style="color: #000000 !important; font-weight: 600; font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Helvetica, Arial, sans-serif; font-size: 16px;">JQL</span>
</a>

AISTATS

Zhou, Y., Qi, Z., Shi, C. and Li, L. (2023). Optimizing Pessimism in Dynamic Treatment Regimes: A Bayesian Learning Approach. PBL

AISTATS

Zhang, Y., Shi, C. and Luo, S. (2023). Conformal Off-Policy Prediction. R code COPP

JASA

Shi, C. and Li, L. (2022). Testing Mediation Effects Using Logic of Boolean Matrices. LOGAN
slides presented at JSM 2021.

JRSS-B

Shi, C., Zhang, S., Song, R. and Lu, W. (2022). Statistical Inference of the Value Function for Reinforcement Learning in Infinite Horizon Settings. SAVE
slides presented at ICSA 2019.

ICML

Shi, C*., Uehara, M*., Huang, J. and Jiang, N. (2022). A Minimax Learning Approach to Off-Policy Evaluation in Confounded Partially Observable Markov Decision Processes. (long talk, top 2%). Confounded-POMDP-OPE
video presented at ICML.

STAT

Li, L., Shi, C., Guo, T. and Jagust, W. (2022). Sequential Pathway Inference for Multimodal Neuroimaging Analysis. LOGAN
slides presented at JSM 2021.

JMLR

Shi, C., Xu, T., Bergsma, W. and Li, L. (2021). Double Generative Adversarial Networks for Conditional Independence Testing. dgcit

JMLR

Shi, C., Luo, S., Zhu, H. and Song, R. (2021). An Online Sequential Test for Qualitative Treatment Effects.

NeurIPS

Cai, H*., Shi, C*., Song, R. and Lu, W. (2021). Deep Jump Learning for Off-Policy Evaluation in Continuous Treatment Settings. 2021 ENAR Distinguished Student Paper Awards
DJL
video presented at NeurIPS.

IJCAI workshop

Wan, R*., Zhang, S*., Shi, C., Luo, S. and Song, R. (2021). Pattern Transfer Learning for Reinforcement Learning in Order Dispatching (best paper).
video presented at the workshop.

ICML

Shi, C*., Wan, R*., Chernozhukov, V. and Song, R. (2021). Deeply-Debiased Off-Policy Interval Estimation (long talk, top 3%). D2OPE
video presented at ICML.

JASA

Shi, C., Song, R., Lu, W. and Li, R. (2021). Statistical Inference for High-Dimensional Models via Recursive Online-Score Estimation (ROSE). R code for linear/logistic regression

AOS

Shi, C., Song, R. and Lu, W. (2021). Concordance and Value Information Criteria for Optimal Treatment Decision (CIVIC)

JMLR

Shi, C., Lu, W. and Song, R. (2020). Breaking the Curse of Nonregularity with Subagging --- Inference of the Mean Outcome under Optimal Treatment Regimes. R and C sample code subagging2.cpp sb.r

ICML

Shi, C., Wan, R., Song, R., Lu, W. and Leng, L. (2020). Does the Markov Decision Process Fit the Data: Testing for the Markov Property in Sequential Decision Making. TestMDP
slides video presented at CMStatistics 2020, ICML 2020, JSM 2020 and EYSM 2021.

JASA

Shi, C., Lu, W. and Song, R. (2020). A Sparse Random Projection-based Test for Overall Qualitative Treatment Effects

AOS

Shi, C., Song, R., Chen, Z. and Li, R. (2019). Linear Hypothesis Testing for High Dimensional Generalized Linear Models. 2018 IMS travel award
R code for linear/logistic/Poisson regression

AOS

Shi, C., Lu, W., and Song, R. (2019). On Testing Conditional Qualitative Treatment Effects. 2017 IMS travel award
slides presented at JSM 2017

JMLR

Shi, C., Lu, W. and Song, R. (2019). Determining the Number of Latent Factors in Multirelational Learning.

JASA

Shi, C., Lu, W., and Song, R. (2018). A Massive Data Framework for M-estimators with Cubic-Rate.

JRSS-B

Shi, C., Song, R., Lu, W., and Fu, B. (2018). Maximin Projection Learning for Optimal Treatment Decision with Heterogeneous Individualized Treatment Effects.