Skip to content

Use touchstone#352

Draft
VisruthSK wants to merge 3 commits intomasterfrom
use-touchstone
Draft

Use touchstone#352
VisruthSK wants to merge 3 commits intomasterfrom
use-touchstone

Conversation

@VisruthSK
Copy link
Copy Markdown
Member

@VisruthSK VisruthSK commented Apr 8, 2026

Use touchstone to evaluate PRs' impact on performance. Starting with a monolithic approach to testing, calling just loo().

Closes #348.

@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Apr 8, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 92.80%. Comparing base (60f012d) to head (c73be4d).
⚠️ Report is 3 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master     #352   +/-   ##
=======================================
  Coverage   92.80%   92.80%           
=======================================
  Files          31       31           
  Lines        3004     3004           
=======================================
  Hits         2788     2788           
  Misses        216      216           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@VisruthSK
Copy link
Copy Markdown
Member Author

VisruthSK commented Apr 9, 2026

Should lower reps to maybe 5, perhapsalso use smaller data? Current ones are fully LLM generated so maybe more careful creation could cut down on execution time while still being informative tests.

@jgabry any thoughts? The main file is touchstone/script.R The code rn is ugly, I'll fix it tomorrow probably, but any thoughts on the actual tests? The run is taking 1hr, is that way too slow?

@jgabry
Copy link
Copy Markdown
Member

jgabry commented Apr 9, 2026

If we switch to the example that @avehtari suggested, is that a bigger log lik matrix/array than what you have here or smaller?

@VisruthSK
Copy link
Copy Markdown
Member Author

VisruthSK commented Apr 10, 2026

Wine one is a decent bit larger. How long do you figure is too long?

@jgabry
Copy link
Copy Markdown
Member

jgabry commented Apr 10, 2026

Is there an easy way to tell it to skip the touchdown step? For important PRs it’s fine for it to be slow, even very slow, but for simple PRs that we know won’t affect runtime at all (or trivially) it would be good to be able to skip it. I guess we could always just merge the PR without waiting for it to finish, but sometimes we’re not checking the CI results immediately and then we’d end up just wasting a bunch of computation which I think might prevent other things in the Stan org from running.

@avehtari
Copy link
Copy Markdown
Member

Storing that log_lik_matrix and using the stored matrix for loo takes a couple seconds. I don't think that's too slow.

@jgabry
Copy link
Copy Markdown
Member

jgabry commented Apr 10, 2026

I agree, that’s not too slow. Maybe I’m just confused. I thought that example was bigger than the one Visruth is currently using, and the current run takes 1hr. It runs things many times to compute the benchmarks, but running something that takes a few seconds many times should still take way less than an hour. So then why does the current example take 1hr? Sorry if I’m missing something simple here.

@VisruthSK
Copy link
Copy Markdown
Member Author

I think the LLM code is doing the wrong thing--I'll run the wine thing and store the log lik (probably in the touchstone dir?) and just read from it and run loo. I think the LLMd code is just abysmal and so is too slow. Will try to get this in by Monday.

@jgabry
Copy link
Copy Markdown
Member

jgabry commented Apr 10, 2026

I think the LLM code is doing the wrong thing

It might be, I haven't had time to dig into it. Either way, thanks for setting this up. I took a quick look, and I do like that it's also testing the loo.function method in addition to the matrix and array methods. But yeah, let's use @avehtari's example.

@avehtari
Copy link
Copy Markdown
Member

I tested the current benchmark code by running it manually on my laptop, and it takes less than 10s

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Performance benchmarks for PRs?

4 participants