Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #352 +/- ##
=======================================
Coverage 92.80% 92.80%
=======================================
Files 31 31
Lines 3004 3004
=======================================
Hits 2788 2788
Misses 216 216 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
Should lower reps to maybe 5, perhapsalso use smaller data? Current ones are fully LLM generated so maybe more careful creation could cut down on execution time while still being informative tests. @jgabry any thoughts? The main file is touchstone/script.R The code rn is ugly, I'll fix it tomorrow probably, but any thoughts on the actual tests? The run is taking 1hr, is that way too slow? |
|
If we switch to the example that @avehtari suggested, is that a bigger log lik matrix/array than what you have here or smaller? |
|
Wine one is a decent bit larger. How long do you figure is too long? |
|
Is there an easy way to tell it to skip the touchdown step? For important PRs it’s fine for it to be slow, even very slow, but for simple PRs that we know won’t affect runtime at all (or trivially) it would be good to be able to skip it. I guess we could always just merge the PR without waiting for it to finish, but sometimes we’re not checking the CI results immediately and then we’d end up just wasting a bunch of computation which I think might prevent other things in the Stan org from running. |
|
Storing that log_lik_matrix and using the stored matrix for loo takes a couple seconds. I don't think that's too slow. |
|
I agree, that’s not too slow. Maybe I’m just confused. I thought that example was bigger than the one Visruth is currently using, and the current run takes 1hr. It runs things many times to compute the benchmarks, but running something that takes a few seconds many times should still take way less than an hour. So then why does the current example take 1hr? Sorry if I’m missing something simple here. |
|
I think the LLM code is doing the wrong thing--I'll run the wine thing and store the log lik (probably in the touchstone dir?) and just read from it and run loo. I think the LLMd code is just abysmal and so is too slow. Will try to get this in by Monday. |
It might be, I haven't had time to dig into it. Either way, thanks for setting this up. I took a quick look, and I do like that it's also testing the |
|
I tested the current benchmark code by running it manually on my laptop, and it takes less than 10s |
Use touchstone to evaluate PRs' impact on performance. Starting with a monolithic approach to testing, calling just
loo().Closes #348.