user: implement user factory #106
Conversation
lzrd left a comment
I'm currently testing against this PR and noticed one minor doc vs code issue.
jclulow left a comment
I've started taking a look at this, and have left some thoughts on what I've seen so far. I think it would help to have more of a complete picture of how this will get deployed and configured for hubris CI as well, when evaluating it all.
/*
 * Install the agent binary with the control program name in a location in
 * the default PATH so that job programs can find it.
 */
let cprog = format!("/usr/bin/{CONTROL_PROGRAM}");
I don't want to move the control program location for environments where it has already existed in /usr/bin. That's part of what this abstraction is about:
Lines 42 to 98 in c7805b4
/*
 * Ubuntu 18.04 had a genuine pre-war separate /bin directory!
 */
let binmd = std::fs::symlink_metadata("/bin")?;
if binmd.is_dir() {
    std::os::unix::fs::symlink(
        format!("../usr/bin/{CONTROL_PROGRAM}"),
        format!("/bin/{CONTROL_PROGRAM}"),
    )?;
}
<!DOCTYPE service_bundle SYSTEM '/usr/share/lib/xml/dtd/service_bundle.dtd.1'>
<service_bundle name='buildomat-worker' type='manifest'>
  <service name='site/buildomat/factory-user-worker' type='service' version='0'>
    <exec_method name='start' type='method' timeout_seconds='60' exec='{{exec}}' />
We ought to use a method context here that constrains the process to the unprivileged build user for the instance, so that it doesn't start out running as root -- like buildomat/factory/hubris/smf/hubris.xml, lines 12 to 14 in c7805b4.
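The referenced hubris.xml lines aren't reproduced above; as a hedged sketch based on smf_method(5) -- the 'build' user and group names are placeholders, not from this PR -- such a method context might look like:

<exec_method name='start' type='method' timeout_seconds='60' exec='{{exec}}'>
  <method_context>
    <!--
      'build' stands in for the per-instance unprivileged user; the
      privileges attribute anticipates the points below about proc_chroot
      and proc_info.
    -->
    <method_credential user='build' group='build'
      privileges='basic,!proc_info,proc_chroot' />
  </method_context>
</exec_method>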
In order for the chroot(2) to work we'll need to grant the process the proc_chroot privilege (not in the basic set). Then we'll need to drop that privilege as soon as the chroot() is done, using setppriv(2), prior to doing anything else so that when we then download and run the agent binary it can't chroot again.
We probably also want to remove proc_info (which is part of the basic set) so that you can't see other processes on the system that belong to other users/jobs in, say, ps(1) output. There might be other privileges that it makes sense to chuck out here, but that's the one that comes to mind immediately.
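As a hedged sketch of that sequence -- assuming the usual illumos privileges(7) interfaces, hand-declared here from setppriv(2) and priv_str_to_set(3C) since the libc crate may not expose them -- the post-chroot privilege drop might look like:

use std::ffi::CString;
use std::io::{Error, Result};

/* Opaque privilege set; allocated and freed by libc. */
#[repr(C)]
struct PrivSet {
    _opaque: [u8; 0],
}

extern "C" {
    fn priv_str_to_set(buf: *const libc::c_char, sep: *const libc::c_char,
        endptr: *mut *const libc::c_char) -> *mut PrivSet;
    fn setppriv(op: libc::c_int, which: *const libc::c_char,
        pset: *const PrivSet) -> libc::c_int;
    fn priv_freeset(sp: *mut PrivSet);
}

/* PRIV_OFF from <sys/priv.h>; "Permitted" names the permitted set. */
const PRIV_OFF: libc::c_int = 1;

fn chroot_and_drop(root: &str) -> Result<()> {
    let root = CString::new(root).unwrap();
    if unsafe { libc::chroot(root.as_ptr()) } != 0 {
        return Err(Error::last_os_error());
    }
    if unsafe { libc::chdir(c"/".as_ptr()) } != 0 {
        return Err(Error::last_os_error());
    }

    /*
     * Remove proc_chroot (so the agent can't chroot again) and proc_info
     * (so it can't inspect other users' processes).  Dropping a privilege
     * from the permitted set also removes it from the effective set.
     */
    let set = unsafe {
        priv_str_to_set(c"proc_chroot,proc_info".as_ptr(), c",".as_ptr(),
            std::ptr::null_mut())
    };
    if set.is_null() {
        return Err(Error::other("priv_str_to_set failed"));
    }
    let r = unsafe { setppriv(PRIV_OFF, c"Permitted".as_ptr(), set) };
    unsafe { priv_freeset(set) };
    if r != 0 {
        return Err(Error::last_os_error());
    }
    Ok(())
}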
Because this factory intends to support multiple concurrent jobs on the machine, we should also look at putting each build user in a separate project(5), and then setting some resource_controls(7) on those projects to prevent one job from having too much of an impact on other jobs that are running concurrently. We might also want to look at the FSS(4) scheduler, which can provide some amount of scheduler fairness at the project rather than process level.
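For concreteness, a hypothetical /etc/project entry along these lines (the project name, id, user, and rctl values are all illustrative) would cap a worker's LWP count and give it an FSS share:

buildomat-worker-0:4242:buildomat worker 0:build0::project.max-lwps=(priv,2000,deny);project.cpu-shares=(priv,10,none)

With FSS in use, project.cpu-shares is what lets the scheduler arbitrate between concurrent jobs at the project level rather than per process.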
fn root_dir(worker: WorkerId) -> PathBuf {
    Path::new("/var/run/buildomat/worker-roots").join(worker.to_string())
}
I think we ought to create a two-tier structure here (sketched below):
- a top level, /var/run/buildomat/worker/WORKER_ID, which would be owned by the (unprivileged) user and group for the worker, and mode 0700, so that it's only visible and traversable to the specific worker
- another directory one level down, e.g., /var/run/buildomat/worker/WORKER_ID/root, which could then be owned root:root and mode 0755 like the real root directory
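A minimal sketch of that layout, assuming std's unix extensions and hypothetical uid/gid arguments for the worker user:

use std::os::unix::fs::{chown, PermissionsExt};
use std::path::{Path, PathBuf};

fn create_root_dirs(worker: &str, uid: u32, gid: u32)
    -> std::io::Result<PathBuf>
{
    /*
     * Top level: owned by the unprivileged worker user, mode 0700, so
     * only that worker can see or traverse it.
     */
    let top = Path::new("/var/run/buildomat/worker").join(worker);
    std::fs::create_dir_all(&top)?;
    chown(&top, Some(uid), Some(gid))?;
    std::fs::set_permissions(&top, std::fs::Permissions::from_mode(0o700))?;

    /*
     * One level down: the worker's root, owned root:root and mode 0755
     * like the real root directory.
     */
    let root = top.join("root");
    std::fs::create_dir(&root)?;
    chown(&root, Some(0), Some(0))?;
    std::fs::set_permissions(&root, std::fs::Permissions::from_mode(0o755))?;

    Ok(root)
}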
let available_targets = c
    .config
    .slots
    .iter()
    .filter(|(name, _)| !used_slots.contains(name.as_str()))
    .map(|(_, slot)| slot.target.clone())
    /*
     * Deduplicate the targets by first collecting into a HashSet.
     */
    .collect::<HashSet<_>>()
    .into_iter()
    .collect::<Vec<_>>();
When determining which targets are available, I think we need to be able to specify some way to check the health of each configured slot. This is a piece that I had not yet completed for the hubris factory, but I think is relatively critical: we need to be able to check for the presence of the expected set of USB devices (debug probes, serial ports, etc) prior to taking a lease from the server. Otherwise, it seems likely that some of the time we'll have broken slots that absorb and then fail jobs, especially when we have more than one slot on a system with different probes.
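One possible shape for such a check, sketched under the assumption that each slot can enumerate the device paths it needs (the Slot fields here are hypothetical, not this PR's config):

use std::collections::HashSet;
use std::path::Path;

struct Slot {
    target: String,
    /* e.g., probe device nodes, serial ports: "/dev/term/a", ... */
    required_devices: Vec<String>,
}

fn slot_is_healthy(slot: &Slot) -> bool {
    slot.required_devices.iter().all(|d| Path::new(d).exists())
}

fn available_targets<'a>(slots: impl Iterator<Item = &'a Slot>)
    -> Vec<String>
{
    slots
        .filter(|s| slot_is_healthy(s))
        .map(|s| s.target.clone())
        .collect::<HashSet<_>>()
        .into_iter()
        .collect()
}

Filtering before the dedup step means a slot with a missing probe simply never advertises its target, so no lease is taken for it.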
I was planning on deferring health checking to a future PR: it doesn't strictly block deploying an MVP of the Hubris hardware CI, and there are some alternate ideas I have on how to possibly implement this. Would it be ok to defer health checking to a future PR?
The alternate idea I had was to delegate the health checking to the job itself, adding a bmat worker mark-broken -m "message" command that marks the worker as failed and puts it on hold. The hold would both alert operators (once I implement monitoring for held workers) and keep the slot reserved, preventing other jobs from starting on it until the hold is released.
It would be ok for user-factory to take on the responsibility of assigning a list of system resources as defined in some pool and required by some slot. The workaround I'm using right now is to set the devices up as owned by a group. See the note elsewhere about additional groups not being set in the current commit.
Resources include (all optional depending on the testbed): SP probe, RoT probe, USB to serial device, power control, IPv6 network access to an SP. We could add logic probes or other devices as well in certain cases.
The workaround I'm using right now is to set the devices up as owned by a group.
That's the core of my design for the user factory. For Hubris CI those resources are required, yes, but other uses of the factory in the future might need different resources and I kinda don't want to keep expanding the set of devices the factory understands.
#[serde(default)]
pub(crate) add_to_groups: Vec<String>,
#[serde(default)]
pub(crate) env: HashMap<String, String>,
Do you have an example configuration that includes all the environment variables you'd be specifying through this mechanism?
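For illustration only (this is not from the PR), a slot entry exercising both fields might look something like the following, with hypothetical variable names standing in for whatever the jobs end up needing:

[slots.gimlet0]
target = "hubris-gimlet"
add_to_groups = ["staff"]

[slots.gimlet0.env]
# Hypothetical examples: point the job at this slot's probe and console.
HUMILITY_PROBE = "usb:0483:374e:000012345678"
BUILDOMAT_SP_CONSOLE = "/dev/term/a"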
Force-pushed a9266cd to f39679b
Force-pushed 2683833 to f2bcec5
Force-pushed f39679b to 4dc2397
Group id issue for USB device ownership: with a slot's add_to_groups = ['staff'], the ephemeral worker process gets EACCES on the group-owned device.
Fix: a pre_exec hook calling libc::initgroups(user, primary_gid) just before exec in …
Workaround: install everything at world-traversable system paths (/opt/...) so workers don't …
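A hedged sketch of that fix, assuming the factory spawns workers via std::process::Command and that the libc crate exposes initgroups(3C) on the target. The supplementary groups must be set inside pre_exec, while the child is still root and before the setuid, or initgroups will fail with EPERM:

use std::ffi::CString;
use std::io::{Error, Result};
use std::os::unix::process::CommandExt;
use std::process::{Child, Command};

fn spawn_as_worker(program: &str, user: &str, uid: u32, gid: u32)
    -> Result<Child>
{
    let name = CString::new(user).map_err(Error::other)?;
    let mut cmd = Command::new(program);
    unsafe {
        cmd.pre_exec(move || {
            /*
             * Still root here.  initgroups(3C) loads the supplementary
             * groups (e.g., 'staff') from the group database, so
             * add_to_groups actually takes effect; then drop to the
             * worker's gid and uid.
             */
            if libc::initgroups(name.as_ptr(), gid) != 0 {
                return Err(Error::last_os_error());
            }
            if libc::setgid(gid) != 0 {
                return Err(Error::last_os_error());
            }
            if libc::setuid(uid) != 0 {
                return Err(Error::last_os_error());
            }
            Ok(())
        });
    }
    cmd.spawn()
}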
Co-Authored-By: Joshua M. Clulow <jmc@oxide.computer>
Force-pushed 4dc2397 to 290b367
This PR implements a new factory for buildomat, which runs jobs as ephemeral users on the same host system that runs the factory. Documentation on how to use the factory is available in the factory README.
I was careful during the implementation of the factory to make sure it will always attempt to clean up after itself (never releasing a slot until the cleanup stage finishes) and that it alerts the operator when something goes wrong (by failing the worker, which triggers a hold on it; I plan to add monitoring for held workers in the future).
This PR also makes multiple changes to the agent installation to support this, each in its own commit. I can move those to a single separate PR or multiple separate PRs if you'd prefer.
The implementation of the factory was based on @jclulow's 2024 work on a work-in-progress hubris factory.