caseymcc · caseymcc · Apr 13, 2026 · Apr 12, 2026 · Apr 12, 2026
diff --git a/.github/copilot-instructions.md b/.github/copilot-instructions.md
@@ -21,11 +21,12 @@ A C++17 library providing a unified interface for multiple LLM providers.
 
 1. **All commands** must go through `./runDocker.sh ...`.
 2. **All development** (building, testing, running) must be done inside the Docker container. The host environment is not guaranteed to have the correct tools or dependencies.
-3. **Do not** use `python`, `pip`, `pytest` — the host may not have the correct Python version or dependencies.
-4. **Do not** create or use a virtualenv on the host. The container is the virtualenv.
-5. The project source is **bind-mounted** at `/app` inside the container. Edits to files on the host are immediately visible inside the container.
-6. If you change the `Dockerfile`, run `./runDocker.sh --rebuild`.
-7. Don't launch the server, ask the user to launch so that its not running in the agents terminal.
+3. **Do not** run commands in the terminal with `2>&1`, as the user cannot verify whether the command is running.
+4. **Do not** use `python`, `pip`, `pytest` — the host may not have the correct Python version or dependencies.
+5. **Do not** create or use a virtualenv on the host. The container is the virtualenv.
+6. The project source is **bind-mounted** at `/app` inside the container. Edits to files on the host are immediately visible inside the container.
+7. If you change the `Dockerfile`, run `./runDocker.sh --rebuild`.
+8. Don't launch the server; ask the user to launch it so that it's not running in the agent's terminal.
 
 ## Active Tasks
 

diff --git a/.gitignore b/.gitignore
@@ -35,9 +35,6 @@
 build/
 vcpkg_installed/
 
-# Cloned config repository
-arbiterAI_config/
-
 # Generated at build time by CMake
 **/generated/
 

diff --git a/.gitmodules b/.gitmodules
@@ -0,0 +1,3 @@
+[submodule "arbiterAI_config"]
+	path = arbiterAI_config
+	url = https://github.com/caseymcc/arbiterAI_config.git
diff --git a/arbiterAI_config b/arbiterAI_config
diff --git a/docker/Dockerfile b/docker/Dockerfile
@@ -1,11 +1,12 @@
 # syntax=docker/dockerfile:1
-ARG DOCKER_VERSION=1.2.0
+ARG DOCKER_VERSION=1.2.1
 FROM ubuntu:24.04
 
 # Install basic build tools, Python 3, and GPU libraries.
 # Vulkan headers + glslc are needed at build time for llama.cpp's Vulkan backend.
-# At runtime, GPU inference requires a Vulkan ICD on the host; without one
-# llama.cpp falls back to CPU-only.
+# ROCm HIP SDK is needed at build time for llama.cpp's HIP/ROCm backend.
+# At runtime, GPU inference requires a Vulkan ICD or ROCm driver on the host;
+# without one llama.cpp falls back to CPU-only.
 RUN apt-get update && apt-get install -y \
     build-essential \
     cmake \
@@ -34,8 +35,25 @@ RUN apt-get update && apt-get install -y \
     mesa-vulkan-drivers \
     glslc \
     glslang-tools \
+    wget \
     && rm -rf /var/lib/apt/lists/*
 
+# Install ROCm HIP SDK (build-time only — no kernel driver needed in container)
+# Use hiplibsdk usecase to get HIP development libraries and CMake configs
+# (the plain "hip" usecase only installs runtime, missing hip-lang-config.cmake)
+RUN wget -q https://repo.radeon.com/amdgpu-install/7.2.1/ubuntu/noble/amdgpu-install_7.2.1.70201-1_all.deb && \
+    apt-get update && \
+    apt-get install -y ./amdgpu-install_7.2.1.70201-1_all.deb && \
+    apt-get update && \
+    amdgpu-install -y --usecase=hiplibsdk --no-dkms && \
+    rm -f amdgpu-install_7.2.1.70201-1_all.deb && \
+    rm -rf /var/lib/apt/lists/*
+
+ENV ROCM_PATH=/opt/rocm
+ENV HIP_PATH=/opt/rocm
+ENV PATH="${ROCM_PATH}/bin:${PATH}"
+ENV CMAKE_PREFIX_PATH="${ROCM_PATH}:${CMAKE_PREFIX_PATH}"
+
 RUN curl -fsSL https://deb.nodesource.com/setup_18.x | bash - && \
     apt-get install -y nodejs && \
     rm -rf /var/lib/apt/lists/*

diff --git a/docs/server.md b/docs/server.md
@@ -77,7 +77,8 @@ All server settings are defined in a JSON configuration file. See [`examples/ser
     "hardware": {
         "vram_overrides": {
             "0": 32000
-        }
+        },
+        "default_backend_priority": ["vulkan"]
     },
     "logging": {
         "level": "info",
@@ -116,6 +117,7 @@ All server settings are defined in a JSON configuration file. See [`examples/ser
 | Field | Type | Default | Description |
 |-------|------|---------|-------------|
 | `vram_overrides` | `object` | `{}` | GPU index → VRAM MB overrides (e.g., `{"0": 32000}`) |
+| `default_backend_priority` | `string[]` | `[]` | Default GPU backend preference for models without their own `backend_priority` (e.g., `["vulkan"]`). Empty = all backends. |
 
 **`logging` object:**
 

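The fallback described for `default_backend_priority` can be sketched as below. This is a minimal illustration, not the library's code: `effectiveBackendPriority` is a hypothetical helper, and treating an empty per-model list the same as a missing one is an assumption.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper, not part of the arbiterAI API.
// Returns the model's own backend_priority when it has entries,
// otherwise the server-wide default_backend_priority.
// An empty result is read as "no preference, all backends allowed".
inline std::vector<std::string> effectiveBackendPriority(
    const std::vector<std::string> &modelPriority,
    const std::vector<std::string> &serverDefault)
{
    return modelPriority.empty() ? serverDefault : modelPriority;
}

// Small usage example for the helper above.
inline void demoEffectiveBackendPriority()
{
    const std::vector<std::string> serverDefault={"vulkan"};
    const std::vector<std::string> noModelPriority;
    const std::vector<std::string> modelPriority={"rocm", "vulkan"};

    // A model without its own list inherits the server default.
    assert(effectiveBackendPriority(noModelPriority, serverDefault)==serverDefault);
    // A model with its own list keeps it.
    assert(effectiveBackendPriority(modelPriority, serverDefault)==modelPriority);
}
```

With the example config above (`"default_backend_priority": []`), both lists can be empty, in which case no backend is preferred.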
diff --git a/examples/server_config.json b/examples/server_config.json
@@ -24,7 +24,8 @@
     },
 
     "hardware": {
-        "vram_overrides": {}
+        "vram_overrides": {},
+        "default_backend_priority": []
     },
 
     "logging": {

diff --git a/schemas/model_config.schema.json b/schemas/model_config.schema.json
@@ -247,6 +247,58 @@
                 }
               }
             }
+          },
+          "runtime_options": {
+            "type": "object",
+            "description": "Runtime options for llama.cpp model loading and inference. Applied as defaults; can be overridden at load time via the API.",
+            "properties": {
+              "flash_attn": {
+                "type": "boolean",
+                "description": "Enable or disable flash attention (-fa)"
+              },
+              "kv_cache_type_k": {
+                "type": "string",
+                "description": "KV cache data type for keys (-ctk)",
+                "enum": ["f32", "f16", "bf16", "q8_0", "q4_0", "q4_1", "q5_0", "q5_1"]
+              },
+              "kv_cache_type_v": {
+                "type": "string",
+                "description": "KV cache data type for values (-ctv)",
+                "enum": ["f32", "f16", "bf16", "q8_0", "q4_0", "q4_1", "q5_0", "q5_1"]
+              },
+              "no_mmap": {
+                "type": "boolean",
+                "description": "Disable memory-mapped file I/O (--no-mmap)"
+              },
+              "reasoning_budget": {
+                "type": "integer",
+                "description": "Reasoning token budget (--reasoning-budget). 0 disables reasoning tokens.",
+                "minimum": 0
+              },
+              "swa_full": {
+                "type": "boolean",
+                "description": "Use full-size sliding window attention cache (--swa-full)"
+              },
+              "n_gpu_layers": {
+                "type": "integer",
+                "description": "Number of layers to offload to GPU (-ngl). 99 offloads all.",
+                "minimum": 0
+              },
+              "override_tensor": {
+                "type": "string",
+                "description": "Tensor override pattern (-ot) for routing tensors to CPU/GPU"
+              }
+            },
+            "additionalProperties": false
+          },
+          "backend_priority": {
+            "type": "array",
+            "description": "Ordered preference for GPU backends. First available backend is used.",
+            "items": {
+              "type": "string",
+              "enum": ["vulkan", "rocm", "cuda"]
+            },
+            "uniqueItems": true
           }
         }
       }

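The `backend_priority` description ("First available backend is used") could be implemented roughly as follows. `pickBackend` is a hypothetical name, and the way the set of available backends is detected is outside this sketch; an empty priority list is assumed to mean that any available backend is acceptable.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical helper, not part of the arbiterAI API.
// Walks `priority` in order and returns the first entry that also
// appears in `available`. With an empty priority list, the first
// available backend is returned. std::nullopt means nothing matched,
// so the caller could fall back to CPU-only.
inline std::optional<std::string> pickBackend(
    const std::vector<std::string> &priority,
    const std::vector<std::string> &available)
{
    if(priority.empty())
    {
        if(available.empty())
            return std::nullopt;
        return available.front();
    }

    for(const std::string &name:priority)
    {
        if(std::find(available.begin(), available.end(), name)!=available.end())
            return name;
    }
    return std::nullopt;
}

// Small usage example for the helper above.
inline void demoPickBackend()
{
    const std::vector<std::string> priority={"rocm", "vulkan"};
    const std::vector<std::string> available={"vulkan", "cuda"};

    // "rocm" is preferred but missing, so "vulkan" is chosen.
    assert(pickBackend(priority, available)==std::string("vulkan"));
}
```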
diff --git a/src/arbiterAI/arbiterAI.cpp b/src/arbiterAI/arbiterAI.cpp
@@ -532,9 +532,15 @@ ErrorCode ArbiterAI::getAvailableModels(std::vector<std::string>& models)
 
 // ========== Local Model Management ==========
 
-ErrorCode ArbiterAI::loadModel(const std::string &model, const std::string &variant, int contextSize)
+ErrorCode ArbiterAI::loadModel(const std::string &model, const std::string &variant, int contextSize,
+    const RuntimeOptions *optionsOverride)
 {
-    return ModelRuntime::instance().loadModel(model, variant, contextSize);
+    RuntimeOptions opts;
+    if(optionsOverride)
+    {
+        opts=*optionsOverride;
+    }
+    return ModelRuntime::instance().loadModel(model, variant, contextSize, opts);
 }
 
 ErrorCode ArbiterAI::downloadModel(const std::string &model, const std::string &variant)

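The header documents `optionsOverride` as being merged on top of the model config defaults. The `RuntimeOptions` struct is not shown in this diff, so the following is only a sketch of one way such a merge could work, assuming each option is optional. `RuntimeOptionsSketch`, `overlay`, and `mergeRuntimeOptions` are hypothetical names.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical stand-in for RuntimeOptions; the real struct is not
// shown in this diff. Optional fields let "not set" be told apart
// from an explicit value.
struct RuntimeOptionsSketch
{
    std::optional<bool> flashAttn;
    std::optional<std::string> kvCacheTypeK;
    std::optional<int> nGpuLayers;
};

// Copies `over` into `base` only when the override carries a value.
template<typename T>
void overlay(std::optional<T> &base, const std::optional<T> &over)
{
    if(over.has_value())
        base=over;
}

// Returns `defaults` with every field set in `overrides` replaced.
// A null `overrides` pointer leaves the defaults untouched.
inline RuntimeOptionsSketch mergeRuntimeOptions(RuntimeOptionsSketch defaults,
    const RuntimeOptionsSketch *overrides)
{
    if(overrides==nullptr)
        return defaults;

    overlay(defaults.flashAttn, overrides->flashAttn);
    overlay(defaults.kvCacheTypeK, overrides->kvCacheTypeK);
    overlay(defaults.nGpuLayers, overrides->nGpuLayers);
    return defaults;
}

// Small usage example for the merge above.
inline void demoMergeRuntimeOptions()
{
    RuntimeOptionsSketch defaults;
    defaults.flashAttn=true;
    defaults.nGpuLayers=99;

    RuntimeOptionsSketch overrides;
    overrides.nGpuLayers=20;

    const RuntimeOptionsSketch merged=mergeRuntimeOptions(defaults, &overrides);
    assert(merged.flashAttn.value()==true); // kept from defaults
    assert(merged.nGpuLayers.value()==20);  // replaced by the override
}
```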
diff --git a/src/arbiterAI/arbiterAI.h b/src/arbiterAI/arbiterAI.h
@@ -34,6 +34,7 @@ struct ModelFit;
 struct LoadedModel;
 struct SystemSnapshot;
 struct InferenceStats;
+struct RuntimeOptions;
 
 /**
  * @struct VersionInfo
@@ -241,6 +242,42 @@ inline void from_json(const nlohmann::json &j, ToolCall &t)
  * - "assistant": may include tool_calls when the model invokes tools
  * - "tool": includes tool_call_id linking the result back to a specific tool call
  */
+
+/// Extract text from an OpenAI `content` field.
+/// The spec allows content as either a plain string or an array of content
+/// parts (e.g. [{"type":"text","text":"..."},{"type":"image_url",...}]).
+/// This helper joins all "text" parts with a single space and ignores non-text entries.
+inline std::string contentToString(const nlohmann::json &contentJson)
+{
+    if(contentJson.is_string())
+        return contentJson.get<std::string>();
+
+    if(contentJson.is_array())
+    {
+        std::string result;
+        for(const nlohmann::json &part:contentJson)
+        {
+            if(part.is_string())
+            {
+                if(!result.empty()) result+=' ';
+                result+=part.get<std::string>();
+            }
+            else if(part.is_object()
+                && part.contains("type")
+                && part.at("type")=="text"
+                && part.contains("text") && part.at("text").is_string())
+            {
+                if(!result.empty()) result+=' ';
+                result+=part.at("text").get<std::string>();
+            }
+            // Skip non-text parts (image_url, etc.)
+        }
+        return result;
+    }
+
+    return {};
+}
+
 struct Message
 {
     std::string role;
@@ -262,7 +299,7 @@ inline void from_json(const nlohmann::json &j, Message &m)
 {
     j.at("role").get_to(m.role);
     if(j.contains("content") && !j.at("content").is_null())
-        j.at("content").get_to(m.content);
+        m.content=contentToString(j.at("content"));
     if(j.contains("tool_call_id"))
         m.toolCallId=j.at("tool_call_id").get<std::string>();
     if(j.contains("tool_calls"))
@@ -605,9 +642,11 @@ class ArbiterAI
      * @param model Model name
      * @param variant Quantization variant (empty = auto-select)
      * @param contextSize Context size (0 = model default)
+     * @param optionsOverride Optional runtime options to merge on top of model config defaults (nullptr = use config defaults)
      * @return ErrorCode indicating success, ModelDownloading, or failure
      */
-    ErrorCode loadModel(const std::string &model, const std::string &variant="", int contextSize=0);
+    ErrorCode loadModel(const std::string &model, const std::string &variant="", int contextSize=0,
+        const RuntimeOptions *optionsOverride=nullptr);
 
     /**
      * @brief Download model files without loading into VRAM

diff --git a/src/arbiterAI/configDownloader.cpp b/src/arbiterAI/configDownloader.cpp
@@ -195,20 +195,24 @@ ConfigDownloadStatus ConfigDownloader::checkoutVersion()
 
     spdlog::info("Checking out version: {}", m_version);
 
-    // Try to resolve the version as a direct ref, remote branch, or tag
+    // Try to resolve the version — prefer remote branch refs first so that
+    // a fetch+checkout always picks up the latest remote commit rather than
+    // a stale local branch ref that was never fast-forwarded.
     git_object *obj=nullptr;
-    error=git_revparse_single(&obj, repo, m_version.c_str());
+
+    // 1. Try as a remote-tracking branch (most common path after fetch)
+    std::string remoteBranch="refs/remotes/origin/"+m_version;
+    error=git_revparse_single(&obj, repo, remoteBranch.c_str());
 
     if(error!=0)
     {
-        // Try as a remote branch
-        std::string remoteBranch="refs/remotes/origin/"+m_version;
-        error=git_revparse_single(&obj, repo, remoteBranch.c_str());
+        // 2. Try as a direct ref / local branch / SHA
+        error=git_revparse_single(&obj, repo, m_version.c_str());
     }
 
     if(error!=0)
     {
-        // Try as a tag
+        // 3. Try as a tag
         std::string tag="refs/tags/"+m_version;
         error=git_revparse_single(&obj, repo, tag.c_str());
     }
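
The lookup order introduced in this hunk (remote-tracking branch, then direct ref, then tag) can be modeled without libgit2 as below. This is a sketch only: `refCandidates` and `resolveRef` are hypothetical names, and the `exists` callback stands in for a successful `git_revparse_single` call.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <string>
#include <vector>

// Candidate ref names in the order the patched code tries them:
// 1. remote-tracking branch, 2. direct ref / local branch / SHA, 3. tag.
inline std::vector<std::string> refCandidates(const std::string &version)
{
    return {"refs/remotes/origin/"+version, version, "refs/tags/"+version};
}

// Returns the first candidate for which `exists` reports true.
// `exists` is a hypothetical stand-in for a ref lookup succeeding.
inline std::optional<std::string> resolveRef(const std::string &version,
    const std::function<bool(const std::string &)> &exists)
{
    for(const std::string &candidate:refCandidates(version))
    {
        if(exists(candidate))
            return candidate;
    }
    return std::nullopt;
}

// Small usage example: both a local and a remote "main" exist,
// and the remote-tracking ref wins.
inline void demoResolveRef()
{
    const auto exists=[](const std::string &ref)
    {
        return ref=="main" || ref=="refs/remotes/origin/main";
    };
    assert(resolveRef("main", exists)==std::string("refs/remotes/origin/main"));
}
```

Trying the remote-tracking ref first means a stale local branch of the same name no longer shadows the freshly fetched commit.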