html file to demo multimodal #199
Reviewer's Guide

Introduces a standalone HTML-based demo that captures live webcam frames, embeds them with user text instructions into multimodal chat completion requests, and streams model responses in real time via a configurable UI, accompanied by detailed setup and usage documentation.
Summary of Changes

Hello @ericcurtin, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request adds a standalone HTML file that creates a web application to showcase the multimodal support within llama.cpp.
Pull Request Overview
This PR introduces an HTML demo application that showcases multimodal capabilities by integrating webcam functionality with llama.cpp. The demo allows users to capture webcam images and send them with text instructions to a local API endpoint for AI-powered image analysis.
- Adds interactive webcam capture functionality with real-time streaming
- Implements API integration to send multimodal requests (text + images) to llama.cpp
- Provides configurable request intervals and user-friendly controls for starting/stopping the demo
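For orientation, the multimodal request described above can be sketched as follows. The field layout follows the OpenAI-compatible chat-completions format that llama.cpp serves; the model name appears in the diff, but the prompt text and image value here are illustrative placeholders, not the PR's exact code.

```javascript
// Hedged sketch of the request body the demo sends: one user message whose
// content mixes a text part and an image_url part carrying a base64 data URL.
const imageBase64URL = "data:image/jpeg;base64,/9j/..."; // would come from a canvas capture

const body = {
  model: "ai/smolvlm:500M-Q8_0",
  max_tokens: 100,
  messages: [
    {
      role: "user",
      content: [
        { type: "text", text: "What do you see in this image?" },
        { type: "image_url", image_url: { url: imageBase64URL } },
      ],
    },
  ],
};
```

The same shape works with any OpenAI-compatible vision endpoint; only the `model` string and base URL change.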
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
This isn't ready yet, but we can continue to iterate on it.
Hey there - I've reviewed your changes - here's some feedback:
- Fix the malformed baseURL input element that is closed with </textarea> — input elements shouldn’t have a </textarea> closing tag.
- isProcessing is used both to track run state and in-flight requests, so overlapping sendData calls can still happen; introduce a separate flag or lock to prevent concurrent fetches.
Code Review
This pull request introduces an HTML file for a webcam-based multimodal demo. The implementation is a good starting point, but I've identified a few significant issues related to HTML validity, potential race conditions with API requests, and unsafe handling of API responses. My review includes suggestions to correct an HTML syntax error, prevent request storms by implementing a locking mechanism, and improve the robustness of response handling to avoid runtime errors.
Demos llama.cpp multimodal support with a webcam. Signed-off-by: Eric Curtin <eric.curtin@docker.com>
d9e9175 to fecff1e
@ericcurtin let us know when it's ready for review.
Pull Request Overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Pull Request Overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
Pull Request Overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
Pull Request Overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
```js
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({
        model: 'ai/smolvlm:500M-Q8_0',
```
The model name is hardcoded. Consider making it configurable through the UI or at least adding a comment explaining why this specific model is used.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Pull Request Overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
```js
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({
        model: 'ai/smolvlm:500M-Q8_0',
```
[nitpick] The model name is hardcoded in the JavaScript. Consider making this configurable via a form input to allow users to test with different models without modifying the code.
```js
      },
      body: JSON.stringify({
        model: 'ai/smolvlm:500M-Q8_0',
        max_tokens: 100,
```
[nitpick] The max_tokens value is hardcoded. Consider making this configurable to allow users to adjust response length based on their needs.
What do you prefer?
- hardcoded model
- input with default value
- model selector
The issue with model selector is that there are few models that support multimodal, so most models will throw an error if used here.
Hey there - I've reviewed your changes - here's some feedback:
- Consider moving the inline CSS and JavaScript into separate files (e.g. demo.css and demo.js) to improve maintainability and reduce page clutter.
- Make the model name, max_tokens, and exact API endpoint path configurable from the UI rather than hard-coding them in sendChatCompletionRequest so you can easily swap models or endpoints without editing the HTML.
- Add more robust error and state handling around camera initialization and fetch failures (invalid Base URL, CORS errors, network timeouts) to surface clearer user feedback and avoid silent skips.
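The error-handling suggestion in the last bullet could look like the sketch below. The function name, timeout value, and message wording are assumptions, not the PR's code; the point is aborting slow requests and surfacing distinct errors for timeouts versus network/CORS failures.

```javascript
// Hedged sketch: POST with a timeout via AbortController, mapping the two
// common failure modes to user-readable messages instead of failing silently.
async function postWithTimeout(url, body, timeoutMs = 15000) {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    return await fetch(url, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(body),
      signal: controller.signal,
    });
  } catch (err) {
    if (err.name === "AbortError") {
      throw new Error(`Request timed out after ${timeoutMs} ms`);
    }
    // fetch rejects with a TypeError for network failures, CORS blocks,
    // and invalid base URLs -- worth reporting distinctly to the user
    throw new Error(`Network error (check the Base URL and CORS): ${err.message}`);
  } finally {
    clearTimeout(timer);
  }
}
```

The caller can then show `err.message` directly in the response textarea rather than skipping the tick silently.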
## Individual Comments
### Comment 1
<location> `demos/multimodal/demo.html:119-120` </location>
<code_context>
+
+ let stream;
+ let intervalId;
+ let isProcessing = false;
+ let isWaitingForResponse = false;
+
+ // Returns response text (string)
</code_context>
<issue_to_address>
**issue (bug_risk):** Concurrent request control may not prevent overlapping requests reliably.
Since isWaitingForResponse is set before fetch and reset after, multiple requests can still be sent if sendData is triggered again before the previous fetch resolves. Consider tracking the active promise or pausing the interval until the response is received to ensure only one request is in flight.
</issue_to_address>
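One way to implement the "track the active promise" fix from this comment is sketched below, with hypothetical names mirroring the demo's `sendData()`; `fakeRequest()` stands in for the real fetch.

```javascript
// Keep a handle to the in-flight promise; overlapping calls are skipped
// instead of stacking up, so at most one request is ever outstanding.
let inFlight = null;
let completed = 0;

async function fakeRequest() {
  // stands in for the real chat-completions fetch
  await new Promise((resolve) => setTimeout(resolve, 50));
  completed++;
}

async function sendData() {
  if (inFlight) return "skipped"; // a request is already in flight
  inFlight = fakeRequest();
  try {
    await inFlight;
    return "done";
  } finally {
    inFlight = null; // release the lock even if the request throws
  }
}

// Two overlapping calls: only the first actually issues a request;
// Promise.all([sendData(), sendData()]) resolves to ["done", "skipped"].
```

An equivalent alternative is to clear the interval before the fetch and re-arm it in `finally`, which also keeps only one request in flight.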
### Comment 2
<location> `demos/multimodal/demo.html:148` </location>
<code_context>
+ return `Server error: ${response.status} - ${errorData}`;
+ }
+ const data = await response.json();
+ return data.choices[0].message.content;
+ }
+
</code_context>
<issue_to_address>
**suggestion:** No error handling for unexpected API response structure.
Add checks to verify that data.choices and data.choices[0].message.content exist before accessing them, and return a fallback error message if they are missing.
```suggestion
const data = await response.json();
if (
!data.choices ||
!Array.isArray(data.choices) ||
!data.choices[0] ||
!data.choices[0].message ||
typeof data.choices[0].message.content !== "string"
) {
return "API response error: Unexpected response structure.";
}
```
</issue_to_address>
### Comment 3
<location> `demos/multimodal/demo.html:166` </location>
<code_context>
+ }
+
+ function captureImage() {
+ if (!stream || !video.videoWidth) {
+ console.warn("Video stream not ready for capture.");
+ return null;
</code_context>
<issue_to_address>
**nitpick:** Warning message for video not ready may be too generic.
Consider providing distinct warnings for a missing stream and for video.videoWidth being zero to aid in troubleshooting.
</issue_to_address>
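The distinct-warning idea could be factored as below; `stream` and `video` are stand-ins for the demo's globals, and the helper name and message wording are assumptions.

```javascript
// Classify why a capture can't proceed, so the console tells the user
// whether the camera never started or simply hasn't delivered frames yet.
function describeCaptureBlocker(stream, video) {
  if (!stream) {
    return "No camera stream: the camera was never started or has been stopped.";
  }
  if (!video.videoWidth) {
    return "Camera stream is active but no frames have been decoded yet (videoWidth is 0).";
  }
  return null; // ready to capture
}

// captureImage() could then log the specific reason:
// const blocker = describeCaptureBlocker(stream, video);
// if (blocker) { console.warn(blocker); return null; }
```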
### Comment 4
<location> `demos/multimodal/demo.html:185-188` </location>
<code_context>
+ const instruction = instructionText.value;
+ const imageBase64URL = captureImage();
+
+ if (!imageBase64URL) {
+ responseText.value = "Failed to capture image. Stream might not be active.";
+ // Optionally stop processing if image capture fails consistently
+ // handleStop();
+ return;
+ }
</code_context>
<issue_to_address>
**suggestion:** Commented-out code for stopping processing on image capture failure should be removed or handled.
If you intend to stop processing after repeated failures, implement a retry or error threshold. Otherwise, remove the commented-out 'handleStop();' to avoid confusion.
```suggestion
responseText.value = "Failed to capture image. Stream might not be active.";
return;
```
</issue_to_address>
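If the author does want the stop-after-repeated-failures behavior instead of deleting the comment, the threshold logic could look like this (constant and names are assumptions, not the PR's code):

```javascript
// Stop the demo only after several consecutive capture failures, rather
// than on the first one; any success resets the counter.
const MAX_CONSECUTIVE_FAILURES = 3;
let consecutiveFailures = 0;

function recordCaptureResult(ok) {
  if (ok) {
    consecutiveFailures = 0;
    return "continue";
  }
  consecutiveFailures++;
  return consecutiveFailures >= MAX_CONSECUTIVE_FAILURES ? "stop" : "continue";
}

// sendData() would call handleStop() when this returns "stop".
```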
@ericcurtin @p1-0tr @doringeman

> The issue with model selector is that there are few models that support multimodal, so most models will throw an error if used here.
I vote option 2 (input with sensible default)
I would say 3 (but you can leave a little note that this application will only work with multimodal-compatible models). The problem with a sensible default for me is: what is the sensible default? (The one that springs to mind is "", because we don't know what models are available.)
I would vote for 3 and also add a link to Hub with a filter for multimodal models.
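A DOM-free sketch of the selection fallback being discussed in this thread (function name and behavior are assumptions, not the PR's code): prefer the recommended multimodal model when it is available, otherwise fall back to the first listed model.

```javascript
// Pick the default entry for the model selector: the recommended multimodal
// model if installed, else the first available model, else an empty selection.
const RECOMMENDED_MODEL = "ai/smolvlm:500M-Q8_0";

function chooseDefaultModel(models, recommended = RECOMMENDED_MODEL) {
  if (models.includes(recommended)) return recommended;
  return models[0] ?? ""; // empty selection when no models are available
}
```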
Thanks! so I finally used a model selector:
Co-authored-by: sourcery-ai[bot] <58596630+sourcery-ai[bot]@users.noreply.github.com>
Pull Request Overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 6 comments.
```js
// Select the first model
modelSelect.value = models[0];
// Show suggestion about recommended model
modelWarning.innerHTML = `💡 We recommend to run this demo with <code>${RECOMMENDED_MODEL}</code>`;
```
Using innerHTML with template literals containing variables can lead to XSS if the variable contains malicious content. Since RECOMMENDED_MODEL is a constant, this is lower risk, but consider using textContent or proper HTML escaping for consistency.
```suggestion
// Safely set the warning message without using innerHTML
modelWarning.textContent = ""; // Clear previous content
modelWarning.append("💡 We recommend to run this demo with ");
const codeElem = document.createElement("code");
codeElem.textContent = RECOMMENDED_MODEL;
modelWarning.appendChild(codeElem);
```
Demos llama.cpp multimodal support with a webcam.
Summary by Sourcery
Add a new real-time webcam-based multimodal demo for llama.cpp with UI controls and API interaction, along with accompanying documentation.