html file to demo multimodal #199

Merged: ilopezluna merged 15 commits into main from add-multimodal-demo on Oct 9, 2025

@ericcurtin ericcurtin commented Oct 7, 2025

Demos llama.cpp multimodal support with a webcam.

Summary by Sourcery

Add a new real-time webcam-based multimodal demo for llama.cpp with UI controls and API interaction, along with accompanying documentation.

New Features:

  • Create demo.html to capture webcam feed, display video, and capture frames for model analysis
  • Implement UI controls for base API URL, instruction input, response display, request interval selection, and start/stop functionality
  • Integrate image capture and send requests to llama.cpp chat completion endpoint with embedded images

Documentation:

  • Add README.md with prerequisites, setup instructions for Docker Model Runner and SmolVLM model, and usage guide for the demo

Copilot AI review requested due to automatic review settings October 7, 2025 20:57

sourcery-ai Bot commented Oct 7, 2025

Reviewer's Guide

Introduces a standalone HTML-based demo that captures live webcam frames, embeds them with user text instructions into multimodal chat completion requests, and streams model responses in real time via a configurable UI, accompanied by detailed setup and usage documentation.

File-Level Changes

Change: Add standalone HTML demo for multimodal webcam interaction
Details:
  • Define video and canvas elements for live capture
  • Build UI controls for API URL, instructions, interval and start/stop
  • Implement initCamera to request and display webcam stream
  • Implement capture and toDataURL conversion of frames
  • Implement sendChatCompletionRequest to send image+text payloads
  • Implement sendData loop with interval handling and response display
  • Implement start/stop handlers toggling processing state
  • Apply CSS styling for layout and responsive controls
Files: demos/multimodal/demo.html

Change: Add documentation for real-time webcam vision model demo
Details:
  • Outline prerequisites including Docker Model Runner and model pull
  • Provide both Docker Desktop and standalone setup instructions
  • Detail demo usage steps: open HTML, grant camera, configure UI, start/stop
  • Include credits and links to original demo and relevant resources
Files: demos/multimodal/README.md

@gemini-code-assist

Summary of Changes

Hello @ericcurtin, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request adds a standalone HTML file that creates a web application to showcase the multimodal support within llama.cpp. The application captures images from a user's webcam and sends them, along with a text prompt, to a local API endpoint. This allows for an interactive demonstration of how llama.cpp can process both visual and textual input to generate responses, providing a practical example of its capabilities.

Highlights

  • New Multimodal Demo Application: Introduces a new HTML file, assets/camera-demo.html, which serves as a web-based application to demonstrate llama.cpp's multimodal capabilities.
  • Webcam Integration: The application utilizes the user's webcam to capture live video frames, converting them into base64 encoded JPEG images for API submission.
  • Real-time API Interaction: It continuously sends captured images and a user-defined text instruction to a local llama.cpp API endpoint (defaulting to http://localhost:8080/v1/chat/completions) at a configurable interval, displaying the AI's response in real-time.
  • User Interface: Provides a simple UI to control the camera feed, set the API base URL, input instructions, view responses, and adjust the request interval.
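
For readers unfamiliar with the request shape, here is a minimal sketch of the kind of body such a demo would send to the chat completion endpoint. It assumes the OpenAI-compatible message format with an `image_url` content part. `buildChatCompletionBody` is a hypothetical helper, not a function from the demo, and the default model and `max_tokens` are the values quoted in the code context elsewhere in this thread.

```javascript
// Hypothetical helper (not from the demo): builds an OpenAI-style chat
// completion body that pairs a text instruction with one image passed as a
// base64 data URL.
function buildChatCompletionBody(instruction, imageDataUrl, options = {}) {
    // Defaults mirror the hard-coded values quoted in this review thread.
    const { model = 'ai/smolvlm:500M-Q8_0', maxTokens = 100 } = options;
    return {
        model,
        max_tokens: maxTokens,
        messages: [
            {
                role: 'user',
                content: [
                    { type: 'text', text: instruction },
                    { type: 'image_url', image_url: { url: imageDataUrl } }
                ]
            }
        ]
    };
}

// Usage: in a browser the data URL would come from canvas.toDataURL('image/jpeg').
const body = buildChatCompletionBody('What do you see?', 'data:image/jpeg;base64,AAAA');
console.log(JSON.stringify(body, null, 2));
```

Keeping the body construction in a pure function makes it easy to inspect the payload without a camera or a running server.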

Copilot AI left a comment

Pull Request Overview

This PR introduces an HTML demo application that showcases multimodal capabilities by integrating webcam functionality with llama.cpp. The demo allows users to capture webcam images and send them with text instructions to a local API endpoint for AI-powered image analysis.

  • Adds interactive webcam capture functionality with real-time streaming
  • Implements API integration to send multimodal requests (text + images) to llama.cpp
  • Provides configurable request intervals and user-friendly controls for starting/stopping the demo

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

Comment thread assets/camera-demo.html Outdated
@ericcurtin
Contributor Author

This isn't ready yet, but we can continue to iterate on it.

@sourcery-ai sourcery-ai Bot left a comment

Hey there - I've reviewed your changes - here's some feedback:

  • Fix the malformed baseURL input element that is closed with </textarea> — input elements shouldn’t have a </textarea> closing tag.
  • isProcessing is used both to track run state and in-flight requests, so overlapping sendData calls can still happen; introduce a separate flag or lock to prevent concurrent fetches.
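
One way to read the second point is as a "single flight" guard: a flag that tracks only the outstanding request, kept separate from the run state. The sketch below is an illustration under that assumption. `createSingleFlight` is a hypothetical name, and the timer-based task merely stands in for the demo's fetch call.

```javascript
// Hypothetical wrapper: runs `task` only when no earlier call is still pending.
// Returns the task's promise, or null when the call was skipped.
function createSingleFlight(task) {
    let inFlight = false; // tracks an outstanding request, independent of run state
    return function run(...args) {
        if (inFlight) {
            return null;
        }
        inFlight = true;
        // The executor runs synchronously, and a throw inside it rejects the promise.
        return new Promise((resolve) => resolve(task(...args))).finally(() => {
            inFlight = false;
        });
    };
}

// Usage: the second call arrives while the first is still pending and is skipped.
let calls = 0;
const sendOnce = createSingleFlight(() => {
    calls += 1;
    return new Promise((resolve) => setTimeout(() => resolve('done'), 20));
});
const first = sendOnce();
const second = sendOnce();
console.log(calls, second); // prints "1 null"
```

The flag is cleared in `finally`, so a failed request does not leave the guard stuck.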

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces an HTML file for a webcam-based multimodal demo. The implementation is a good starting point, but I've identified a few significant issues related to HTML validity, potential race conditions with API requests, and unsafe handling of API responses. My review includes suggestions to correct an HTML syntax error, prevent request storms by implementing a locking mechanism, and improve the robustness of response handling to avoid runtime errors.

Comment thread assets/camera-demo.html Outdated
Comment thread assets/camera-demo.html Outdated
Comment thread assets/camera-demo.html Outdated
Demos llama.cpp multimodal support with a webcam.

Signed-off-by: Eric Curtin <eric.curtin@docker.com>
@ericcurtin ericcurtin force-pushed the add-multimodal-demo branch from d9e9175 to fecff1e Compare October 7, 2025 21:00
@doringeman

@ericcurtin let us know when it's ready for review.

Copilot AI review requested due to automatic review settings October 8, 2025 14:47

Copilot AI left a comment

Pull Request Overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.


Comment thread demos/multimodal/demo.html
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings October 8, 2025 14:50

Copilot AI left a comment

Pull Request Overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.


Comment thread demos/multimodal/demo.html Outdated
Comment thread demos/multimodal/demo.html Outdated
@ilopezluna ilopezluna marked this pull request as draft October 8, 2025 15:50
Copilot AI review requested due to automatic review settings October 8, 2025 15:50

Copilot AI left a comment

Pull Request Overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.


Comment thread demos/multimodal/demo.html Outdated
Comment thread demos/multimodal/demo.html Outdated
Comment thread demos/multimodal/demo.html Outdated
Copilot AI review requested due to automatic review settings October 8, 2025 18:18

Copilot AI left a comment

Pull Request Overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.


Comment thread demos/multimodal/demo.html Outdated
'Content-Type': 'application/json'
},
body: JSON.stringify({
model: 'ai/smolvlm:500M-Q8_0',

Copilot AI Oct 8, 2025

The model name is hardcoded. Consider making it configurable through the UI or at least adding a comment explaining why this specific model is used.

Copilot uses AI. Check for mistakes.
Comment thread demos/multimodal/demo.html
Comment thread demos/multimodal/demo.html Outdated
ilopezluna and others added 2 commits October 8, 2025 20:25
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings October 8, 2025 18:27
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Copilot AI left a comment

Pull Request Overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.


Comment thread demos/multimodal/demo.html
Comment thread demos/multimodal/demo.html Outdated
'Content-Type': 'application/json'
},
body: JSON.stringify({
model: 'ai/smolvlm:500M-Q8_0',

Copilot AI Oct 8, 2025

[nitpick] The model name is hardcoded in the JavaScript. Consider making this configurable via a form input to allow users to test with different models without modifying the code.

},
body: JSON.stringify({
model: 'ai/smolvlm:500M-Q8_0',
max_tokens: 100,

Copilot AI Oct 8, 2025

[nitpick] The max_tokens value is hardcoded. Consider making this configurable to allow users to adjust response length based on their needs.

Copilot uses AI. Check for mistakes.

What do you prefer?

  1. hardcoded model
  2. input with default value
  3. model selector

The issue with model selector is that there are few models that support multimodal, so most models will throw an error if used here.

Comment thread demos/multimodal/demo.html
@ilopezluna ilopezluna marked this pull request as ready for review October 8, 2025 18:33
@ilopezluna ilopezluna requested a review from a team October 8, 2025 18:34

@sourcery-ai sourcery-ai Bot left a comment

Hey there - I've reviewed your changes - here's some feedback:

  • Consider moving the inline CSS and JavaScript into separate files (e.g. demo.css and demo.js) to improve maintainability and reduce page clutter.
  • Make the model name, max_tokens, and exact API endpoint path configurable from the UI rather than hard-coding them in sendChatCompletionRequest so you can easily swap models or endpoints without editing the HTML.
  • Add more robust error and state handling around camera initialization and fetch failures (invalid Base URL, CORS errors, network timeouts) to surface clearer user feedback and avoid silent skips.
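
As an illustration of the third point, the sketch below validates the base URL up front and maps timeouts and network failures to readable messages. It is an assumption-laden example, not the demo's code. `resolveEndpoint` and `postJsonWithTimeout` are hypothetical names, the 10 second default is arbitrary, and `fetchImpl` is injectable only so the sketch can be exercised without a server.

```javascript
// Hypothetical helper: validates the base URL and appends the endpoint path.
function resolveEndpoint(baseUrl, path = '/v1/chat/completions') {
    let parsed;
    try {
        parsed = new URL(baseUrl);
    } catch (err) {
        throw new Error(`Invalid base URL: "${baseUrl}"`);
    }
    if (parsed.protocol !== 'http:' && parsed.protocol !== 'https:') {
        throw new Error(`Unsupported protocol in base URL: ${parsed.protocol}`);
    }
    // Trim trailing slashes so the joined URL has exactly one separator.
    return baseUrl.trim().replace(/\/+$/, '') + path;
}

// Hypothetical helper: POSTs JSON and reports failures as { ok: false, message }.
async function postJsonWithTimeout(url, body, timeoutMs = 10000, fetchImpl = fetch) {
    const controller = new AbortController();
    const timer = setTimeout(() => controller.abort(), timeoutMs);
    try {
        const response = await fetchImpl(url, {
            method: 'POST',
            headers: { 'Content-Type': 'application/json' },
            body: JSON.stringify(body),
            signal: controller.signal
        });
        if (!response.ok) {
            return { ok: false, message: `Server error: ${response.status}` };
        }
        return { ok: true, data: await response.json() };
    } catch (err) {
        if (err.name === 'AbortError') {
            return { ok: false, message: `Request timed out after ${timeoutMs} ms` };
        }
        // In browsers, CORS and connection failures both surface as a TypeError.
        return { ok: false, message: `Network or CORS error: ${err.message}` };
    } finally {
        clearTimeout(timer);
    }
}

// Usage: resolve the endpoint once, then reuse it for every request.
console.log(resolveEndpoint('http://localhost:8080/'));
```

Returning a result object instead of throwing lets the caller write the message straight into the response field, so failures are visible rather than silently skipped.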
Prompt for AI Agents
Please address the comments from this code review:

## Individual Comments

### Comment 1
<location> `demos/multimodal/demo.html:119-120` </location>
<code_context>
+
+    let stream;
+    let intervalId;
+    let isProcessing = false;
+    let isWaitingForResponse = false;
+
+    // Returns response text (string)
</code_context>

<issue_to_address>
**issue (bug_risk):** Concurrent request control may not prevent overlapping requests reliably.

Since isWaitingForResponse is set before fetch and reset after, multiple requests can still be sent if sendData is triggered again before the previous fetch resolves. Consider tracking the active promise or pausing the interval until the response is received to ensure only one request is in flight.
</issue_to_address>
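
A minimal sketch of the "pause the interval" option, assuming the fixed interval timer is replaced by a loop that schedules the next tick only after the previous task has settled. `startSequentialLoop` is a hypothetical name, and the timer-based task stands in for "capture a frame and send it".

```javascript
// Hypothetical loop: waits for each task to settle, then waits for the
// interval, so at most one request is ever in flight.
function startSequentialLoop(task, intervalMs) {
    let running = true;
    let timerId = null;

    async function tick() {
        if (!running) {
            return;
        }
        try {
            await task();
        } catch (err) {
            console.error('Task failed:', err);
        }
        // Schedule the next run only after the current one has finished.
        if (running) {
            timerId = setTimeout(tick, intervalMs);
        }
    }

    tick();
    // The returned function plays the role of a stop handler.
    return function stop() {
        running = false;
        clearTimeout(timerId);
    };
}

// Usage: run a 10 ms task with a 5 ms pause between runs, then stop.
const stop = startSequentialLoop(() => new Promise((resolve) => setTimeout(resolve, 10)), 5);
setTimeout(stop, 100);
```

With this shape the selected interval becomes the pause between the end of one request and the start of the next, rather than a fixed firing rate.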

### Comment 2
<location> `demos/multimodal/demo.html:148` </location>
<code_context>
+            return `Server error: ${response.status} - ${errorData}`;
+        }
+        const data = await response.json();
+        return data.choices[0].message.content;
+    }
+
</code_context>

<issue_to_address>
**suggestion:** No error handling for unexpected API response structure.

Add checks to verify that data.choices and data.choices[0].message.content exist before accessing them, and return a fallback error message if they are missing.

```suggestion
        const data = await response.json();
        if (
            !data.choices ||
            !Array.isArray(data.choices) ||
            !data.choices[0] ||
            !data.choices[0].message ||
            typeof data.choices[0].message.content !== "string"
        ) {
            return "API response error: Unexpected response structure.";
        }
```
</issue_to_address>

### Comment 3
<location> `demos/multimodal/demo.html:166` </location>
<code_context>
+    }
+
+    function captureImage() {
+        if (!stream || !video.videoWidth) {
+            console.warn("Video stream not ready for capture.");
+            return null;
</code_context>

<issue_to_address>
**nitpick:** Warning message for video not ready may be too generic.

Consider providing distinct warnings for a missing stream and for video.videoWidth being zero to aid in troubleshooting.
</issue_to_address>
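
A small sketch of what distinct warnings could look like, assuming the two conditions are checked separately. `describeCaptureProblem` is a hypothetical helper and the message wording is illustrative only.

```javascript
// Hypothetical helper: returns a specific reason why a frame cannot be
// captured yet, or null when capture can proceed.
function describeCaptureProblem(stream, videoWidth) {
    if (!stream) {
        return 'No camera stream: camera access was not granted or not started.';
    }
    if (!videoWidth) {
        return 'Camera stream exists, but the video has no dimensions yet (videoWidth is 0).';
    }
    return null;
}

// Usage: plain values stand in for the real stream object and video.videoWidth.
console.log(describeCaptureProblem(null, 0));
console.log(describeCaptureProblem({}, 0));
```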

### Comment 4
<location> `demos/multimodal/demo.html:185-188` </location>
<code_context>
+        const instruction = instructionText.value;
+        const imageBase64URL = captureImage();
+
+        if (!imageBase64URL) {
+            responseText.value = "Failed to capture image. Stream might not be active.";
+            // Optionally stop processing if image capture fails consistently
+            // handleStop();
+            return;
+        }
</code_context>

<issue_to_address>
**suggestion:** Commented-out code for stopping processing on image capture failure should be removed or handled.

If you intend to stop processing after repeated failures, implement a retry or error threshold. Otherwise, remove the commented-out 'handleStop();' to avoid confusion.

```suggestion
            responseText.value = "Failed to capture image. Stream might not be active.";
            return;
```
</issue_to_address>
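
If stopping after repeated failures is the intent, an error threshold could be sketched as a small counter like the one below. `createFailureTracker` is a hypothetical name and the threshold of 3 is an arbitrary assumption.

```javascript
// Hypothetical counter: reports when consecutive failures reach a limit.
function createFailureTracker(maxConsecutiveFailures) {
    let failures = 0;
    return {
        // Any success resets the streak.
        recordSuccess() {
            failures = 0;
        },
        // Returns true when the caller should stop processing.
        recordFailure() {
            failures += 1;
            return failures >= maxConsecutiveFailures;
        }
    };
}

// Usage: stop only after three capture failures in a row.
const tracker = createFailureTracker(3);
console.log(tracker.recordFailure(), tracker.recordFailure(), tracker.recordFailure());
```

The caller would invoke the stop handler when `recordFailure()` returns true and call `recordSuccess()` after each successful capture.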


Comment thread demos/multimodal/demo.html
Comment thread demos/multimodal/demo.html
Comment thread demos/multimodal/demo.html
Comment thread demos/multimodal/demo.html
@ilopezluna

@ericcurtin @p1-0tr @doringeman
What do you prefer?

  1. hardcoded model
  2. input with default value
  3. model selector

The issue with model selector is that there are few models that support multimodal, so most models will throw an error if used here.

p1-0tr commented Oct 9, 2025

I vote option 2 (input with sensible default)

ericcurtin commented Oct 9, 2025

I would say 3 (but you can leave a little note that this application will only work with multi-modal compatible models).

The problem with sensible default for me is... What is the sensible default? (the sensible default that springs to my head is "") because we don't know what models are available.

@doringeman

I would vote for 3 and also add a link to Hub with a filter for multimodal models.

@ilopezluna

Thanks! In the end I went with a model selector:

  • if the SmolVLM model is found, it is selected by default
  • added a recommendation to get more models from Hub
  • if the model does not support multimodal input, the error is handled and a helpful message is shown
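
The selection rule described above can be sketched as a pure function. This is an illustration, not the merged code. `pickDefaultModel` is a hypothetical name, the value of `RECOMMENDED_MODEL` is assumed from the model name hard-coded earlier in the thread, and exact string matching is also an assumption.

```javascript
// Assumed value, taken from the model name hard-coded earlier in this thread.
const RECOMMENDED_MODEL = 'ai/smolvlm:500M-Q8_0';

// Hypothetical helper: decides which model to preselect and whether to show
// a hint recommending the SmolVLM model.
function pickDefaultModel(models, recommended = RECOMMENDED_MODEL) {
    if (models.length === 0) {
        return { selected: null, showRecommendation: true };
    }
    if (models.includes(recommended)) {
        return { selected: recommended, showRecommendation: false };
    }
    // Fall back to the first available model and suggest the recommended one.
    return { selected: models[0], showRecommendation: true };
}

// Usage: the list would normally come from the server's model listing.
console.log(pickDefaultModel(['ai/other-model', 'ai/smolvlm:500M-Q8_0']));
console.log(pickDefaultModel(['ai/other-model']));
```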

Co-authored-by: sourcery-ai[bot] <58596630+sourcery-ai[bot]@users.noreply.github.com>
Copilot AI review requested due to automatic review settings October 9, 2025 13:33

Copilot AI left a comment

Pull Request Overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 6 comments.


Comment thread demos/multimodal/demo.html
Comment thread demos/multimodal/demo.html
// Select the first model
modelSelect.value = models[0];
// Show suggestion about recommended model
modelWarning.innerHTML = `💡 We recommend to run this demo with <code>${RECOMMENDED_MODEL}</code>`;

Copilot AI Oct 9, 2025

Using innerHTML with template literals containing variables can lead to XSS if the variable contains malicious content. Since RECOMMENDED_MODEL is a constant, this is lower risk, but consider using textContent or proper HTML escaping for consistency.

Suggested change
- modelWarning.innerHTML = `💡 We recommend to run this demo with <code>${RECOMMENDED_MODEL}</code>`;
+ // Safely set the warning message without using innerHTML
+ modelWarning.textContent = ""; // Clear previous content
+ modelWarning.append("💡 We recommend to run this demo with ");
+ const codeElem = document.createElement("code");
+ codeElem.textContent = RECOMMENDED_MODEL;
+ modelWarning.appendChild(codeElem);

Comment thread demos/multimodal/demo.html
Comment thread demos/multimodal/demo.html
Comment thread demos/multimodal/demo.html
@ilopezluna ilopezluna merged commit 6e65da4 into main Oct 9, 2025
5 checks passed
@ilopezluna ilopezluna deleted the add-multimodal-demo branch October 9, 2025 14:26
5 participants