Commit cd18012

committed
updated LLM and nova model
added groq and deepgram-nova-3, and space button to disable default behaviour of scrolling in page
1 parent 87faa43 commit cd18012
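
The commit message mentions stopping the spacebar from scrolling the page, but the source files that implement it are not shown in this view. As a rough sketch only, one common approach is to decide per key event whether the press belongs to push-to-talk and, if so, call `preventDefault()`. The names `KeyEventLike` and `shouldSuppressSpaceScroll` are hypothetical and do not come from the repository.

```typescript
// Minimal structural type so the sketch does not depend on DOM typings.
// In a browser these fields would be read from a KeyboardEvent and its target.
interface KeyEventLike {
  code: string;                    // e.g. "Space", "KeyA"
  targetTagName: string;           // e.g. "INPUT", "TEXTAREA", "BODY"
  targetIsContentEditable: boolean;
}

// Returns true when the key press should be claimed for push-to-talk,
// in which case the caller suppresses the browser's default page scroll.
function shouldSuppressSpaceScroll(
  event: KeyEventLike,
  spacebarToTalkEnabled: boolean,
): boolean {
  if (!spacebarToTalkEnabled || event.code !== "Space") return false;

  // Leave the spacebar alone while the user is typing in a form field.
  const typingTargets = ["INPUT", "TEXTAREA", "SELECT"];
  if (typingTargets.indexOf(event.targetTagName.toUpperCase()) !== -1) {
    return false;
  }
  return !event.targetIsContentEditable;
}

// Assumed browser wiring (kept as a comment so the sketch runs without a DOM):
//
// window.addEventListener("keydown", (e) => {
//   const el = e.target as HTMLElement;
//   const info = {
//     code: e.code,
//     targetTagName: el.tagName,
//     targetIsContentEditable: el.isContentEditable,
//   };
//   if (shouldSuppressSpaceScroll(info, settings.spacebarToTalk)) {
//     e.preventDefault(); // stops the default scroll-on-space behaviour
//   }
// });
```

Keeping the decision in a pure function makes it easy to unit-test separately from the event listener.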

8 files changed

Lines changed: 364 additions & 244 deletions

README.md

Lines changed: 15 additions & 14 deletions
@@ -2,7 +2,7 @@

 ![EchoTasks Preview](./public/echotasks_preview.gif)

-EchoTasks is a modern, intuitive to-do list application that allows you to manage your tasks entirely through voice commands. Built with a cutting-edge stack including Next.js, Deepgram for real-time transcription, and GPT-4o-mini for command analysis, EchoTasks provides a seamless and fluid user experience. Just speak, and watch your to-do list update instantly.
+EchoTasks is a modern, intuitive to-do list application that allows you to manage your tasks entirely through voice commands. Built with a cutting-edge stack including Next.js, Deepgram for real-time transcription, and Groq with `qwen/qwen3-32b` for command analysis, EchoTasks provides a seamless and fluid user experience. Just speak, and watch your to-do list update instantly.

 ## Overview

@@ -11,27 +11,28 @@ This application demonstrates a powerful "voice-first" user interface. Instead o
 ## Key Features

 - **Voice-First Interface**: Manage your entire to-do list using natural language commands.
-- **Real-Time Transcription**: Blazing-fast and accurate speech-to-text powered by Deepgram.
-- **AI-Powered Command Analysis**: GPT-4o-mini intelligently understands your intent (e.g., adding, deleting, updating) and extracts key details like task names, due dates, priorities, and more.
+- **Real-Time Transcription**: Blazing-fast and accurate speech-to-text powered by Deepgram's Nova-3 model.
+- **AI-Powered Command Analysis**: Groq's `qwen/qwen3-32b` model intelligently understands your intent (e.g., adding, deleting, updating) and extracts key details like task names, due dates, priorities, and more.
+- **Undo Functionality**: Accidentally deleted a task? No problem. An undo button appears for 10 seconds after most actions, allowing you to revert changes with a single click.
+- **Manual Task Editing**: While voice is powerful, sometimes you just need to type. A full editing dialog allows you to manually change a task's text, priority, due date, and location.
+- **Safety Confirmations**: For destructive actions like deleting multiple tasks at once ("delete all high priority tasks"), the app asks for confirmation to prevent accidental data loss.
 - **Client-Side Priority & Location Detection**: For instant feedback, fast local models detect priority and location keywords directly in the browser.
 - **Natural Date & Time Parsing**: Understands relative dates like "tomorrow," "next Friday," and "in 2 weeks."
 - **Local Persistence**: Both your tasks and your settings are saved in the browser's local storage, ensuring they are remembered every time you visit.
-- **Undo Functionality**: Accidentally deleted a task? No problem. An undo button appears for 10 seconds after most actions, allowing you to revert changes.
 - **Customizable Settings**:
 - **Microphone Mode**: Choose between "Tap to Record" and "Hold to Record."
 - **Spacebar to Talk**: Use the spacebar as a push-to-talk key for convenience.
 - **Intelligent Stop**: Automatically stops recording after a few seconds of silence (in tap mode).
 - **Sort Completed Tasks**: Automatically move completed tasks to the bottom of the list.
 - **Temperature Unit**: Switch between Celsius and Fahrenheit for the weather display.
-- **Safety Confirmations**: For destructive actions like deleting multiple tasks, the app asks for confirmation to prevent accidental data loss.

 ## How It Works: The Application Flow

 The magic of EchoTasks lies in its sophisticated, multi-stage pipeline that turns your voice into action in under a second.

 1. **Voice Capture**: The user holds the microphone button or the spacebar. The browser captures the audio using the `MediaRecorder` API.
-2. **Real-Time Transcription**: The recorded audio blob is sent to a Next.js Server Action, which forwards it to the **Deepgram** API. Deepgram's Nova-2 model transcribes the audio into text with high accuracy and low latency.
-3. **AI Command Analysis**: The transcribed text is then sent to another Server Action. This action calls a custom function that queries **GPT-4o-mini**. A carefully engineered system prompt instructs the AI to analyze the text and return a structured JSON object containing the user's `intent` (e.g., `ADD_TASK`, `DELETE_TASK`) and any relevant `entities` (task names, filters, updates).
+2. **Real-Time Transcription**: The recorded audio blob is sent to a Next.js Server Action, which forwards it to the **Deepgram** API. Deepgram's Nova-3 model transcribes the audio into text with high accuracy and low latency.
+3. **AI Command Analysis**: The transcribed text is then sent to another Server Action. This action calls a custom function that queries **Groq's `qwen/qwen3-32b` model**. A carefully engineered system prompt instructs the AI to analyze the text and return a structured JSON object containing the user's `intent` (e.g., `ADD_TASK`, `DELETE_TASK`) and any relevant `entities` (task names, filters, updates).
 4. **Client-Side Heuristics (Parallel Process)**: While the AI is processing, the original transcript is also analyzed on the client-side for quick metadata detection. This includes:
 * **Priority Detection**: A local model scores the urgency of a new task.
 * **Date Parsing**: `chrono-node` parses natural language dates ("by next Friday").
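
Steps 2 and 3 of the flow in this hunk can be sketched as plain request builders plus a defensive parser. This is only an illustration: the project depends on `@deepgram/sdk` and `groq-sdk`, whereas the sketch below uses the raw HTTP request shapes so it stays dependency-free. The function names, the `UNKNOWN` fallback intent, and the `smart_format` option are assumptions, and only the two intents named in the README are modelled.

```typescript
// --- Step 2: transcription request (Deepgram pre-recorded audio endpoint) ---
const DEEPGRAM_LISTEN_URL = "https://api.deepgram.com/v1/listen";

function buildDeepgramRequest(apiKey: string, mimeType: string) {
  return {
    // `model=nova-3` selects the model named in this commit.
    url: DEEPGRAM_LISTEN_URL + "?model=nova-3&smart_format=true",
    headers: {
      Authorization: "Token " + apiKey,
      "Content-Type": mimeType, // e.g. the MediaRecorder blob's MIME type
    },
  };
}

// Pulls the first transcript out of a Deepgram response, or "" if absent.
function extractTranscript(response: unknown): string {
  const r = response as {
    results?: { channels?: { alternatives?: { transcript?: unknown }[] }[] };
  } | null;
  const t = r?.results?.channels?.[0]?.alternatives?.[0]?.transcript;
  return typeof t === "string" ? t : "";
}

// --- Step 3: command analysis request (Groq's OpenAI-compatible chat API) ---
const GROQ_CHAT_URL = "https://api.groq.com/openai/v1/chat/completions";

function buildGroqRequest(transcript: string, systemPrompt: string) {
  return {
    model: "qwen/qwen3-32b",
    messages: [
      { role: "system", content: systemPrompt },
      { role: "user", content: transcript },
    ],
    response_format: { type: "json_object" }, // ask for a JSON object back
    temperature: 0,
  };
}

// Hypothetical result shape; the README only names `intent` and `entities`.
type Intent = "ADD_TASK" | "DELETE_TASK" | "UNKNOWN";

interface ParsedCommand {
  intent: Intent;
  entities: Record<string, unknown>;
}

// Parses the model's JSON text, falling back to UNKNOWN instead of throwing
// so the UI can ask the user to try again.
function parseCommand(raw: string): ParsedCommand {
  const fallback: ParsedCommand = { intent: "UNKNOWN", entities: {} };
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return fallback;
  }
  if (typeof data !== "object" || data === null) return fallback;

  const obj = data as { intent?: unknown; entities?: unknown };
  const intent: Intent =
    obj.intent === "ADD_TASK" ? "ADD_TASK"
    : obj.intent === "DELETE_TASK" ? "DELETE_TASK"
    : "UNKNOWN";
  if (intent === "UNKNOWN") return fallback;

  const e = obj.entities;
  const entities =
    typeof e === "object" && e !== null && !Array.isArray(e)
      ? (e as Record<string, unknown>)
      : {};
  return { intent, entities };
}
```

In a Server Action, the Deepgram request would be sent with the audio blob as its body and the Groq request body would be posted to `GROQ_CHAT_URL` with a bearer token. Treating every malformed or unrecognised reply as `UNKNOWN` is one way to get the "graceful failure" behaviour the README describes.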
@@ -52,8 +53,8 @@ For newly added tasks, the application uses a fast, client-side heuristic model
 - **Frontend**: **Next.js (App Router)** & **React** for a modern, performant, and server-driven user interface.
 - **UI Components**: **ShadCN/UI** and **Tailwind CSS** for a beautiful, responsive, and accessible design system.
 - **State Management**: A combination of React Hooks (`useState`, `useContext`) and custom hooks for managing tasks and settings, with persistence via `localStorage`.
-- **Speech-to-Text**: **Deepgram** for its exceptional speed, accuracy, and cost-effectiveness in speech recognition.
-- **Natural Language Understanding**: **GPT-4o-mini (via OpenAI)** serves as the "brain," parsing user commands into structured, actionable data.
+- **Speech-to-Text**: **Deepgram (Nova-3)** for its exceptional speed, accuracy, and cost-effectiveness in speech recognition.
+- **Natural Language Understanding**: **Groq (`qwen/qwen3-32b`)** serves as the "brain," parsing user commands into structured, actionable data.
 - **Animation**: **Framer Motion** for fluid and delightful animations on the task list.

 ## Performance: Latency and Accuracy
@@ -62,17 +63,17 @@ The primary goal of EchoTasks is to feel instantaneous.

 - **Latency**: The entire process, from the moment you stop speaking to the UI updating, typically takes between **500ms and 1 second**. This low latency is achieved by:
 - Using Deepgram's hyper-fast transcription service.
-- Leveraging the speed of GPT-4o-mini for quick analysis.
+- Leveraging the speed of Groq's models for quick analysis.
 - Performing non-critical metadata detection (like priority) on the client side.

 - **Accuracy**: The application's accuracy is a product of its layered approach:
-- **Transcription Accuracy**: Deepgram's Nova-2 model provides industry-leading accuracy, with word error rates often below 5%, resulting in a **transcription accuracy of over 95%** for clear speech.
-- **Intent Accuracy**: GPT-4o-mini, guided by a robust system prompt with numerous examples (few-shot prompting), demonstrates very high accuracy in identifying the correct user intent and extracting entities, achieving an **estimated intent recognition accuracy of over 98%**.
+- **Transcription Accuracy**: Deepgram's Nova-3 model provides industry-leading accuracy, with word error rates often below 5%, resulting in a **transcription accuracy of over 95%** for clear speech.
+- **Intent Accuracy**: The `qwen/qwen3-32b` model on Groq, guided by a robust system prompt with numerous examples (few-shot prompting), demonstrates very high accuracy in identifying the correct user intent and extracting entities, achieving an **estimated intent recognition accuracy of over 98%**.
 - **Resilience**: If the AI fails to understand a command, the system gracefully informs the user without crashing, allowing them to try again.

 ## Getting Started

-To run this project locally, you will need API keys for Deepgram and OpenAI.
+To run this project locally, you will need API keys for Deepgram and Groq.

 1. **Clone the repository:**
 ```bash
@@ -89,7 +90,7 @@ To run this project locally, you will need API keys for Deepgram and OpenAI.
 Create a `.env` file in the root of the project and add your API keys:
 ```
 DEEPGRAM_API_KEY=your_deepgram_api_key
-OPENAI_API_KEY=your_openai_api_key
+GROQ_API_KEY=your_groq_api_key
 ```

 4. **Run the development server:**

package-lock.json

Lines changed: 112 additions & 9 deletions
Some generated files are not rendered by default.

package.json

Lines changed: 2 additions & 1 deletion
@@ -12,7 +12,7 @@
 "typecheck": "tsc --noEmit"
 },
 "dependencies": {
-"@deepgram/sdk": "^3.3.3",
+"@deepgram/sdk": "^3.4.1",
 "@genkit-ai/google-genai": "^1.20.0",
 "@genkit-ai/next": "^1.20.0",
 "@hookform/resolvers": "^4.1.3",
@@ -40,6 +40,7 @@
 "class-variance-authority": "^0.7.0",
 "framer-motion": "^11.5.3",
 "genkit": "^1.20.0",
+"groq-sdk": "^0.5.0",
 "lucide-react": "^0.417.0",
 "react-hook-form": "^7.54.2",
 "recharts": "^2.15.1",
