EchoTasks is a modern, intuitive to-do list application that allows you to manage your tasks entirely through voice commands. Built with a cutting-edge stack including Next.js, Deepgram for real-time transcription, and Groq with `qwen/qwen3-32b` for command analysis, EchoTasks provides a seamless and fluid user experience. Just speak, and watch your to-do list update instantly.
## Overview

This application demonstrates a powerful "voice-first" user interface. …

## Key Features

- **Voice-First Interface**: Manage your entire to-do list using natural-language commands.
- **Real-Time Transcription**: Blazing-fast and accurate speech-to-text powered by Deepgram's Nova-3 model.
- **AI-Powered Command Analysis**: The `qwen/qwen3-32b` model, served by Groq, understands your intent (e.g., adding, deleting, updating) and extracts key details like task names, due dates, priorities, and more.
- **Undo Functionality**: Accidentally deleted a task? No problem. An undo button appears for 10 seconds after most actions, allowing you to revert changes with a single click.
- **Manual Task Editing**: While voice is powerful, sometimes you just need to type. A full editing dialog lets you manually change a task's text, priority, due date, and location.
- **Safety Confirmations**: For destructive actions like deleting multiple tasks at once ("delete all high priority tasks"), the app asks for confirmation to prevent accidental data loss.
- **Client-Side Priority & Location Detection**: For instant feedback, fast local models detect priority and location keywords directly in the browser.
- **Natural Date & Time Parsing**: Understands relative dates like "tomorrow," "next Friday," and "in 2 weeks."
- **Local Persistence**: Both your tasks and your settings are saved in the browser's local storage, so they are remembered every time you visit.
- **Customizable Settings**:
  - **Microphone Mode**: Choose between "Tap to Record" and "Hold to Record."
  - **Spacebar to Talk**: Use the spacebar as a push-to-talk key.
  - **Intelligent Stop**: Automatically stops recording after a few seconds of silence (in tap mode).
  - **Sort Completed Tasks**: Automatically move completed tasks to the bottom of the list.
  - **Temperature Unit**: Switch between Celsius and Fahrenheit for the weather display.

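The undo behaviour described above can be modelled as a one-step snapshot buffer. This is a minimal sketch, not EchoTasks' actual code: the `Task` shape, the `UndoBuffer` class and its methods are hypothetical, and only the 10-second window comes from the feature list. The clock is injected so the expiry logic can be tested without real timers.

```typescript
// Hypothetical task shape; EchoTasks' real task type likely has more fields.
interface Task {
  id: string;
  text: string;
  completed: boolean;
}

// The 10-second window from the feature description.
const UNDO_WINDOW_MS = 10_000;

// Holds a copy of the task list taken just before the most recent action.
class UndoBuffer {
  private snapshot: Task[] | null = null;
  private recordedAt = 0;
  private readonly now: () => number;

  // The clock is injectable so expiry can be tested without real timers.
  constructor(now: () => number = Date.now) {
    this.now = now;
  }

  // Call right before applying an action (add, delete, update, ...).
  record(tasks: Task[]): void {
    // Copy each task so in-place edits to the live list don't alter the snapshot.
    this.snapshot = tasks.map((t) => ({ ...t }));
    this.recordedAt = this.now();
  }

  // True while the undo button should be shown.
  canUndo(): boolean {
    return this.snapshot !== null && this.now() - this.recordedAt < UNDO_WINDOW_MS;
  }

  // Returns the pre-action list once, or null after the window has passed.
  undo(): Task[] | null {
    if (!this.canUndo()) return null;
    const previous = this.snapshot;
    this.snapshot = null; // one undo per recorded action
    return previous;
  }
}
```

Copying each task in `record` protects the snapshot if the live list is ever mutated in place. With strictly immutable React state updates, keeping a plain reference to the previous array would also work.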
## How It Works: The Application Flow

The magic of EchoTasks lies in its multi-stage pipeline, which typically turns your voice into action in under a second.

1. **Voice Capture**: The user taps or holds the microphone button, or holds the spacebar. The browser captures the audio using the `MediaRecorder` API.
2. **Real-Time Transcription**: The recorded audio blob is sent to a Next.js Server Action, which forwards it to the **Deepgram** API. Deepgram's Nova-3 model transcribes the audio into text with high accuracy and low latency.
3. **AI Command Analysis**: The transcribed text is then sent to another Server Action, which calls a custom function that queries the **`qwen/qwen3-32b` model on Groq**. A carefully engineered system prompt instructs the model to analyze the text and return a structured JSON object containing the user's `intent` (e.g., `ADD_TASK`, `DELETE_TASK`) and any relevant `entities` (task names, filters, updates).
4. **Client-Side Heuristics (Parallel Process)**: While the AI is processing, the original transcript is also analyzed on the client side for quick metadata detection. This includes:
   * **Priority Detection**: A local model scores the urgency of a new task.
   * **Date Parsing**: `chrono-node` parses natural-language dates ("by next Friday").

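Step 4 runs while step 3 is still waiting on the network. The sketch below shows one way to structure that overlap, assuming a hypothetical `analyzeCommand` function for the server call and a deliberately simple keyword scorer standing in for the app's local priority model (whose actual keywords and weights are not documented here). The AI result stays authoritative for intent; the local score only provides fast metadata.

```typescript
type Priority = "low" | "medium" | "high";

// Hypothetical shape of the server-side analysis result (step 3).
interface CommandAnalysis {
  intent: string;
  entities: Record<string, unknown>;
}

// Illustrative weights only; the app's actual local model is not shown here.
const URGENCY_WEIGHTS = new Map<string, number>([
  ["urgent", 3],
  ["asap", 3],
  ["immediately", 3],
  ["important", 2],
  ["soon", 1],
  ["today", 1],
]);

// Sums the weights of urgency words found in the transcript.
function detectPriority(transcript: string): Priority {
  const words: string[] = transcript.toLowerCase().match(/[a-z]+/g) ?? [];
  const score = words.reduce((sum, w) => sum + (URGENCY_WEIGHTS.get(w) ?? 0), 0);
  if (score >= 3) return "high";
  if (score >= 1) return "medium";
  return "low";
}

// Starts the network-bound AI call, then runs the cheap local heuristic while
// the request is in flight and reports its result immediately.
async function processTranscript(
  transcript: string,
  analyzeCommand: (text: string) => Promise<CommandAnalysis>,
  onLocalPriority: (priority: Priority) => void,
): Promise<{ analysis: CommandAnalysis; priority: Priority }> {
  const pending = analyzeCommand(transcript);
  const priority = detectPriority(transcript);
  onLocalPriority(priority); // the UI can show this before the AI answers
  const analysis = await pending;
  return { analysis, priority };
}
```

Because the AI request is started before the heuristic runs, the local result costs nothing on the critical path and `onLocalPriority` fires before the server responds.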

- **Frontend**: **Next.js (App Router)** & **React** for a modern, performant, and server-driven user interface.
- **UI Components**: **shadcn/ui** and **Tailwind CSS** for a beautiful, responsive, and accessible design system.
- **State Management**: A combination of React Hooks (`useState`, `useContext`) and custom hooks for managing tasks and settings, with persistence via `localStorage`.
- **Speech-to-Text**: **Deepgram (Nova-3)** for its exceptional speed, accuracy, and cost-effectiveness in speech recognition.
- **Natural Language Understanding**: **Groq (`qwen/qwen3-32b`)** serves as the "brain," parsing user commands into structured, actionable data.
- **Animation**: **Framer Motion** for fluid and delightful animations on the task list.

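The structured data returned by the "brain" still has to be checked on the client before it is acted on. The sketch below validates a reply of the form `{ intent, entities }`, using the `ADD_TASK` and `DELETE_TASK` intents named in the pipeline description; `UPDATE_TASK`, `UNKNOWN`, `parseCommand` and the fallback behaviour are assumptions for illustration. Qwen3 models can also emit a `<think>…</think>` reasoning block before the answer, depending on how the request is configured, so the sketch strips one if present.

```typescript
// ADD_TASK and DELETE_TASK appear in this README; UPDATE_TASK and UNKNOWN are illustrative.
const KNOWN_INTENTS = ["ADD_TASK", "DELETE_TASK", "UPDATE_TASK", "UNKNOWN"] as const;
type Intent = (typeof KNOWN_INTENTS)[number];

interface ParsedCommand {
  intent: Intent;
  entities: Record<string, unknown>;
}

// Returned whenever the model output can't be trusted, so the UI can ask the user to retry.
function fallback(): ParsedCommand {
  return { intent: "UNKNOWN", entities: {} };
}

function isIntent(value: unknown): value is Intent {
  return typeof value === "string" && (KNOWN_INTENTS as readonly string[]).includes(value);
}

// Turns the raw text of a model reply into a validated command.
function parseCommand(raw: string): ParsedCommand {
  // Remove a Qwen3-style <think>...</think> reasoning block if one is present.
  const body = raw.replace(/<think>[\s\S]*?<\/think>/g, "").trim();
  let data: unknown;
  try {
    data = JSON.parse(body);
  } catch {
    return fallback(); // the reply was not valid JSON
  }
  if (typeof data !== "object" || data === null || Array.isArray(data)) {
    return fallback();
  }
  const { intent, entities } = data as { intent?: unknown; entities?: unknown };
  if (!isIntent(intent)) return fallback();
  // Missing or malformed entities become an empty object rather than an error.
  const safeEntities: Record<string, unknown> =
    typeof entities === "object" && entities !== null && !Array.isArray(entities)
      ? (entities as Record<string, unknown>)
      : {};
  return { intent, entities: safeEntities };
}
```

Returning an `UNKNOWN` command instead of throwing keeps a bad model reply from crashing the UI and lets the app prompt the user to try again.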
## Performance: Latency and Accuracy
The primary goal of EchoTasks is to feel instantaneous.

- **Latency**: The entire process, from the moment you stop speaking to the UI updating, typically takes between **500 ms and 1 second**. This low latency is achieved by:
  - Using Deepgram's hyper-fast transcription service.
  - Leveraging Groq's fast inference for quick command analysis.
  - Performing non-critical metadata detection (like priority) on the client side.
- **Accuracy**: The application's accuracy is a product of its layered approach:
  - **Transcription Accuracy**: Deepgram's Nova-3 model provides high accuracy, with word error rates (WER) often below 5%, which corresponds to a **transcription accuracy of over 95%** for clear speech.
  - **Intent Accuracy**: The `qwen/qwen3-32b` model on Groq, guided by a robust system prompt with numerous examples (few-shot prompting), identifies the user's intent and extracts entities with high accuracy, reaching an **estimated intent-recognition accuracy of over 98%**.
  - **Resilience**: If the AI fails to understand a command, the system informs the user without crashing, allowing them to try again.

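The transcription figure is stated in terms of word error rate, the standard speech-recognition metric: WER = (S + D + I) / N, where S, D and I are the word substitutions, deletions and insertions needed to turn the transcript into the reference, and N is the number of reference words. Accuracy as used above is 1 − WER, so a WER below 5% means accuracy above 95%. The sketch computes WER as a word-level edit distance; the `wordErrorRate` name and the plain whitespace tokenisation are illustrative choices, not how the quoted figures were measured.

```typescript
// Splits text into lowercase words; real evaluations usually normalise punctuation too.
function tokenize(text: string): string[] {
  return text.toLowerCase().split(/\s+/).filter((w) => w.length > 0);
}

// WER = (substitutions + deletions + insertions) / reference word count,
// computed as a Levenshtein distance over words instead of characters.
function wordErrorRate(reference: string, hypothesis: string): number {
  const ref = tokenize(reference);
  const hyp = tokenize(hypothesis);
  if (ref.length === 0) {
    throw new Error("reference must contain at least one word");
  }
  // prev[j] is the edit distance between the first i-1 reference words and
  // the first j hypothesis words; cur is the row being filled for i.
  let prev = Array.from({ length: hyp.length + 1 }, (_, j) => j);
  for (let i = 1; i <= ref.length; i++) {
    const cur = [i];
    for (let j = 1; j <= hyp.length; j++) {
      const substitution = prev[j - 1] + (ref[i - 1] === hyp[j - 1] ? 0 : 1);
      const deletion = prev[j] + 1;
      const insertion = cur[j - 1] + 1;
      cur.push(Math.min(substitution, deletion, insertion));
    }
    prev = cur;
  }
  return prev[hyp.length] / ref.length;
}
```

For example, comparing the reference "call mom tomorrow at noon" with the transcript "call tom tomorrow noon" gives one substitution and one deletion over five reference words, a WER of 0.4.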
## Getting Started
To run this project locally, you will need API keys for Deepgram and Groq.

1. **Clone the repository:**
   ```bash
   …
   ```

Create a `.env` file in the root of the project and add your API keys: