Resources
Everything you need to build amazing speech-enabled applications with SpeechCortex. Choose between real-time streaming and batch processing for post-call analysis.
Voice AI Engine
COVE - Conversational Voice Engine for next-generation Voice AI applications
Feature Overview
COVE (Conversational Voice Engine) is purpose-built for Voice AI applications, providing intelligent turn detection and seamless integration with voice assistants and conversational AI systems.
Intelligent Turn Detection
Context-aware end-of-turn detection
Ultra-Low Latency
Sub-200ms response times
Integrated Streaming STT
Built-in speech-to-text pipeline
COVE Engine
The Conversational Voice Engine (COVE) is the core of our Voice AI stack, providing intelligent speech processing optimized for conversational applications.
🎯 Purpose-Built for Voice AI
Unlike general-purpose STT, COVE is specifically designed for interactive voice applications where natural conversation flow is critical.
🔄 Full Duplex Communication
Supports simultaneous listening and speaking, enabling natural conversational interactions without artificial turn-taking constraints.
🧠 Context-Aware Processing
Leverages conversation context to improve accuracy and make intelligent decisions about turn boundaries.
End of Turn Detection
Intelligent detection of when a speaker has finished their turn, enabling natural conversation flow.
⏱️ Adaptive Timing
Dynamically adjusts silence thresholds based on conversation context, avoiding premature cutoffs during natural pauses.
📝 Linguistic Analysis
Analyzes sentence structure and semantics to determine completion, not just silence detection.
🎭 Prosodic Features
Uses pitch, intonation, and rhythm patterns to identify natural turn boundaries with high accuracy.
Start of Turn Detection
Rapid detection of when a user begins speaking, enabling responsive Voice AI interactions.
⚡ Instant Detection
Sub-100ms detection of speech onset, enabling immediate system response and natural conversation pacing.
🔇 Noise Filtering
Distinguishes actual speech from background noise, coughs, and non-speech sounds to prevent false triggers.
🎤 Barge-In Support
Enables users to interrupt system speech naturally, just like in human conversations.
Getting Started
Integrate COVE into your Voice AI application in minutes.
Connect via WebSocket
Establish a connection to the COVE endpoint.
wss://api.speechcortex.ai/v1/cove?api_key=YOUR_API_KEY
Configure COVE Settings
Enable turn detection features.
{ "type": "config", "end_of_turn": true, "start_of_turn": true }
Stream Audio & Receive Events
Send audio and receive transcription + turn events.
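Putting the three steps together: a minimal sketch of the raw WebSocket flow in Python, assuming the third-party websockets package and that events arrive as JSON with a "type" field matching the reference in the next section.
# Minimal end-to-end COVE session over a raw WebSocket.
# Assumes `websockets` (pip install websockets) and an iterable of raw
# PCM audio chunks; the event payload fields ("type", "transcript")
# are assumptions based on the API reference below.
import asyncio
import json
import websockets

async def run_cove(audio_chunks):
    url = "wss://api.speechcortex.ai/v1/cove?api_key=YOUR_API_KEY"
    async with websockets.connect(url) as ws:
        # Step 2: enable turn detection.
        await ws.send(json.dumps({
            "type": "config",
            "end_of_turn": True,
            "start_of_turn": True
        }))

        async def sender():
            for chunk in audio_chunks:  # raw binary audio
                await ws.send(chunk)

        async def receiver():
            async for message in ws:
                event = json.loads(message)
                if event["type"] == "turn.start":
                    print("User started speaking")
                elif event["type"] == "turn.end":
                    print("User finished:", event.get("transcript"))

        await asyncio.gather(sender(), receiver())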
WebSocket API
Full reference for the COVE WebSocket API.
WSS /v1/cove
transcript.partial: Interim transcription
transcript.final: Final transcription
turn.start: User started speaking
turn.end: User finished speaking
SDKs & Libraries
Official SDKs with COVE integration for Voice AI applications.
Python
v2.1.0
JavaScript
v1.8.0
React Native
v1.2.0
Swift (iOS)
v1.4.0
Kotlin (Android)
v1.3.0
Go
v1.5.0
Code Samples
import speechcortex
client = speechcortex.Client(api_key="YOUR_API_KEY")
def on_turn_end(event):
    print(f"User finished: {event.transcript}")
    # Trigger your AI response here

def on_turn_start(event):
    print("User started speaking")
    # Stop any ongoing TTS playback

# Start COVE session
session = client.cove.start(
    on_turn_start=on_turn_start,
    on_turn_end=on_turn_end
)
session.stream_microphone()

import { SpeechCortex } from '@speechcortex/sdk';
const client = new SpeechCortex({ apiKey: 'YOUR_API_KEY' });
const session = await client.cove.start({
  onTurnStart: () => {
    // User interrupted - stop AI speech
    ttsPlayer.stop();
  },
  onTurnEnd: async (event) => {
    // User finished - process with your LLM
    const response = await llm.generate(event.transcript);
    ttsPlayer.speak(response);
  }
});
await session.startMicrophone();
Use Cases
Voice AI Agents
Build natural conversational AI assistants with intelligent turn-taking and barge-in support.
Contact Center AI
Deploy AI agents that can handle customer calls with human-like conversation flow.
Voice-First Applications
Create hands-free applications with responsive voice interaction for IoT and automotive.
Interactive IVR
Replace rigid menu systems with natural language voice interfaces.
Streaming - STT
Real-time streaming speech recognition for live applications
Feature Overview
Stream audio in real-time and receive transcriptions with ultra-low latency. Perfect for voice assistants, live captioning, and interactive applications.
~200ms Latency
Near real-time transcription
WebSocket API
Persistent bi-directional connection
Interim Results
Get partial transcripts as you speak
Media Settings
Configure audio input parameters for optimal streaming transcription quality.
Sample Rate
Supported sample rates for audio input.
Audio Encoding
Supported audio encoding formats.
Channels
Mono (1 channel) or Stereo (2 channels) audio input supported.
Results
Understanding the transcription results returned by the streaming API.
⏳ Interim & Final
Receive both interim (partial) and final transcription results. Interim results update in real-time as speech is processed, while final results are confirmed and won't change.
🎙️ Speech Final
Indicates when a complete utterance has been recognized. Triggered when natural speech boundaries are detected, such as pauses or sentence endings.
📂 Finalize
Force finalization of the current transcript segment. Useful for ending a session or when you need immediate final results without waiting for natural speech boundaries.
🕒 Word Timing
Precise start and end timestamps for each word in the transcript. Enables accurate audio-text alignment for captions, highlights, and playback synchronization.
💡 Word Confidence
Individual confidence scores (0.0 to 1.0) for each recognized word. Helps identify uncertain transcriptions and enables quality-based filtering or highlighting.
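As a quick illustration of consuming these results, a sketch that turns a final transcript event into timed caption cues and flags uncertain words; the per-word field names (text, start, end, confidence) are assumptions, not confirmed payload fields.
# Sketch: build caption cues from word timings and flag low-confidence
# words. Per-word field names are assumptions about the event payload.
def handle_final(event):
    for word in event.get("words", []):
        cue = f"[{word['start']:.2f}s-{word['end']:.2f}s] {word['text']}"
        if word["confidence"] < 0.7:  # threshold is application-specific
            cue += " (low confidence)"
        print(cue)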
Controls
Control messages to manage the streaming session.
🧩 Variable Chunk
Send audio in variable-sized chunks based on your application's needs. Supports flexible chunk sizes for optimal latency and throughput balance.
♻️ Keep Alive
Maintain the WebSocket connection during periods of silence or inactivity. Prevents timeout disconnections and ensures seamless resumption of audio streaming.
🔚 End Pointing
Automatic detection of speech endpoints to determine when a speaker has finished talking. Enables natural conversation flow and timely transcript finalization.
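For illustration, these controls would typically be small JSON messages sent on the open socket; the exact message type strings below are assumptions based on the feature names above.
import json

# Sketch: session control messages on an open WebSocket `ws`.
# Message type strings are assumptions, not confirmed API values.
async def keep_alive(ws):
    # Send periodically during silence to prevent timeout disconnects.
    await ws.send(json.dumps({"type": "keep_alive"}))

async def finalize(ws):
    # Force immediate finalization of the current transcript segment.
    await ws.send(json.dumps({"type": "finalize"}))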
Format
Output formatting options for transcription results.
✒️ Punctuation
Automatically add punctuation marks (periods, commas, question marks) to the transcript for improved readability and natural text flow.
🗯️ Filler Words
Control whether filler words (um, uh, like, you know) are included or filtered out from the transcript. Useful for verbatim transcription or cleaner output.
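These formatting options would most naturally ride along in the initial config message; a hypothetical sketch, where the "punctuate" and "filler_words" keys are assumed option names.
import json

# Sketch: initial config extended with formatting options.
# "punctuate" and "filler_words" are assumed option names.
async def configure(ws):
    await ws.send(json.dumps({
        "type": "config",
        "sample_rate": 16000,
        "encoding": "pcm_s16le",
        "punctuate": True,      # add punctuation to transcripts
        "filler_words": False   # drop "um", "uh", "like", etc.
    }))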
Getting Started
Set up real-time streaming transcription in minutes.
Establish WebSocket Connection
Connect to our streaming endpoint with your API key.
wss://api.speechcortex.ai/v1/stream?api_key=YOUR_API_KEY
Configure Audio Settings
Send configuration message with sample rate and encoding.
{ "type": "config", "sample_rate": 16000, "encoding": "pcm_s16le" }
Stream Audio Data
Send binary audio chunks and receive transcription events.
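Putting the three steps together, a sketch in Python assuming the third-party websockets package and a raw 16 kHz, 16-bit mono PCM file; event field names follow the reference in the next section.
# Sketch: stream a raw PCM file and print transcription events.
# Assumes `websockets` (pip install websockets); event fields are
# assumptions based on the API reference below.
import asyncio
import json
import websockets

async def stream_file(path):
    url = "wss://api.speechcortex.ai/v1/stream?api_key=YOUR_API_KEY"
    async with websockets.connect(url) as ws:
        # Step 2: configure audio settings.
        await ws.send(json.dumps({
            "type": "config",
            "sample_rate": 16000,
            "encoding": "pcm_s16le"
        }))

        async def sender():
            with open(path, "rb") as f:
                while chunk := f.read(3200):  # ~100 ms of 16 kHz s16le mono
                    await ws.send(chunk)

        async def receiver():
            async for message in ws:
                event = json.loads(message)
                print(event["type"], event.get("text", ""))

        await asyncio.gather(sender(), receiver())

asyncio.run(stream_file("audio.raw"))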
WebSocket API
Full reference for the streaming WebSocket API.
WSS /v1/stream
transcript.partial: Interim transcription results
transcript.final: Final transcription segment
vad.speech_end: End of speech detected
SDKs & Libraries
Official SDKs with built-in WebSocket handling and audio capture.
Python
v2.1.0
JavaScript
v1.8.0
React Native
v1.2.0
Swift (iOS)
v1.4.0
Kotlin (Android)
v1.3.0
Go
v1.5.0
Use Cases
Voice Assistants
Build conversational AI with instant speech recognition and natural turn-taking.
Live Captioning
Real-time captions for video calls, broadcasts, and live events.
Voice Commands
Enable hands-free control in apps, games, and IoT devices.
Live Agent Assist
Provide real-time suggestions to customer service agents during calls.
Code Samples
import { SpeechCortex } from '@speechcortex/sdk';
const client = new SpeechCortex({ apiKey: 'YOUR_API_KEY' });
// Start streaming from microphone
const stream = await client.startStreaming({
  onPartial: (text) => console.log('Partial:', text),
  onFinal: (text) => console.log('Final:', text),
  onError: (err) => console.error(err)
});
// Later: stop streaming
stream.stop();

import speechcortex
client = speechcortex.Client(api_key="YOUR_API_KEY")

def on_transcript(event):
    if event.is_final:
        print(f"Final: {event.text}")
    else:
        print(f"Partial: {event.text}")

# Stream from microphone
client.stream_microphone(on_transcript=on_transcript)
Batch - STT
Batch speech-to-text processing for recorded audio files
Feature Overview
Process recorded audio files with high accuracy. Ideal for call center analytics, meeting transcription, and content archival with speaker diarization and advanced features.
High Accuracy
<5% Word Error Rate
Speaker Diarization
Identify who said what
Batch Processing
Process thousands of files
Media Settings
Supported audio formats and configuration options for batch transcription.
Supported File Formats
Upload audio in any of these formats.
File Size Limits
Maximum file sizes for different tiers.
Standard
500 MB
Enterprise
2 GB
Audio Duration
Maximum audio duration of 4 hours per file. Longer recordings can be split automatically.
Results
Understanding the transcription output from batch processing.
Full Transcript
Complete text transcription with punctuation and formatting.
Word-Level Timestamps
Precise start and end times for each word in the transcript.
Speaker Labels
When diarization is enabled, each segment is labeled with the speaker identifier.
Confidence Scores
Overall and word-level confidence scores for quality assessment.
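As an illustration, confidence scores could drive a simple quality gate over a completed transcript; the segment fields follow the response example in the Format section, while the per-word structure is an assumption.
# Sketch: collect words below a confidence threshold for review.
# `segments` matches the response example below; the per-word
# "words" field inside each segment is an assumption.
def low_confidence_words(result, threshold=0.8):
    flagged = []
    for segment in result.get("segments", []):
        for word in segment.get("words", []):
            if word["confidence"] < threshold:
                flagged.append((segment["speaker"], word["text"]))
    return flagged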
Controls
Options to control transcription behavior and output.
Speaker Diarization
diarization: true
Identify and separate different speakers in the audio.
Punctuation
punctuation: true
Automatically add punctuation to the transcript.
Profanity Filter
profanity_filter: true
Mask profane words in the transcript output.
Custom Vocabulary
custom_vocabulary: [...]
Boost recognition of domain-specific terms and names.
Format
API request and response formats for batch transcription.
{
  "audio_url": "https://storage.example.com/recording.wav",
  "language": "en-US",
  "diarization": true,
  "punctuation": true,
  "webhook_url": "https://your-app.com/webhook"
}
{
  "id": "tx_abc123",
  "status": "completed",
  "text": "Hello, how can I help you today?",
  "confidence": 0.96,
  "duration": 45.2,
  "segments": [
    {
      "speaker": "Speaker 1",
      "text": "Hello, how can I help you today?",
      "start": 0.0,
      "end": 2.5
    }
  ]
}
Getting Started
Transcribe your first audio file in minutes.
Upload Audio File
Send your audio file to the transcription endpoint.
curl -X POST https://api.speechcortex.ai/v1/transcribe \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "audio=@call_recording.wav"
Get Transcription Results
Receive structured JSON with text, timestamps, and speaker labels.
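If you'd rather poll than wait for a webhook, the GET endpoint listed below can be checked until the job completes; a sketch using the requests package, with response fields per the example in the Format section.
# Sketch: poll a transcription job until it reports "completed".
# Endpoint per the REST API reference; uses the `requests` package.
import time
import requests

def wait_for_transcript(tx_id, api_key):
    url = f"https://api.speechcortex.ai/v1/transcriptions/{tx_id}"
    headers = {"Authorization": f"Bearer {api_key}"}
    while True:
        result = requests.get(url, headers=headers).json()
        if result["status"] == "completed":
            return result
        time.sleep(2)  # polling interval is application-specific

print(wait_for_transcript("tx_abc123", "YOUR_API_KEY")["text"])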
REST API
Simple REST endpoints for audio transcription.
POST /v1/transcribe
GET /v1/transcriptions/:id
POST /v1/diarize
POST /v1/detect-language
Batch Processing
Process large volumes of audio files efficiently with our batch API.
How Batch Processing Works
- Submit a batch job with multiple audio file URLs
- Our system queues and processes files in parallel
- Receive webhook notifications as transcriptions complete (a receiver sketch follows the example below)
- Retrieve all results via the batch status endpoint
curl -X POST https://api.speechcortex.ai/v1/batch \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
"files": [
"https://storage.example.com/call1.wav",
"https://storage.example.com/call2.wav"
],
"webhook_url": "https://your-app.com/webhook",
"options": {
"diarization": true,
"punctuation": true
}
}'
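On the receiving end, a minimal webhook handler sketch in Python using Flask; the payload shape (id, status, text) is an assumption that mirrors the transcription response shown in the Format section.
# Sketch: minimal webhook receiver for completed transcriptions.
# Payload fields are assumptions based on the response format above.
from flask import Flask, request

app = Flask(__name__)

@app.post("/webhook")
def on_transcription():
    payload = request.get_json()
    if payload.get("status") == "completed":
        print(payload["id"], payload["text"])
        # Store the transcript, trigger analytics, etc.
    return "", 204

if __name__ == "__main__":
    app.run(port=8080)
Use Cases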
Contact Center Analytics
Analyze call recordings to improve agent performance and customer satisfaction.
Meeting Transcription
Convert recorded meetings into searchable, shareable transcripts.
Compliance & Quality
Ensure regulatory compliance with complete call documentation.
Content Archival
Make audio and video archives searchable and accessible.
Code Samples
import speechcortex
client = speechcortex.Client(api_key="YOUR_API_KEY")
# Transcribe with speaker diarization
result = client.transcribe(
    "call_recording.wav",
    diarization=True,
    punctuation=True
)
for segment in result.segments:
    print(f"Speaker {segment.speaker}: {segment.text}")
# Output:
# Speaker 1: Hello, how can I help you today?
# Speaker 2: I'd like to check my account balance.

import { SpeechCortex } from '@speechcortex/sdk';
const client = new SpeechCortex({ apiKey: 'YOUR_API_KEY' });
// Submit for async processing
const job = await client.transcribeAsync({
  audioUrl: 'https://storage.example.com/recording.wav',
  webhookUrl: 'https://your-app.com/webhook'
});
console.log('Job ID:', job.id);
// Results delivered via webhook when ready
Ready to Get Started?
Sign up for free and start building with SpeechCortex today. No credit card required.
