Publishing and receiving audio
After you start a streaming session, Palabra returns two URLs and a JWT access token:
| Purpose | Field | Typical value |
|---|---|---|
| WebRTC (audio/control) | `webrtc_url` | `https://<STREAMING_SERVER>.palabra.ai/livekit/` |
| WebSocket (audio/control) | `ws_url` | `wss://<STREAMING_SERVER>.palabra.ai/streaming-api/v1/speech-to-speech/stream` |
| Authentication | `publisher` | `eyJhbGciOiJIUzI1NiIsInR5cCI6…` |
You can publish and receive translated audio (and also control translation) through either transport:
- WebRTC - best for client applications in browsers, mobile apps, etc. Handled by LiveKit.
- WebSockets - convenient for server-side integration.
- With either transport, you can configure and control translation by sending JSON-formatted text messages: over the WebRTC data channel or the WebSocket connection, respectively. See the Translation management API documentation.
- If you choose WebSockets as the audio transport, the audio chunks you push must match the `format`, `sample_rate`, and `channels` you declare in your `set_task` command.
1. Using WebRTC transport
Use any LiveKit client library to publish your audio track. Then, create a translation task using the Translation management API, and Palabra will publish a translated audio track for each target language.
- LiveKit Python SDK
- LiveKit Golang SDK
- LiveKit JS SDK
- See other SDKs here.
Publishing and Receiving code examples
Check out our Quick Start Guide for code examples on how to publish original audio (Step 4) and receive the translated audio (Step 5).
2. Using WebSockets transport
Connect to the WebSocket at `ws_url` using your `publisher` access token, create a translation task using the Translation management API, then start sending and receiving audio chunks (see below).
Publishing
Send base64-encoded audio chunks over the WebSocket. These chunks must match the `format`, `sample_rate`, and `channels` you declare in your `set_task` command. The optimal chunk length is 320 ms.
Message format example:
```json
{
  "message_type": "input_audio_data",
  "data": {
    "data": "base64 encoded data"
  }
}
```
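The chunking and wrapping above can be sketched in Python. This is a minimal, illustrative sketch: the helper names and the 24 kHz/16-bit/mono parameters are assumptions standing in for whatever you declare in your `set_task` command; only the `input_audio_data` envelope comes from the API.

```python
import base64
import json

# Assumed audio parameters (must match your set_task declaration):
# 24 kHz, 16-bit samples (2 bytes each), mono PCM.
SAMPLE_RATE = 24_000
BYTES_PER_SAMPLE = 2
CHANNELS = 1
CHUNK_MS = 320  # recommended chunk length from the docs

# Bytes per 320 ms chunk: 24000 samples/s * 2 bytes * 1 channel * 0.320 s = 15360
CHUNK_BYTES = SAMPLE_RATE * BYTES_PER_SAMPLE * CHANNELS * CHUNK_MS // 1000


def make_input_audio_message(pcm_chunk: bytes) -> str:
    """Wrap one raw PCM chunk in the input_audio_data envelope."""
    return json.dumps({
        "message_type": "input_audio_data",
        "data": {"data": base64.b64encode(pcm_chunk).decode("ascii")},
    })


def iter_chunks(pcm: bytes, size: int = CHUNK_BYTES):
    """Split a PCM buffer into fixed-size chunks for streaming."""
    for offset in range(0, len(pcm), size):
        yield pcm[offset:offset + size]
```

Each string returned by `make_input_audio_message` would then be sent as a text frame over the WebSocket connection.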
Receiving
Palabra sends TTS audio chunks as `output_audio_data` messages over the same WebSocket connection. The chunks are base64-encoded; the default format is 24 kHz, 16-bit, mono PCM (this can be changed with the `set_task` command).
Message format example:
```json
{
  "message_type": "output_audio_data",
  "data": {
    "transcription": {
      "transcription_id": "190983855fe3404e",
      "language": "es",          // TTS language
      "last_chunk": false,       // Last generated chunk for this `transcription_id`
      "data": "base64 string"
    }
  }
}
```