
Publishing and receiving audio

After you start a streaming session, Palabra returns two URLs and a JWT access token:

Purpose                     Field        Typical value
WebRTC (audio/control)      webrtc_url   https://<STREAMING_SERVER>.palabra.ai/livekit/
WebSocket (audio/control)   ws_url       wss://<STREAMING_SERVER>.palabra.ai/streaming-api/v1/speech-to-speech/stream
Authentication              publisher    eyJhbGciOiJIUzI1NiIsInR5cCI6…

You can publish and receive translated audio (and also control translation) through either transport:

  • WebRTC - best for client applications (browsers, mobile apps, etc.); handled by LiveKit.
  • WebSockets - convenient for server-side integration.
Important
  • Whichever transport you use, you configure and control translation by sending JSON-formatted text messages, over the WebRTC data channel or the WebSocket connection respectively; see the translation management API documentation.
  • If you choose WebSockets as the audio transport, the audio chunks you push must match the format, sample_rate, and channels you declare in your set_task command.

1. Using WebRTC transport

Use any LiveKit client library to publish your audio track. Then, create a translation task using the Translation management API, and Palabra will publish a translated audio track for each target language.

Publishing and Receiving code examples

Check out our Quick Start Guide for code examples on how to publish original audio (Step 4) and receive the translated audio (Step 5).

2. Using WebSockets transport

Connect to the WebSocket at ws_url using your publisher access token, create a translation task using the Translation management API, then start sending and receiving audio chunks (see below).
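This flow can be sketched in Python with the third-party websockets library. The token-as-query-parameter auth scheme and the empty set_task payload below are illustrative assumptions, not the exact API; consult the translation management API documentation for the real set_task schema.

```python
import json


def build_stream_url(ws_url: str, token: str) -> str:
    """Append the publisher token as a query parameter.

    NOTE: the exact auth mechanism (parameter name, or a header instead)
    is an assumption for illustration; check the streaming API docs.
    """
    return f"{ws_url}?token={token}"


async def run_session(ws_url: str, token: str) -> None:
    import websockets  # third-party: pip install websockets

    async with websockets.connect(build_stream_url(ws_url, token)) as ws:
        # 1. Create a translation task. The payload is left empty here;
        #    fill it in per the Translation management API (set_task).
        await ws.send(json.dumps({"message_type": "set_task", "data": {}}))

        # 2. From here on, send input_audio_data messages and read
        #    output_audio_data messages from the same connection.
        async for raw in ws:
            message = json.loads(raw)
            print(message.get("message_type"))
```

Start it with `asyncio.run(run_session(ws_url, publisher_token))`.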

Publishing

Send base64-encoded audio chunks to the WebSocket. These chunks must match the format, sample_rate, and channels you declare in your set_task command. The optimal chunk length is 320 ms.

Message format example:

{
  "message_type": "input_audio_data",
  "data": {
    "data": "base64 encoded data"
  }
}
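A minimal sketch of splitting raw PCM into 320 ms chunks and wrapping each one in the message shown above. It assumes 16-bit mono PCM at 24 kHz; adjust the constants to match whatever you actually declare in set_task. The helper names are ours, not part of the API.

```python
import base64
import json

# For 16-bit mono PCM: bytes per chunk = sample_rate * 2 bytes * duration.
SAMPLE_RATE = 24_000     # must match the sample_rate declared in set_task
BYTES_PER_SAMPLE = 2     # 16-bit PCM
CHUNK_MS = 320           # the optimal chunk length
CHUNK_BYTES = SAMPLE_RATE * BYTES_PER_SAMPLE * CHUNK_MS // 1000  # 15360


def make_input_audio_message(pcm_chunk: bytes) -> str:
    """Wrap one PCM chunk in an input_audio_data message."""
    return json.dumps({
        "message_type": "input_audio_data",
        "data": {"data": base64.b64encode(pcm_chunk).decode("ascii")},
    })


def iter_chunks(pcm: bytes, chunk_bytes: int = CHUNK_BYTES):
    """Split a PCM buffer into fixed-size chunks."""
    for i in range(0, len(pcm), chunk_bytes):
        yield pcm[i:i + chunk_bytes]
```

Each string produced by make_input_audio_message is one text frame to send over the WebSocket.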

Receiving

Palabra sends TTS audio chunks as output_audio_data messages over the same WebSocket connection. The chunks are base64-encoded; the default format is 24 kHz 16-bit mono PCM (this can be changed with the set_task command).

Message format example:

{
  "message_type": "output_audio_data",
  "data": {
    "transcription": {
      "transcription_id": "190983855fe3404e",
      "language": "es", // TTS language
      "last_chunk": false, // Last generated chunk for this `transcription_id`
      "data": "base64 string"
    }
  }
}
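The receiving side can be sketched as a small assembler that decodes each chunk and accumulates PCM per transcription_id until last_chunk arrives. The function name is ours; the field nesting follows the message above.

```python
import base64
import json


def handle_output_message(raw: str, buffers: dict[str, bytearray]):
    """Accumulate PCM from output_audio_data messages.

    Returns the complete PCM buffer once last_chunk is seen, else None.
    """
    msg = json.loads(raw)
    if msg.get("message_type") != "output_audio_data":
        return None  # ignore other message types on the socket
    t = msg["data"]["transcription"]
    buf = buffers.setdefault(t["transcription_id"], bytearray())
    buf += base64.b64decode(t["data"])
    if t["last_chunk"]:
        return bytes(buffers.pop(t["transcription_id"]))
    return None
```

The returned buffer is raw PCM in whatever output format you configured (24 kHz 16-bit mono by default), ready to write to a WAV file or feed to a player.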