Translation management API
Control Palabra's real-time translation pipeline over WebSockets or a WebRTC data channel.
1. Prerequisites
Before you connect, make sure you have the following:
- `webrtc_url` – the URL of the Palabra WebRTC server.
- `ws_url` – the URL of the Palabra WebSocket server.
- `publisher` access token – a JWT used to authorize your connection.
All three values are returned when you create a streaming session.
2. Choose a transport
2.1 Option 1. WebRTC
- Connect with any LiveKit client to `webrtc_url` using your `publisher` token.
- Once the connection is open, you can start sending commands through the default (empty-topic) WebRTC data channel.
2.2 Option 2. WebSockets
Connect to `ws_url`, passing your `publisher` token as a query parameter:

```javascript
// WebSocket control URL
const endpoint = `${ws_url}?token=${publisher}`;
const socket = new WebSocket(endpoint);
```
Once the connection is open, you can start sending commands.
3. Message format (WebRTC & WebSockets)
Every API packet, request and response alike, has the same envelope:

```json
{
  "message_type": "<string>",
  "data": { /* payload */ }
}
```

If `message_type` is `"error"`, the `data` field contains diagnostic information.
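A minimal sketch of wrapping and unwrapping this envelope in plain Node.js (helper names are illustrative, not part of the API):

```javascript
// Build an API packet: every message shares the same two-field envelope.
function buildEnvelope(messageType, payload = {}) {
  return JSON.stringify({ message_type: messageType, data: payload });
}

// Parse an incoming packet; surface "error" packets as exceptions.
function parseEnvelope(raw) {
  const { message_type, data } = JSON.parse(raw);
  if (message_type === 'error') {
    // data carries the diagnostic information
    throw new Error(`Palabra API error: ${JSON.stringify(data)}`);
  }
  return { message_type, data };
}

const packet = buildEnvelope('pause_task');
const parsed = parseEnvelope(packet);
```

The same two functions work for both transports, since WebRTC data-channel and WebSocket messages share the envelope.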
4. Typical workflow
- Create task – send a `set_task` message to start the translation.
- Update task – send another `set_task` message to update the translation settings during an ongoing translation.
- Pause processing – send a `pause_task` message to pause the translation (stops billing). Resume with another `set_task`.
- Finish task – send an `end_task` message (the server will close your connection automatically; the session is invalidated after 1 minute).
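The lifecycle above can be sketched against any object exposing a `send` method (a WebSocket or a data-channel wrapper). The `fakeSocket` here is a stand-in that only records what would be sent:

```javascript
// Drive the task lifecycle: create -> update -> pause -> resume -> finish.
function runLifecycle(socket, settings, updatedSettings) {
  const send = (message_type, data) =>
    socket.send(JSON.stringify({ message_type, data }));

  send('set_task', settings);          // start translation
  send('set_task', updatedSettings);   // live-update settings
  send('pause_task', {});              // pause (stops billing)
  send('set_task', updatedSettings);   // resume after pause
  send('end_task', { force: false });  // finish; server closes the connection
}

// A stand-in socket that records outgoing message types instead of sending.
const sent = [];
const fakeSocket = { send: (raw) => sent.push(JSON.parse(raw).message_type) };
runLifecycle(fakeSocket, { pipeline: {} }, { pipeline: {} });
```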
5. Message Settings reference
- What each field means - see the translation settings breakdown.
- Recommended values - see the best-practice settings.
6. Streaming audio configuration
6.1 Option 1. WebRTC audio I/O configuration
Use the following input/output stream configuration in your `set_task`:

```json
{
  "data": {
    "input_stream": {
      "content_type": "ws",
      "source": {
        "type": "webrtc"
      }
    },
    "output_stream": {
      "content_type": "ws",
      "target": {
        "type": "webrtc"
      }
    }
    // ...
  }
}
```
- Publish your microphone track to the LiveKit Room.
- Subscribe to the translation tracks that Palabra will publish in the same LiveKit Room after you send the `set_task` message.
6.2 Option 2. WebSocket audio I/O configuration
Use this configuration instead:

```json
{
  "data": {
    "input_stream": {
      "content_type": "ws",
      "source": {
        "type": "ws",
        "format": "opus",      // or pcm_s16le, wav
        "sample_rate": 24000,  // 16000 - 24000
        "channels": 1          // 1 or 2
      }
    },
    "output_stream": {
      "content_type": "ws",
      "target": {
        "type": "ws",
        "format": "pcm_s16le"  // or zlib_pcm_s16le
      }
    }
  }
}
```
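A small helper (hypothetical, not part of the API) can validate the declared input parameters against the documented ranges before building the config:

```javascript
const AUDIO_FORMATS = ['opus', 'pcm_s16le', 'wav'];

// Build the WebSocket input_stream config, rejecting values outside the
// documented ranges (sample_rate 16000-24000, channels 1 or 2).
function wsInputStream({ format, sampleRate, channels }) {
  if (!AUDIO_FORMATS.includes(format)) {
    throw new Error(`unsupported format: ${format}`);
  }
  if (sampleRate < 16000 || sampleRate > 24000) {
    throw new Error(`sample_rate out of range: ${sampleRate}`);
  }
  if (channels !== 1 && channels !== 2) {
    throw new Error('channels must be 1 or 2');
  }
  return {
    content_type: 'ws',
    source: { type: 'ws', format, sample_rate: sampleRate, channels },
  };
}

const inputStream = wsInputStream({ format: 'opus', sampleRate: 24000, channels: 1 });
```

Validating client-side avoids a round-trip that would otherwise end in a `VALIDATION_ERROR` response.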
- Send base-64 audio chunks that exactly match the declared format.
- Receive base-64 TTS chunks in `output_audio_data` responses:

```json
{
  "message_type": "output_audio_data",
  "data": {
    "transcription_id": "190983855fe3404e",
    "language": "es",
    "last_chunk": false,
    "data": "<base64-encoded audio>"
  }
}
```
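In Node.js, base-64 encoding of outgoing raw chunks and reassembly of incoming `output_audio_data` payloads might look like this (helper names are illustrative):

```javascript
// Wrap a raw audio chunk as the payload of an input_audio_data message.
function inputAudioMessage(audioBuffer) {
  return {
    message_type: 'input_audio_data',
    data: { data: audioBuffer.toString('base64') },
  };
}

// Accumulate output_audio_data payloads until last_chunk is seen,
// then return the decoded audio as a single Buffer (null until then).
function makeChunkAssembler() {
  const chunks = [];
  return function onOutputAudio({ data }) {
    chunks.push(Buffer.from(data.data, 'base64'));
    return data.last_chunk ? Buffer.concat(chunks) : null;
  };
}

const msg = inputAudioMessage(Buffer.from([0, 1, 2, 3]));
const assemble = makeChunkAssembler();
assemble({ data: { last_chunk: false, data: Buffer.from('ab').toString('base64') } });
const audio = assemble({ data: { last_chunk: true, data: Buffer.from('cd').toString('base64') } });
```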
7. API messages
Request message schema
Response message schema
7.1 Requests (client → server)
| Message type | Short description |
|---|---|
| `set_task` | Create/update translation task |
| `end_task` | Finish translation task |
| `get_task` | Return current task |
| `pause_task` | Pause current task; use `set_task` to continue |
| `tts_task` | Generate TTS from text |
| `input_audio_data` | Input audio data chunk (WebSockets audio transport only) |
set_task
Create a new task or modify the current one.
- Sending it for the first time after creating a session starts the translation.
- Sending it after `pause_task` resumes the translation.
- Sending another `set_task` message during an ongoing translation updates the current translation settings in real time; no need to stop the translation.
```json
{
  "message_type": "set_task",
  "data": {
    "input_stream": { /* Depends on transport; see the audio I/O section above */ },
    "output_stream": { /* Depends on transport; see the audio I/O section above */ },
    "pipeline": {
      "transcription": {
        "source_language": "string",
        "detectable_languages": ["string"],
        "segment_confirmation_silence_threshold": "float",
        "only_confirm_by_silence": "bool",
        "sentence_splitter": {
          "enabled": "bool"
        },
        "verification": {
          "auto_transcription_correction": "bool",
          "transcription_correction_style": "string"
        }
      },
      // Translation and speech generation settings for one or more target languages
      "translations": [
        {
          "target_language": "string",
          "translate_partial_transcriptions": "bool",
          "speech_generation": {
            "voice_cloning": "bool",
            "voice_id": "string",
            "voice_timbre_detection": {
              "enabled": "bool",
              "high_timbre_voices": [],
              "low_timbre_voices": []
            }
          }
        }
        // You can add more targets
      ],
      "translation_queue_configs": {
        "global": {
          "desired_queue_level_ms": "int",
          "max_queue_level_ms": "int",
          "auto_tempo": "bool"
        },
        "es": {
          "desired_queue_level_ms": "int",
          "max_queue_level_ms": "int"
        }
      },
      // Select response types to receive
      "allowed_message_types": [
        "translated_transcription",
        "partial_transcription",
        "partial_translated_transcription",
        "validated_transcription"
      ]
    }
  }
}
```
See the Translation settings breakdown for details on each field, and the Recommended settings for the translation task pipeline.
Need ASR-only mode? Do either:
- Set `output_stream` to `null` (you will still get text translations, but no TTS audio).
- Use an empty `translations` list (no translations and no TTS).
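Both options above amount to a small patch on the `set_task` payload; a sketch (helper names illustrative):

```javascript
// Option 1: null output_stream -> text translations still arrive, no TTS audio.
function asrOnlyViaNullOutput(task) {
  return { ...task, data: { ...task.data, output_stream: null } };
}

// Option 2: empty translations list -> no translations and no TTS.
function asrOnlyViaEmptyTranslations(task) {
  const pipeline = { ...task.data.pipeline, translations: [] };
  return { ...task, data: { ...task.data, pipeline } };
}

const baseTask = {
  message_type: 'set_task',
  data: {
    output_stream: { content_type: 'ws', target: { type: 'webrtc' } },
    pipeline: {
      transcription: { source_language: 'en' },
      translations: [{ target_language: 'es' }],
    },
  },
};

const noTts = asrOnlyViaNullOutput(baseTask);
const noTranslations = asrOnlyViaEmptyTranslations(baseTask);
```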
end_task
Finish the current task. The server closes the connection after receiving `end_task`.

```json
{
  "message_type": "end_task",
  "data": { "force": false } // set true to skip finalization of the last phrase
}
```
pause_task
Pause the current task. No audio data is processed and no billing occurs while the task is paused. Use `set_task` to resume translation.

```json
{ "message_type": "pause_task", "data": {} }
```
get_task
Return the current task.

```json
{ "message_type": "get_task", "data": {} }
```
tts_task
Generate TTS from text. The text will be translated into every `target_language` listed in the task's `translations` section.

```json
{
  "message_type": "tts_task",
  "data": {
    "text": "Hello, how are you?",
    "language": "en" // text language
  }
}
```
input_audio_data
Sends a base-64-encoded input audio chunk when WebSockets is selected as the audio transport.
The audio chunks you push must match the `format` / `sample_rate` / `channels` you declared in your `set_task` command. The optimal chunk length is 320 ms.

```json
{
  "message_type": "input_audio_data",
  "data": {
    "data": "base64 encoded data"
  }
}
```
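For raw PCM (`pcm_s16le`, 2 bytes per sample), the byte size of a 320 ms chunk follows directly from the declared rate and channel count:

```javascript
// Bytes per chunk for pcm_s16le: sample_rate * channels * 2 bytes * duration.
function chunkBytes(sampleRate, channels, chunkMs = 320) {
  return Math.round(sampleRate * channels * (chunkMs / 1000) * 2);
}

const bytesPerChunk = chunkBytes(24000, 1); // 24000 * 1 * 0.32 * 2 = 15360
```

So at 24 kHz mono you would slice your PCM stream into 15360-byte chunks before base-64 encoding.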
7.2 Responses (server → client)
| Message type | Short description |
|---|---|
| `partial_transcription` | Unconfirmed ASR segment |
| `partial_translated_transcription` | Unconfirmed translation segment |
| `validated_transcription` | Final ASR segment |
| `translated_transcription` | Final translation |
| `output_audio_data` | Chunk of generated TTS audio (WebSockets audio transport only) |
| `current_task` | `get_task` command response |
| `error` | Validation or runtime error |
To receive `partial_transcription`, `validated_transcription`, and `translated_transcription` messages, you must include these message types in the `allowed_message_types` field of your `set_task` command.
To receive `partial_translated_transcription` messages, you must include it in the `allowed_message_types` field AND set `translate_partial_transcriptions` to `true` in your `set_task` command.
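The two rules above can be encoded as a small client-side guard (a hypothetical helper, not part of the API) that patches a `set_task` payload so the subscription is consistent:

```javascript
// Add the wanted message types to allowed_message_types, and if
// partial_translated_transcription is wanted, also flip
// translate_partial_transcriptions on every translation target.
function subscribe(taskData, wanted) {
  const allowed = [
    ...new Set([...(taskData.pipeline.allowed_message_types || []), ...wanted]),
  ];
  const translations = (taskData.pipeline.translations || []).map((t) =>
    wanted.includes('partial_translated_transcription')
      ? { ...t, translate_partial_transcriptions: true }
      : t
  );
  return {
    ...taskData,
    pipeline: { ...taskData.pipeline, allowed_message_types: allowed, translations },
  };
}

const patched = subscribe(
  { pipeline: { translations: [{ target_language: 'es' }] } },
  ['partial_transcription', 'partial_translated_transcription']
);
```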
partial_transcription
An unconfirmed (in-progress) segment transcription:

```json
{
  "message_type": "partial_transcription",
  "data": {
    "transcription": {
      "transcription_id": "190983855fe3404e",
      "language": "en",
      "text": "One, two"
    }
  }
}
```
partial_translated_transcription
An unconfirmed (in-progress) segment translation:

```json
{
  "message_type": "partial_translated_transcription",
  "data": {
    "transcription": {
      "transcription_id": "190983855fe3404e",
      "language": "es",
      "text": "Um, dois,"
    }
  }
}
```
validated_transcription
A completed (final) segment transcription:

```json
{
  "message_type": "validated_transcription",
  "data": {
    "transcription": {
      "transcription_id": "190983855fe3404e",
      "language": "en",
      "text": "One, two, three, four, five."
    }
  }
}
```
translated_transcription
A completed (final) segment translation:

```json
{
  "message_type": "translated_transcription",
  "data": {
    "transcription": {
      "transcription_id": "190983855fe3404e",
      "language": "es",
      "text": "Um, dois, três, quatro, cinco."
    }
  }
}
```
output_audio_data
A TTS audio chunk (sent when you use WebSockets as the audio transport):

```json
{
  "message_type": "output_audio_data",
  "data": {
    "transcription": {
      "transcription_id": "190983855fe3404e",
      "language": "es",       // TTS language
      "last_chunk": false,    // true on the last generated chunk for this transcription_id
      "data": "base64 string"
    }
  }
}
```
current_task
The `get_task` command response, containing the current task.
error
Validation, authorization, or other kinds of errors.

```json
{
  "message_type": "error",
  "data": {
    "code": "VALIDATION_ERROR",
    "desc": "ValidationError(model='SetTaskMessage', errors=[{'loc': ('input_stream', 'content_type')",
    "msg": "value is not a valid enumeration member; permitted: 'audio'\", 'type': 'type_error.enum'",
    "param": null
  }
}
```