Quick Start (WebRTC)
The following steps explain how to use the WebRTC-based architecture, which is recommended for client-side applications. If you are looking for a backend solution, see our WebSocket Quick Start Guide instead.
Introduction
Palabra's API solution enables real-time speech translation through a WebRTC-based architecture using LiveKit.
The process involves creating a secure session, establishing a connection to a Palabra Translation Room, publishing your original audio stream into the Room, and configuring the translation pipeline with your desired language settings.
Once connected, your speech is automatically transcribed, translated, and synthesized into the target language in real time. Palabra then publishes the translated audio track to the same room, allowing you to subscribe to it and play it back in your application instantly.
Step 1. Get API Credentials
Visit the Palabra API Keys section to obtain your Client ID and Client Secret.
Step 2. Create a Session
Use your credentials to call the POST /session-storage/session endpoint. You'll receive the webrtc_url and the publisher JWT token, which are required for the next steps.
Request Example
- JavaScript
- Python
- cURL
import axios from "axios";

const { data } = await axios.post(
  "https://api.palabra.ai/session-storage/session",
  {
    data: {
      subscriber_count: 0,
      publisher_can_subscribe: true,
    },
  },
  {
    headers: {
      ClientId: "<API_CLIENT_ID>",
      ClientSecret: "<API_CLIENT_SECRET>",
    },
  }
);
# Full working example: https://github.com/PalabraAI/palabra-ai-python/tree/main/examples/nanopalabra
import httpx

async def create_session(client_id: str, client_secret: str) -> dict:
    url = "https://api.palabra.ai/session-storage/session"
    headers = {"ClientId": client_id, "ClientSecret": client_secret}
    payload = {"data": {"subscriber_count": 0, "publisher_can_subscribe": True}}
    async with httpx.AsyncClient() as client:
        response = await client.post(url, json=payload, headers=headers)
        response.raise_for_status()
        return response.json()
curl -X POST https://api.palabra.ai/session-storage/session \
  -H "ClientId: <API_CLIENT_ID>" \
  -H "ClientSecret: <API_CLIENT_SECRET>" \
  -H "Content-Type: application/json" \
  -d '{
    "data": {
      "subscriber_count": 0,
      "publisher_can_subscribe": true
    }
  }'
Response Example
{
  "publisher": "eyJhbGciOiJIU...Gxr2gjWSA4",
  "subscriber": [],
  "webrtc_room_name": "50ff0fa2",
  "webrtc_url": "https://streaming-0.palabra.ai/livekit/",
  "ws_url": "wss://streaming-0.palabra.ai/streaming-api/v1/speech-to-speech/stream",
  "id": "7f99b553-4697...7d450728"
}
webrtc_url - the WebRTC (LiveKit) server URL used to connect to the Translation Room.
publisher - the JWT token used to authenticate your connection to the WebRTC server.
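For example, here is a minimal way to pull those two values out of the Step 2 response in JavaScript (the variable names are illustrative; data is the parsed response body from the request example above):

// Destructure the connection parameters returned by the session endpoint
const { webrtc_url: webrtcUrl, publisher } = data;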
Step 3. Connect to the Translation Room
Use the LiveKit SDK to join the room at the webrtc_url with the publisher token you received in Step 2.
- JavaScript
- Python
npm install livekit-client
pip install livekit
- JavaScript
- Python
import { Room } from "livekit-client";

const connectTranslationRoom = async (WEBRTC_URL, PUBLISHER) => {
  try {
    const room = new Room();
    await room.connect(WEBRTC_URL, PUBLISHER, { autoSubscribe: true });
    return room;
  } catch (e) {
    console.error(e);
    throw e;
  }
};
# Full working example: https://github.com/PalabraAI/palabra-ai-python/tree/main/examples/nanopalabra
from livekit import rtc

async def connect_translation_room(webrtc_url: str, publisher: str) -> rtc.Room:
    room = rtc.Room()
    await room.connect(webrtc_url, publisher, rtc.RoomOptions(auto_subscribe=True))
    print("💫 Connected to room")
    return room
Step 4. Publish the Original Audio Stream
Get the audio stream from your microphone and publish it to the room using the LiveKit SDK.
Example
- JavaScript
- Python
import { LocalAudioTrack } from "livekit-client";

const publishAudioTrack = async (room) => {
  try {
    const stream = await navigator.mediaDevices.getUserMedia({
      audio: { channelCount: 1 },
    });
    const localTrack = new LocalAudioTrack(stream.getAudioTracks()[0]);
    await room.localParticipant.publishTrack(localTrack, {
      dtx: false,
      red: false,
      audioPreset: {
        maxBitrate: 32000,
        priority: "high",
      },
    });
  } catch (e) {
    console.error("Error while publishing audio track:", e);
    throw e;
  }
};
# Full working example: https://github.com/PalabraAI/palabra-ai-python/tree/main/examples/nanopalabra
import asyncio
import queue
import threading
import time

import numpy as np
import sounddevice as sd
from livekit import rtc

async def publish_audio_track(room: rtc.Room) -> rtc.AudioSource:
    # Create audio source
    audio_source = rtc.AudioSource(sample_rate=48000, num_channels=1)
    # Create and publish track
    track = rtc.LocalAudioTrack.create_audio_track("microphone", audio_source)
    await room.local_participant.publish_track(track, rtc.TrackPublishOptions(dtx=False, red=False))
    print("🗣️ Microphone published")
    # Start capturing in background
    asyncio.create_task(capture_microphone(audio_source))
    return audio_source

async def capture_microphone(audio_source: rtc.AudioSource):
    sample_rate = 48000
    frame = rtc.AudioFrame.create(sample_rate, 1, 480)
    audio_queue = queue.Queue(maxsize=100)
    stop_event = threading.Event()

    def input_callback(indata, frames, time_info, status):
        try:
            audio_queue.put_nowait(np.frombuffer(indata, dtype=np.int16).copy())
        except queue.Full:
            pass

    def recording_thread():
        with sd.RawInputStream(
            samplerate=sample_rate,
            channels=1,
            dtype='int16',
            callback=input_callback,
            blocksize=480
        ):
            while not stop_event.is_set():
                time.sleep(0.01)

    threading.Thread(target=recording_thread, daemon=True).start()
    print("🎤 Mic started")

    buffer = np.array([], dtype=np.int16)
    while True:
        try:
            audio_data = audio_queue.get(timeout=0.1)
            buffer = np.concatenate([buffer, audio_data])
            # Feed the source in 10 ms (480-sample) frames
            while len(buffer) >= 480:
                chunk = buffer[:480]
                buffer = buffer[480:]
                np.copyto(np.frombuffer(frame.data, dtype=np.int16), chunk)
                await audio_source.capture_frame(frame)
        except queue.Empty:
            await asyncio.sleep(0.001)
Step 5. Handle Translated Audio Track
As soon as the translated audio track is published in the room, you will be auto-subscribed to it. You can handle it in a callback and play it through the speakers.
Example
- JavaScript
- Python
import { RoomEvent } from "livekit-client";

const playTranslationInBrowser = (track) => {
  if (track.kind === "audio") {
    const mediaStream = new MediaStream([track.mediaStreamTrack]);
    const audioElement = document.getElementById("remote-audio"); // Your HTML audio element
    if (audioElement) {
      audioElement.srcObject = mediaStream;
      audioElement.play();
    } else {
      console.error("Audio element not found!");
    }
  }
};

// Add a handler for the TrackSubscribed event
room.on(RoomEvent.TrackSubscribed, playTranslationInBrowser);
# Full working example: https://github.com/PalabraAI/palabra-ai-python/tree/main/examples/nanopalabra
import asyncio
import queue
import threading

import numpy as np
import sounddevice as sd
from livekit import rtc

def on_track_subscribed(track, publication, participant):
    if track.kind == rtc.TrackKind.KIND_AUDIO and "translation_" in publication.name:
        lang = publication.name.split("translation_")[-1]
        play_track(track, lang)

def play_track(track: rtc.Track, lang: str):
    def run_playback():
        loop = asyncio.new_event_loop()
        asyncio.set_event_loop(loop)

        async def play():
            audio_stream = rtc.AudioStream(track, sample_rate=48000, num_channels=1)
            audio_queue = queue.Queue(maxsize=100)

            def audio_callback(outdata, frames, time_info, status):
                try:
                    data = audio_queue.get_nowait()
                    outdata[:] = data.reshape(-1, 1)
                except queue.Empty:
                    outdata.fill(0)

            output_stream = sd.OutputStream(
                samplerate=48000,
                channels=1,
                dtype='int16',
                callback=audio_callback,
                blocksize=480
            )
            output_stream.start()
            print(f"🔊 Playing: {lang}")

            async for event in audio_stream:
                frame_data = np.frombuffer(event.frame.data, dtype=np.int16)
                try:
                    audio_queue.put_nowait(frame_data)
                except queue.Full:
                    pass

        loop.run_until_complete(play())

    threading.Thread(target=run_playback, daemon=True).start()
Step 6. Start the Translation
Use the WebRTC connection's data channel to publish a set_task message with the translation settings to start the translation process.
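The full schema for the settings object is documented in the API reference. As a minimal illustrative sketch (only pipeline.translations[].target_language appears elsewhere in this guide; the other field names are assumptions):

// Minimal illustrative settings object. Only
// pipeline.translations[].target_language is confirmed by this guide;
// treat the rest as assumptions and consult the set_task schema in the
// API reference.
const translationSettings = {
  pipeline: {
    transcription: {
      source_language: "en",
    },
    translations: [
      {
        target_language: "es",
      },
    ],
  },
};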
Example
- JavaScript
- Python
const startTranslation = (room, translationSettings) => {
  const payload = JSON.stringify({
    message_type: "set_task",
    data: translationSettings,
  });
  const encoder = new TextEncoder();
  const message = encoder.encode(payload);
  // Send the set_task message through the data channel
  room.localParticipant.publishData(message, { reliable: true });
};
# Full working example: https://github.com/PalabraAI/palabra-ai-python/tree/main/examples/nanopalabra
import json
from livekit import rtc

async def start_translation(room: rtc.Room, translation_settings: dict):
    # Create the set_task message
    payload = {
        "message_type": "set_task",
        "data": translation_settings
    }
    # Send through the data channel
    message_bytes = json.dumps(payload).encode("utf-8")
    await room.local_participant.publish_data(message_bytes, reliable=True)
    langs = [t["target_language"] for t in translation_settings["pipeline"]["translations"]]
    print(f"⚙️ Settings sent: {langs}")
Summary
As soon as you send the set_task message, Palabra will take your published original audio track, translate it into the target language specified in the settings, and publish the translated track to the same room. The LiveKit SDK will auto-subscribe you to this translated audio stream, making it available for real-time playback through the speakers.
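Putting it together, here is a hedged end-to-end sketch in JavaScript that wires up the helpers from the steps above. The createSession helper is assumed to wrap the Step 2 request and return the parsed response body; the other functions are defined in Steps 3 through 6, and error handling and cleanup are omitted:

import { RoomEvent } from "livekit-client";

// Illustrative wiring of the helpers from Steps 2-6.
const runPipeline = async (clientId, clientSecret, translationSettings) => {
  // Step 2: create a session and obtain webrtc_url + publisher token
  // (createSession is assumed to wrap the Step 2 request)
  const session = await createSession(clientId, clientSecret);

  // Step 3: join the Translation Room
  const room = await connectTranslationRoom(session.webrtc_url, session.publisher);

  // Step 5: register the handler before the translated track is published
  room.on(RoomEvent.TrackSubscribed, playTranslationInBrowser);

  // Step 4: publish the microphone audio
  await publishAudioTrack(room);

  // Step 6: send the set_task message to start the translation
  startTranslation(room, translationSettings);

  return room;
};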
Good to know
- Read more about pausing and stopping the translation in the Translation Management section.
- Unused sessions remain active for at least 1 minute. To avoid reaching the limit of simultaneously active sessions, it's best practice to delete unused sessions when you stop translation or when the page is unmounted. Learn more in Sessions Lifecycle.
- Due to browser security restrictions, audio cannot be played until the user has interacted with the page. Therefore, do not start the entire pipeline automatically when the page loads. Instead, wait for the user to perform an action (like pressing a 'Start' button) before activating audio playback and related processes, as shown in the sketch below.
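For the last point above, a minimal sketch of gating the pipeline behind a user gesture (the button id is illustrative, and runPipeline is the hypothetical helper from the Summary sketch):

// Start everything only after an explicit user action to satisfy
// browser autoplay policies
document.getElementById("start-button").addEventListener("click", () => {
  runPipeline("<API_CLIENT_ID>", "<API_CLIENT_SECRET>", translationSettings)
    .catch(console.error);
});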
Need help?
If you have any questions or need assistance, please don't hesitate to contact us at [email protected].