Quick Start (WebRTC)
The following steps explain how to use the WebRTC-based architecture, which is recommended for client-side applications. If you are looking for a backend solution, see our WebSocket Quick Start Guide instead.
Introduction
Palabra's API solution enables real-time speech translation through a WebRTC-based architecture using LiveKit.
The process involves creating a secure session, establishing a connection to a Palabra Translation Room, publishing your original audio stream into the Room, and configuring the translation pipeline with your desired language settings.
Once connected, your speech is automatically transcribed, translated, and synthesized into the target language in real time. Palabra then publishes the translated audio track to the same room, allowing you to subscribe to it and play it back in your application instantly.
Step 1. Get API Credentials
Visit the Palabra API Keys section to obtain your Client ID and Client Secret.
Step 2. Create a Session
Use your credentials to call the POST /session-storage/session endpoint. You'll receive the webrtc_url and the publisher JWT token, which are required for the next steps.
Request Example
- JavaScript
- Python
- cURL
import axios from "axios";

const { data } = await axios.post(
  "https://api.palabra.ai/session-storage/session",
  {
    data: {
      subscriber_count: 0,
      publisher_can_subscribe: true,
    },
  },
  {
    headers: {
      ClientId: "<API_CLIENT_ID>",
      ClientSecret: "<API_CLIENT_SECRET>",
    },
  }
);
# Full working example: https://github.com/PalabraAI/palabra-ai-python/tree/main/examples/nanopalabra
import httpx

async def create_session(client_id: str, client_secret: str) -> dict:
    url = "https://api.palabra.ai/session-storage/session"
    headers = {"ClientId": client_id, "ClientSecret": client_secret}
    payload = {"data": {"subscriber_count": 0, "publisher_can_subscribe": True}}
    async with httpx.AsyncClient() as client:
        response = await client.post(url, json=payload, headers=headers)
        response.raise_for_status()
        return response.json()
curl -X POST https://api.palabra.ai/session-storage/session \
  -H "ClientId: <API_CLIENT_ID>" \
  -H "ClientSecret: <API_CLIENT_SECRET>" \
  -H "Content-Type: application/json" \
  -d '{
    "data": {
      "subscriber_count": 0,
      "publisher_can_subscribe": true
    }
  }'
Response Example
{
  "publisher": "eyJhbGciOiJIU...Gxr2gjWSA4",
  "subscriber": [],
  "webrtc_room_name": "50ff0fa2",
  "webrtc_url": "https://streaming-0.palabra.ai/livekit/",
  "ws_url": "wss://streaming-0.palabra.ai/streaming-api/v1/speech-to-speech/stream",
  "id": "7f99b553-4697...7d450728"
}
webrtc_url - the WebRTC (LiveKit) server URL used to connect to the Translation Room.
publisher - the JWT token used to authenticate your connection to the WebRTC server.
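For example, here is a minimal way to pull those two values out of the Step 2 response in JavaScript (the variable names are illustrative; data is the parsed response body from the request example above):

// Destructure the connection parameters returned by the session endpoint
const { webrtc_url: webrtcUrl, publisher } = data;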
Step 3. Connect to the Translation Room
Use the LiveKit SDK to join the room at the webrtc_url with the publisher token you received in Step 2.
- JavaScript
- Python
npm install livekit-client
pip install livekit
- JavaScript
- Python
import { Room } from "livekit-client";

const connectTranslationRoom = async (WEBRTC_URL, PUBLISHER) => {
  try {
    const room = new Room();
    await room.connect(WEBRTC_URL, PUBLISHER, { autoSubscribe: true });
    return room;
  } catch (e) {
    console.error(e);
    throw e;
  }
};
# Full working example: https://github.com/PalabraAI/palabra-ai-python/tree/main/examples/nanopalabra
from livekit import rtc

async def connect_translation_room(webrtc_url: str, publisher: str) -> rtc.Room:
    room = rtc.Room()
    await room.connect(webrtc_url, publisher, rtc.RoomOptions(auto_subscribe=True))
    print("💫 Connected to room")
    return room
Step 4. Publish the Original Audio Stream
Get the audio stream from your microphone and publish it to the room using the LiveKit SDK.
Example
- JavaScript
- Python
import { LocalAudioTrack } from "livekit-client";

const publishAudioTrack = async (room) => {
  try {
    const stream = await navigator.mediaDevices.getUserMedia({
      audio: { channelCount: 1 },
    });
    const localTrack = new LocalAudioTrack(stream.getAudioTracks()[0]);
    await room.localParticipant.publishTrack(localTrack, {
      dtx: false,
      red: false,
      audioPreset: {
        maxBitrate: 32000,
        priority: "high",
      },
    });
  } catch (e) {
    console.error("Error while publishing audio track:", e);
    throw e;
  }
};
# Full working example: https://github.com/PalabraAI/palabra-ai-python/tree/main/examples/nanopalabra
import asyncio
import queue
import threading
import time

import numpy as np
import sounddevice as sd
from livekit import rtc

async def publish_audio_track(room: rtc.Room) -> rtc.AudioSource:
    # Create audio source
    audio_source = rtc.AudioSource(sample_rate=48000, num_channels=1)
    # Create and publish track
    track = rtc.LocalAudioTrack.create_audio_track("microphone", audio_source)
    await room.local_participant.publish_track(track, rtc.TrackPublishOptions(dtx=False, red=False))
    print("🗣️ Microphone published")
    # Start capturing in background
    asyncio.create_task(capture_microphone(audio_source))
    return audio_source

async def capture_microphone(audio_source: rtc.AudioSource):
    sample_rate = 48000
    frame = rtc.AudioFrame.create(sample_rate, 1, 480)
    audio_queue = queue.Queue(maxsize=100)
    stop_event = threading.Event()

    def input_callback(indata, frames, time_info, status):
        try:
            audio_queue.put_nowait(np.frombuffer(indata, dtype=np.int16).copy())
        except queue.Full:
            pass

    def recording_thread():
        with sd.RawInputStream(
            samplerate=sample_rate,
            channels=1,
            dtype='int16',
            callback=input_callback,
            blocksize=480
        ):
            while not stop_event.is_set():
                time.sleep(0.01)

    threading.Thread(target=recording_thread, daemon=True).start()
    print("🎤 Mic started")

    buffer = np.array([], dtype=np.int16)
    while True:
        try:
            audio_data = audio_queue.get(timeout=0.1)
            buffer = np.concatenate([buffer, audio_data])
            # Feed the source in 10 ms (480-sample) frames
            while len(buffer) >= 480:
                chunk = buffer[:480]
                buffer = buffer[480:]
                np.copyto(np.frombuffer(frame.data, dtype=np.int16), chunk)
                await audio_source.capture_frame(frame)
        except queue.Empty:
            await asyncio.sleep(0.001)
Step 5. Handle Translated Audio Track
As soon as the translated audio track is published in the room, you will be auto-subscribed to it. You can handle it in a callback and play it through the speakers.
Example
- JavaScript
- Python
import { RoomEvent } from "livekit-client";

const playTranslationInBrowser = (track) => {
  if (track.kind === "audio") {
    const mediaStream = new MediaStream([track.mediaStreamTrack]);
    const audioElement = document.getElementById("remote-audio"); // Your HTML audio element
    if (audioElement) {
      audioElement.srcObject = mediaStream;
      audioElement.play();
    } else {
      console.error("Audio element not found!");
    }
  }
};

// Add a handler for the TrackSubscribed event
room.on(RoomEvent.TrackSubscribed, playTranslationInBrowser);
# Full working example: https://github.com/PalabraAI/palabra-ai-python/tree/main/examples/nanopalabra
import asyncio
import queue
import threading

import numpy as np
import sounddevice as sd
from livekit import rtc

def on_track_subscribed(track, publication, participant):
    if track.kind == rtc.TrackKind.KIND_AUDIO and "translation_" in publication.name:
        lang = publication.name.split("translation_")[-1]
        play_track(track, lang)

def play_track(track: rtc.Track, lang: str):
    def run_playback():
        loop = asyncio.new_event_loop()
        asyncio.set_event_loop(loop)

        async def play():
            audio_stream = rtc.AudioStream(track, sample_rate=48000, num_channels=1)
            audio_queue = queue.Queue(maxsize=100)

            def audio_callback(outdata, frames, time_info, status):
                try:
                    data = audio_queue.get_nowait()
                    outdata[:] = data.reshape(-1, 1)
                except queue.Empty:
                    outdata.fill(0)

            output_stream = sd.OutputStream(
                samplerate=48000,
                channels=1,
                dtype='int16',
                callback=audio_callback,
                blocksize=480
            )
            output_stream.start()
            print(f"🔊 Playing: {lang}")

            async for event in audio_stream:
                frame_data = np.frombuffer(event.frame.data, dtype=np.int16)
                try:
                    audio_queue.put_nowait(frame_data)
                except queue.Full:
                    pass

        loop.run_until_complete(play())

    threading.Thread(target=run_playback, daemon=True).start()
Step 6. Start the Translation
Use the WebRTC connection's data channel to publish a set_task message with the translation settings to start the translation process.
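The full schema for the settings object is documented in the API reference. As a minimal illustrative sketch (only pipeline.translations[].target_language appears elsewhere in this guide; the other field names are assumptions):

// Minimal illustrative settings object. Only
// pipeline.translations[].target_language is confirmed by this guide;
// treat the rest as assumptions and consult the set_task schema in the
// API reference.
const translationSettings = {
  pipeline: {
    transcription: {
      source_language: "en",
    },
    translations: [
      {
        target_language: "es",
      },
    ],
  },
};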
Example
- JavaScript
- Python
const startTranslation = (room, translationSettings) => {
  const payload = JSON.stringify({
    message_type: "set_task",
    data: translationSettings,
  });
  const encoder = new TextEncoder();
  const message = encoder.encode(payload);
  // Send the set_task message through the data channel
  room.localParticipant.publishData(message, { reliable: true });
};
# Full working example: https://github.com/PalabraAI/palabra-ai-python/tree/main/examples/nanopalabra
import json
from livekit import rtc

async def start_translation(room: rtc.Room, translation_settings: dict):
    # Create the set_task message
    payload = {
        "message_type": "set_task",
        "data": translation_settings
    }
    # Send through the data channel
    message_bytes = json.dumps(payload).encode("utf-8")
    await room.local_participant.publish_data(message_bytes, reliable=True)
    langs = [t["target_language"] for t in translation_settings["pipeline"]["translations"]]
    print(f"⚙️ Settings sent: {langs}")
Summary
As soon as you send the set_task message, Palabra will take your published original audio track, translate it into the target language specified in the settings, and publish the translated track to the same room. The LiveKit SDK will auto-subscribe you to this translated audio stream, making it available for real-time playback through the speakers.
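Putting it together, here is a hedged end-to-end sketch in JavaScript that wires up the helpers from the steps above. The createSession helper is assumed to wrap the Step 2 request and return the parsed response body; the other functions are defined in Steps 3 through 6, and error handling and cleanup are omitted:

import { RoomEvent } from "livekit-client";

// Illustrative wiring of the helpers from Steps 2-6.
const runPipeline = async (clientId, clientSecret, translationSettings) => {
  // Step 2: create a session and obtain webrtc_url + publisher token
  // (createSession is assumed to wrap the Step 2 request)
  const session = await createSession(clientId, clientSecret);

  // Step 3: join the Translation Room
  const room = await connectTranslationRoom(session.webrtc_url, session.publisher);

  // Step 5: register the handler before the translated track is published
  room.on(RoomEvent.TrackSubscribed, playTranslationInBrowser);

  // Step 4: publish the microphone audio
  await publishAudioTrack(room);

  // Step 6: send the set_task message to start the translation
  startTranslation(room, translationSettings);

  return room;
};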
Good to know
- Read more about pausing and stopping the translation in the Translation Management section.
- Unused sessions remain active for at least 1 minute. To avoid reaching the limit of simultaneously active sessions, it's best practice to delete unused sessions when you stop translation or when the page is unmounted. Learn more in Sessions Lifecycle.
- Due to browser security restrictions, audio cannot be played until the user has interacted with the page. Therefore, do not start the entire pipeline automatically when the page loads. Instead, wait for the user to perform an action (like pressing a 'Start' button) before activating audio playback and related processes, as shown in the sketch below.
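For the last point above, a minimal sketch of gating the pipeline behind a user gesture (the button id is illustrative, and runPipeline is the hypothetical helper from the Summary sketch):

// Start everything only after an explicit user action to satisfy
// browser autoplay policies
document.getElementById("start-button").addEventListener("click", () => {
  runPipeline("<API_CLIENT_ID>", "<API_CLIENT_SECRET>", translationSettings)
    .catch(console.error);
});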
Need help?
If you have any questions or need assistance, please don't hesitate to contact us at [email protected].