Quick Start (WebSockets)
The following steps explain how to use the WebSocket-based architecture, which is recommended for server-side applications and backend integrations. If you are looking for client-side solutions, please refer to our WebRTC Quick Start Guide.
Palabra's API solution enables real-time speech translation through a WebSocket-based architecture.
The process involves creating a secure session, establishing a WebSocket connection to Palabra's streaming API, sending your audio data through the WebSocket connection, and configuring the translation pipeline with your desired language settings.
Once connected, your speech is automatically transcribed, translated, and synthesized into the target language in real time. Palabra then streams the translated audio back through the same WebSocket connection, allowing you to receive and process it in your application instantly.
Step 1. Get API Credentials
Visit the Palabra API keys section to obtain your Client ID and Client Secret.
Step 2. Create a Session
Use your credentials to call the `POST /session-storage/session` endpoint. You'll receive the `ws_url` required for establishing the WebSocket connection.
Request Example
- Python
- cURL
```python
# Full working example: https://github.com/PalabraAI/palabra-ai-python/tree/main/examples/nanopalabra
import httpx

async def create_session(client_id: str, client_secret: str) -> dict:
    url = "https://api.palabra.ai/session-storage/session"
    headers = {"ClientId": client_id, "ClientSecret": client_secret}
    payload = {"data": {"subscriber_count": 0, "publisher_can_subscribe": True}}
    async with httpx.AsyncClient() as client:
        response = await client.post(url, json=payload, headers=headers)
        response.raise_for_status()
        return response.json()
```
```bash
curl -X POST https://api.palabra.ai/session-storage/session \
  -H "ClientId: <API_CLIENT_ID>" \
  -H "ClientSecret: <API_CLIENT_SECRET>" \
  -H "Content-Type: application/json" \
  -d '{
    "data": {
      "subscriber_count": 0,
      "publisher_can_subscribe": true
    }
  }'
```
Response Example
```json
{
  "publisher": "eyJhbGciOiJIU...Gxr2gjWSA4",
  "subscriber": [],
  "webrtc_room_name": "50ff0fa2",
  "webrtc_url": "https://streaming-0.palabra.ai/livekit/",
  "ws_url": "wss://streaming-0.palabra.ai/streaming-api/v1/speech-to-speech/stream",
  "id": "7f99b553-4697...7d450728"
}
```
`ws_url` - WebSocket endpoint to connect to for streaming audio data.
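The Response Example above shows the session fields; note that, as the complete example at the end of this guide suggests, the raw API response may wrap these fields in a top-level `data` object. A small helper (hypothetical, not part of any Palabra SDK) that tolerates both shapes:

```python
def extract_connection_info(session: dict) -> tuple:
    """Return (ws_url, publisher_token) from a session response.

    Accepts either the bare session fields or a response that wraps
    them in a top-level "data" envelope.
    """
    body = session.get("data", session)  # unwrap "data" if present
    return body["ws_url"], body["publisher"]
```

This keeps the rest of your code independent of which shape your HTTP client returns.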
Step 3. Connect to the WebSocket API
Establish a WebSocket connection to the `ws_url` you received in Step 2. You'll need to pass your publisher token as a query parameter.
Example
- Python
```python
# Full working example: https://github.com/PalabraAI/palabra-ai-python/tree/main/examples/nanopalabra
import websockets
import asyncio

async def connect_websocket(ws_url: str, publisher_token: str):
    # Connect to WebSocket with publisher token
    full_url = f"{ws_url}?token={publisher_token}"
    websocket = await websockets.connect(full_url, ping_interval=10, ping_timeout=30)
    print("🔌 Connected to WebSocket")
    return websocket
```
Step 4. Send Audio Data
Capture audio from your microphone and send it through the WebSocket connection. Audio must be sent as base64-encoded data in JSON messages.
Example
- Python
```python
# Full working example: https://github.com/PalabraAI/palabra-ai-python/tree/main/examples/nanopalabra
import asyncio
import base64
import json
import queue
import threading
import time

import numpy as np
import sounddevice as sd

async def stream_microphone(websocket):
    sample_rate = 24000    # Palabra expects 24kHz audio
    chunk_duration = 0.32  # 320ms chunks recommended
    chunk_samples = int(sample_rate * chunk_duration)
    audio_queue = queue.Queue(maxsize=100)
    stop_event = threading.Event()

    def input_callback(indata, frames, time_info, status):
        try:
            audio_queue.put_nowait(np.frombuffer(indata, dtype=np.int16).copy())
        except queue.Full:
            pass

    def recording_thread():
        with sd.RawInputStream(
            samplerate=sample_rate,
            channels=1,
            dtype='int16',
            callback=input_callback,
            blocksize=int(sample_rate * 0.02)  # 20ms callback
        ):
            print("🎤 Microphone started")
            while not stop_event.is_set():
                time.sleep(0.01)

    threading.Thread(target=recording_thread, daemon=True).start()

    # Send audio chunks
    buffer = np.array([], dtype=np.int16)
    while True:
        try:
            audio_data = audio_queue.get(timeout=0.1)
            buffer = np.concatenate([buffer, audio_data])
            while len(buffer) >= chunk_samples:
                chunk = buffer[:chunk_samples]
                buffer = buffer[chunk_samples:]
                # Send via WebSocket as base64-encoded JSON
                message = {
                    "message_type": "input_audio_data",
                    "data": {
                        "data": base64.b64encode(chunk.tobytes()).decode("utf-8")
                    }
                }
                await websocket.send(json.dumps(message))
                # Important: pace audio to real-time rate
                await asyncio.sleep(chunk_duration)
        except queue.Empty:
            await asyncio.sleep(0.001)
```
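The message-building part of the loop above is easy to verify in isolation. This sketch factors it into a small function (a refactoring for illustration, not a separate Palabra API): a 320ms chunk at 24kHz mono int16 is 7,680 samples, i.e. 15,360 bytes, before base64 encoding.

```python
import base64
import json

SAMPLE_RATE = 24000    # Hz, expected by Palabra
CHUNK_DURATION = 0.32  # seconds, recommended chunk size
BYTES_PER_SAMPLE = 2   # pcm_s16le, mono

def build_audio_message(chunk_bytes: bytes) -> str:
    # Wrap raw PCM bytes in the input_audio_data JSON envelope
    return json.dumps({
        "message_type": "input_audio_data",
        "data": {"data": base64.b64encode(chunk_bytes).decode("utf-8")},
    })

# Expected size of one chunk in raw bytes
chunk_size = int(SAMPLE_RATE * CHUNK_DURATION) * BYTES_PER_SAMPLE  # 15360
```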
Step 5. Configure Translation Settings
Send a JSON message through the WebSocket to configure your translation settings. See management documentation for full settings reference.
Example
- Python
```python
# Full working example: https://github.com/PalabraAI/palabra-ai-python/tree/main/examples/nanopalabra
import json

async def configure_translation(websocket, source_lang: str, target_langs: list):
    settings = {
        "message_type": "set_task",
        "data": {
            "input_stream": {
                "content_type": "audio",
                "source": {
                    "type": "ws",
                    "format": "pcm_s16le",
                    "sample_rate": 24000,
                    "channels": 1
                }
            },
            "output_stream": {
                "content_type": "audio",
                "target": {
                    "type": "ws",
                    "format": "pcm_s16le"
                }
            },
            "pipeline": {
                "preprocessing": {},
                "transcription": {
                    "source_language": source_lang
                },
                "translations": [
                    {
                        "target_language": lang,
                        "speech_generation": {}
                    } for lang in target_langs
                ]
            }
        }
    }
    await websocket.send(json.dumps(settings))
    print(f"⚙️ Translation configured: {source_lang} → {target_langs}")
```
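Since `set_task` is just JSON, the payload can be built and inspected without a live connection. The sketch below constructs the same settings structure as above but returns it instead of sending it (a factoring for illustration); note that each target language produces its own entry in `translations`.

```python
def build_task_settings(source_lang: str, target_langs: list) -> dict:
    # Same structure as the set_task message above, returned instead of sent
    return {
        "message_type": "set_task",
        "data": {
            "input_stream": {
                "content_type": "audio",
                "source": {"type": "ws", "format": "pcm_s16le",
                           "sample_rate": 24000, "channels": 1},
            },
            "output_stream": {
                "content_type": "audio",
                "target": {"type": "ws", "format": "pcm_s16le"},
            },
            "pipeline": {
                "preprocessing": {},
                "transcription": {"source_language": source_lang},
                "translations": [
                    {"target_language": lang, "speech_generation": {}}
                    for lang in target_langs
                ],
            },
        },
    }
```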
Step 6. Receive and Play Translated Audio
Listen for messages from the WebSocket. Palabra sends different message types including transcriptions, translations, and audio data.
Example
- Python
```python
# Full working example: https://github.com/PalabraAI/palabra-ai-python/tree/main/examples/nanopalabra
import base64
import json
import queue

import numpy as np
import sounddevice as sd

async def receive_and_play(websocket):
    # Audio playback setup
    sample_rate = 24000
    audio_queue = queue.Queue(maxsize=100)
    buffer = np.array([], dtype=np.int16)

    def audio_callback(outdata, frames, time_info, status):
        nonlocal buffer
        # Fill buffer if needed
        while len(buffer) < frames:
            try:
                buffer = np.concatenate([buffer, audio_queue.get_nowait()])
            except queue.Empty:
                break
        # Provide audio frames
        if len(buffer) >= frames:
            outdata[:] = buffer[:frames].reshape(-1, 1)
            buffer = buffer[frames:]
        else:
            outdata.fill(0)

    output_stream = sd.OutputStream(
        samplerate=sample_rate,
        channels=1,
        dtype='int16',
        callback=audio_callback,
        blocksize=int(sample_rate * 0.02)
    )
    output_stream.start()
    print("🔊 Audio playback started")

    # Receive messages
    async for message in websocket:
        data = json.loads(message)
        # Parse nested JSON if needed
        if isinstance(data.get("data"), str):
            data["data"] = json.loads(data["data"])
        msg_type = data.get("message_type")
        if msg_type == "current_task":
            print("📝 Task confirmed")
        elif msg_type == "output_audio_data":
            # Decode base64 audio
            audio_bytes = base64.b64decode(data["data"]["data"])
            audio_array = np.frombuffer(audio_bytes, dtype=np.int16)
            try:
                audio_queue.put_nowait(audio_array)
            except queue.Full:
                pass
        elif msg_type == "partial_transcription":
            text = data["data"]["transcription"]["text"]
            lang = data["data"]["transcription"]["language"]
            print(f"\r\033[K💬 [{lang}] {text}", end="", flush=True)
        elif msg_type == "final_transcription":
            text = data["data"]["transcription"]["text"]
            lang = data["data"]["transcription"]["language"]
            print(f"\r\033[K✅ [{lang}] {text}")
```
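The parsing logic in the receive loop (unwrap the possibly string-encoded `data` field, then branch on `message_type`) can be exercised offline. A minimal sketch using the message shapes shown above; the function name is our own, not a Palabra API:

```python
import base64
import json

def parse_server_message(raw: str) -> tuple:
    """Return (message_type, payload).

    Payload is raw PCM bytes for audio messages, a (language, text)
    pair for transcriptions, and the data dict otherwise.
    """
    data = json.loads(raw)
    if isinstance(data.get("data"), str):
        data["data"] = json.loads(data["data"])  # unwrap nested JSON
    msg_type = data.get("message_type")
    if msg_type == "output_audio_data":
        return msg_type, base64.b64decode(data["data"]["data"])
    if msg_type in ("partial_transcription", "final_transcription"):
        t = data["data"]["transcription"]
        return msg_type, (t["language"], t["text"])
    return msg_type, data.get("data")
```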
Complete Example
Here's a minimal working example. For the full implementation, see nanopalabra_ws.
- Python
```python
import asyncio
import os
import signal

async def main():
    # Graceful shutdown
    signal.signal(signal.SIGINT, lambda s, f: os._exit(0))
    print("🚀 Palabra WebSocket Client")

    # Step 1: Your API credentials
    client_id = os.getenv("PALABRA_CLIENT_ID")
    client_secret = os.getenv("PALABRA_CLIENT_SECRET")

    # Step 2: Create session
    session = await create_session(client_id, client_secret)
    ws_url = session["data"]["ws_url"]
    publisher_token = session["data"]["publisher"]

    # Step 3: Connect to WebSocket
    websocket = await connect_websocket(ws_url, publisher_token)

    # Step 5: Configure translation
    await configure_translation(websocket, "en", ["es"])
    # Wait for settings to process
    await asyncio.sleep(3)

    # Steps 4 & 6: Create tasks for streaming and receiving
    receive_task = asyncio.create_task(receive_and_play(websocket))
    stream_task = asyncio.create_task(stream_microphone(websocket))
    print("\n🎧 Listening... Press Ctrl+C to stop\n")
    try:
        # Run until interrupted
        await asyncio.gather(receive_task, stream_task)
    except KeyboardInterrupt:
        print("\n🛑 Shutdown complete")

if __name__ == "__main__":
    asyncio.run(main())
```
Summary
Once you establish the WebSocket connection and send your translation settings, Palabra will process your audio stream in real time. The service transcribes your speech, translates it to the specified target languages, and streams back both the text translations and synthesized audio through the same WebSocket connection, enabling seamless real-time communication.
Need help?
If you have any questions or need assistance, please don't hesitate to contact us at [email protected].