Quick Start (WebSockets)
The following steps explain how to use the WebSocket-based architecture, which is recommended for server-side applications and backend integrations. If you are looking for client-side solutions, please refer to our WebRTC Quick Start Guide.
Palabra's API solution enables real-time speech translation through a WebSocket-based architecture.
The process involves creating a secure session, establishing a WebSocket connection to Palabra's streaming API, sending your audio data through the WebSocket connection, and configuring the translation pipeline with your desired language settings.
Once connected, your speech is automatically transcribed, translated, and synthesized into the target language in real time. Palabra then streams the translated audio back through the same WebSocket connection, allowing you to receive and process it in your application instantly.
Step 1. Get API Credentials
Visit the Palabra API keys section to obtain your Client ID and Client Secret.
Step 2. Create a Session
Use your credentials to call the `POST /session-storage/session` endpoint. You'll receive the `ws_url` required for establishing the WebSocket connection.
Request Example
- Python
- cURL
```python
# Full working example: https://github.com/PalabraAI/palabra-ai-python/tree/main/examples/nanopalabra
import httpx

async def create_session(client_id: str, client_secret: str) -> dict:
    url = "https://api.palabra.ai/session-storage/session"
    headers = {"ClientId": client_id, "ClientSecret": client_secret}
    payload = {"data": {"subscriber_count": 0, "publisher_can_subscribe": True}}
    async with httpx.AsyncClient() as client:
        response = await client.post(url, json=payload, headers=headers)
        response.raise_for_status()
        return response.json()
```
```bash
curl -X POST https://api.palabra.ai/session-storage/session \
  -H "ClientId: <API_CLIENT_ID>" \
  -H "ClientSecret: <API_CLIENT_SECRET>" \
  -H "Content-Type: application/json" \
  -d '{
    "data": {
      "subscriber_count": 0,
      "publisher_can_subscribe": true
    }
  }'
```
Response Example
```json
{
  "publisher": "eyJhbGciOiJIU...Gxr2gjWSA4",
  "subscriber": [],
  "webrtc_room_name": "50ff0fa2",
  "webrtc_url": "https://streaming-0.palabra.ai/livekit/",
  "ws_url": "wss://streaming-0.palabra.ai/streaming-api/v1/speech-to-speech/stream",
  "id": "7f99b553-4697...7d450728"
}
```
`ws_url` - WebSocket endpoint to connect to for streaming audio data.
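The Response Example above shows the session fields; note that, as the complete example at the end of this guide suggests, the raw API response may wrap these fields in a top-level `data` object. A small helper (hypothetical, not part of any Palabra SDK) that tolerates both shapes:

```python
def extract_connection_info(session: dict) -> tuple:
    """Return (ws_url, publisher_token) from a session response.

    Accepts either the bare session fields or a response that wraps
    them in a top-level "data" envelope.
    """
    body = session.get("data", session)  # unwrap "data" if present
    return body["ws_url"], body["publisher"]
```

This keeps the rest of your code independent of which shape your HTTP client returns.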
Step 3. Connect to the WebSocket API
Establish a WebSocket connection to the `ws_url` you received in Step 2. You'll need to pass your publisher token as a query parameter.
Example
- Python
```python
# Full working example: https://github.com/PalabraAI/palabra-ai-python/tree/main/examples/nanopalabra
import websockets
import asyncio

async def connect_websocket(ws_url: str, publisher_token: str):
    # Connect to WebSocket with publisher token
    full_url = f"{ws_url}?token={publisher_token}"
    websocket = await websockets.connect(full_url, ping_interval=10, ping_timeout=30)
    print("🔌 Connected to WebSocket")
    return websocket
```
Step 4. Send Audio Data
Capture audio from your microphone and send it through the WebSocket connection. Audio must be sent as base64-encoded data in JSON messages.
Example
- Python
```python
# Full working example: https://github.com/PalabraAI/palabra-ai-python/tree/main/examples/nanopalabra
import asyncio
import base64
import json
import queue
import threading
import time

import numpy as np
import sounddevice as sd

async def stream_microphone(websocket):
    sample_rate = 24000    # Palabra expects 24kHz audio
    chunk_duration = 0.32  # 320ms chunks recommended
    chunk_samples = int(sample_rate * chunk_duration)
    audio_queue = queue.Queue(maxsize=100)
    stop_event = threading.Event()

    def input_callback(indata, frames, time_info, status):
        try:
            audio_queue.put_nowait(np.frombuffer(indata, dtype=np.int16).copy())
        except queue.Full:
            pass

    def recording_thread():
        with sd.RawInputStream(
            samplerate=sample_rate,
            channels=1,
            dtype='int16',
            callback=input_callback,
            blocksize=int(sample_rate * 0.02)  # 20ms callback
        ):
            print("🎤 Microphone started")
            while not stop_event.is_set():
                time.sleep(0.01)

    threading.Thread(target=recording_thread, daemon=True).start()

    # Send audio chunks
    buffer = np.array([], dtype=np.int16)
    while True:
        try:
            audio_data = audio_queue.get(timeout=0.1)
            buffer = np.concatenate([buffer, audio_data])
            while len(buffer) >= chunk_samples:
                chunk = buffer[:chunk_samples]
                buffer = buffer[chunk_samples:]
                # Send via WebSocket as base64-encoded JSON
                message = {
                    "message_type": "input_audio_data",
                    "data": {
                        "data": base64.b64encode(chunk.tobytes()).decode("utf-8")
                    }
                }
                await websocket.send(json.dumps(message))
                # Important: pace audio to real-time rate
                await asyncio.sleep(chunk_duration)
        except queue.Empty:
            await asyncio.sleep(0.001)
```
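The message-building part of the loop above is easy to verify in isolation. This sketch factors it into a small function (a refactoring for illustration, not a separate Palabra API): a 320ms chunk at 24kHz mono int16 is 7,680 samples, i.e. 15,360 bytes, before base64 encoding.

```python
import base64
import json

SAMPLE_RATE = 24000    # Hz, expected by Palabra
CHUNK_DURATION = 0.32  # seconds, recommended chunk size
BYTES_PER_SAMPLE = 2   # pcm_s16le, mono

def build_audio_message(chunk_bytes: bytes) -> str:
    # Wrap raw PCM bytes in the input_audio_data JSON envelope
    return json.dumps({
        "message_type": "input_audio_data",
        "data": {"data": base64.b64encode(chunk_bytes).decode("utf-8")},
    })

# Expected size of one chunk in raw bytes
chunk_size = int(SAMPLE_RATE * CHUNK_DURATION) * BYTES_PER_SAMPLE  # 15360
```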
Step 5. Configure Translation Settings
Send a JSON message through the WebSocket to configure your translation settings. See management documentation for full settings reference.
Example
- Python
```python
# Full working example: https://github.com/PalabraAI/palabra-ai-python/tree/main/examples/nanopalabra
import json

async def configure_translation(websocket, source_lang: str, target_langs: list):
    settings = {
        "message_type": "set_task",
        "data": {
            "input_stream": {
                "content_type": "audio",
                "source": {
                    "type": "ws",
                    "format": "pcm_s16le",
                    "sample_rate": 24000,
                    "channels": 1
                }
            },
            "output_stream": {
                "content_type": "audio",
                "target": {
                    "type": "ws",
                    "format": "pcm_s16le"
                }
            },
            "pipeline": {
                "preprocessing": {},
                "transcription": {
                    "source_language": source_lang
                },
                "translations": [
                    {
                        "target_language": lang,
                        "speech_generation": {}
                    } for lang in target_langs
                ]
            }
        }
    }
    await websocket.send(json.dumps(settings))
    print(f"⚙️ Translation configured: {source_lang} → {target_langs}")
```
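Since `set_task` is just JSON, the payload can be built and inspected without a live connection. The sketch below constructs the same settings structure as above but returns it instead of sending it (a factoring for illustration); note that each target language produces its own entry in `translations`.

```python
def build_task_settings(source_lang: str, target_langs: list) -> dict:
    # Same structure as the set_task message above, returned instead of sent
    return {
        "message_type": "set_task",
        "data": {
            "input_stream": {
                "content_type": "audio",
                "source": {"type": "ws", "format": "pcm_s16le",
                           "sample_rate": 24000, "channels": 1},
            },
            "output_stream": {
                "content_type": "audio",
                "target": {"type": "ws", "format": "pcm_s16le"},
            },
            "pipeline": {
                "preprocessing": {},
                "transcription": {"source_language": source_lang},
                "translations": [
                    {"target_language": lang, "speech_generation": {}}
                    for lang in target_langs
                ],
            },
        },
    }
```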
Step 6. Receive and Play Translated Audio
Listen for messages from the WebSocket. Palabra sends different message types including transcriptions, translations, and audio data.
Example
- Python
```python
# Full working example: https://github.com/PalabraAI/palabra-ai-python/tree/main/examples/nanopalabra
import base64
import json
import queue

import numpy as np
import sounddevice as sd

async def receive_and_play(websocket):
    # Audio playback setup
    sample_rate = 24000
    audio_queue = queue.Queue(maxsize=100)
    buffer = np.array([], dtype=np.int16)

    def audio_callback(outdata, frames, time_info, status):
        nonlocal buffer
        # Fill buffer if needed
        while len(buffer) < frames:
            try:
                buffer = np.concatenate([buffer, audio_queue.get_nowait()])
            except queue.Empty:
                break
        # Provide audio frames
        if len(buffer) >= frames:
            outdata[:] = buffer[:frames].reshape(-1, 1)
            buffer = buffer[frames:]
        else:
            outdata.fill(0)

    output_stream = sd.OutputStream(
        samplerate=sample_rate,
        channels=1,
        dtype='int16',
        callback=audio_callback,
        blocksize=int(sample_rate * 0.02)
    )
    output_stream.start()
    print("🔊 Audio playback started")

    # Receive messages
    async for message in websocket:
        data = json.loads(message)
        # Parse nested JSON if needed
        if isinstance(data.get("data"), str):
            data["data"] = json.loads(data["data"])
        msg_type = data.get("message_type")
        if msg_type == "current_task":
            print("📝 Task confirmed")
        elif msg_type == "output_audio_data":
            # Decode base64 audio
            audio_bytes = base64.b64decode(data["data"]["data"])
            audio_array = np.frombuffer(audio_bytes, dtype=np.int16)
            try:
                audio_queue.put_nowait(audio_array)
            except queue.Full:
                pass
        elif msg_type == "partial_transcription":
            text = data["data"]["transcription"]["text"]
            lang = data["data"]["transcription"]["language"]
            print(f"\r\033[K💬 [{lang}] {text}", end="", flush=True)
        elif msg_type == "final_transcription":
            text = data["data"]["transcription"]["text"]
            lang = data["data"]["transcription"]["language"]
            print(f"\r\033[K✅ [{lang}] {text}")
```
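The parsing logic in the receive loop (unwrap the possibly string-encoded `data` field, then branch on `message_type`) can be exercised offline. A minimal sketch using the message shapes shown above; the function name is our own, not a Palabra API:

```python
import base64
import json

def parse_server_message(raw: str) -> tuple:
    """Return (message_type, payload).

    Payload is raw PCM bytes for audio messages, a (language, text)
    pair for transcriptions, and the data dict otherwise.
    """
    data = json.loads(raw)
    if isinstance(data.get("data"), str):
        data["data"] = json.loads(data["data"])  # unwrap nested JSON
    msg_type = data.get("message_type")
    if msg_type == "output_audio_data":
        return msg_type, base64.b64decode(data["data"]["data"])
    if msg_type in ("partial_transcription", "final_transcription"):
        t = data["data"]["transcription"]
        return msg_type, (t["language"], t["text"])
    return msg_type, data.get("data")
```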
Complete Example
Here's a minimal working example. For the full implementation, see nanopalabra_ws.
- Python
```python
import asyncio
import os
import signal

async def main():
    # Graceful shutdown
    signal.signal(signal.SIGINT, lambda s, f: os._exit(0))
    print("🚀 Palabra WebSocket Client")

    # Step 1: Your API credentials
    client_id = os.getenv("PALABRA_CLIENT_ID")
    client_secret = os.getenv("PALABRA_CLIENT_SECRET")

    # Step 2: Create session
    session = await create_session(client_id, client_secret)
    ws_url = session["data"]["ws_url"]
    publisher_token = session["data"]["publisher"]

    # Step 3: Connect to WebSocket
    websocket = await connect_websocket(ws_url, publisher_token)

    # Step 5: Configure translation
    await configure_translation(websocket, "en", ["es"])
    # Wait for settings to process
    await asyncio.sleep(3)

    # Steps 4 & 6: Create tasks for streaming and receiving
    receive_task = asyncio.create_task(receive_and_play(websocket))
    stream_task = asyncio.create_task(stream_microphone(websocket))
    print("\n🎧 Listening... Press Ctrl+C to stop\n")
    try:
        # Run until interrupted
        await asyncio.gather(receive_task, stream_task)
    except KeyboardInterrupt:
        print("\n🛑 Shutdown complete")

if __name__ == "__main__":
    asyncio.run(main())
```
Summary
Once you establish the WebSocket connection and send your translation settings, Palabra will process your audio stream in real time. The service transcribes your speech, translates it to the specified target languages, and streams back both the text translations and synthesized audio through the same WebSocket connection, enabling seamless real-time communication.
Need help?
If you have any questions or need assistance, please don't hesitate to contact us at [email protected].