LiveKit - Zentry


This guide demonstrates how to create a memory-enabled voice assistant using LiveKit, Deepgram, OpenAI, and Zentry, focusing on creating an intelligent, context-aware travel planning agent.

Prerequisites

Before you begin, make sure you have:

Installed the LiveKit Agents SDK with the Silero, Deepgram, and OpenAI plugins:

pip install livekit \
livekit-agents \
livekit-plugins-silero \
livekit-plugins-deepgram \
livekit-plugins-openai

Installed the Zentry SDK:

pip install Zentryai

Set up your API keys in a .env file:

LIVEKIT_URL=your_livekit_url
LIVEKIT_API_KEY=your_livekit_api_key
LIVEKIT_API_SECRET=your_livekit_api_secret
DEEPGRAM_API_KEY=your_deepgram_api_key
Zentry_API_KEY=your_Zentry_api_key
OPENAI_API_KEY=your_openai_api_key

Note: You need both a LiveKit account and a Deepgram account. LIVEKIT_URL, LIVEKIT_API_KEY, and LIVEKIT_API_SECRET are available from the LiveKit Cloud Console; see the LiveKit Documentation for more information. DEEPGRAM_API_KEY is available from the Deepgram Console; see the Deepgram Documentation for details.
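A missing key usually surfaces only once the agent tries to reach the corresponding service, so it can help to check the environment up front. The following is a minimal sketch, assuming the variable names from the .env example above; missing_env_vars is a hypothetical helper, not part of any SDK used in this guide.

```python
import os
from typing import Iterable, List, Mapping

# Variable names copied from the .env example above.
REQUIRED_ENV_VARS = (
    "LIVEKIT_URL",
    "LIVEKIT_API_KEY",
    "LIVEKIT_API_SECRET",
    "DEEPGRAM_API_KEY",
    "Zentry_API_KEY",
    "OPENAI_API_KEY",
)


def missing_env_vars(
    required: Iterable[str] = REQUIRED_ENV_VARS,
    env: Mapping[str, str] = os.environ,
) -> List[str]:
    """Return the names of required variables that are unset or blank."""
    return [name for name in required if not env.get(name, "").strip()]


# Hypothetical usage, after load_dotenv() has run:
#     missing = missing_env_vars()
#     if missing:
#         raise SystemExit(f"Missing environment variables: {', '.join(missing)}")
```

Passing the environment as a parameter keeps the check easy to exercise with a plain dictionary.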

Code Breakdown

Let’s break down the key components of this implementation:

1. Setting Up Dependencies and Environment

import asyncio
import logging
import os
from typing import List, Dict, Any, Annotated

import aiohttp
from dotenv import load_dotenv
from livekit.agents import (
    AutoSubscribe,
    JobContext,
    JobProcess,
    WorkerOptions,
    cli,
    llm,
    metrics,
)
from livekit import rtc, api
from livekit.agents.pipeline import VoicePipelineAgent
from livekit.plugins import deepgram, openai, silero
from Zentry import AsyncMemoryClient

# Load environment variables
load_dotenv()

# Configure logging
logger = logging.getLogger("memory-assistant")
logger.setLevel(logging.INFO)

# Define a global user ID for simplicity
USER_ID = "voice_user"

# Initialize Zentry client
Zentry = AsyncMemoryClient()

This section handles:

Importing required modules
Loading environment variables
Setting up logging
Defining a global user ID
Initializing the Zentry client
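The global USER_ID keeps the example simple, but it means every caller shares one set of memories. Below is a minimal sketch of deriving a per-user ID instead. memory_user_id and its sanitizing rule are hypothetical and not part of the Zentry or LiveKit SDKs; the sketch assumes an identity string for the caller is available when the session starts.

```python
import re
from typing import Optional

DEFAULT_USER_ID = "voice_user"  # same fallback value as USER_ID above


def memory_user_id(identity: Optional[str], fallback: str = DEFAULT_USER_ID) -> str:
    """Map a caller identity to an ID used to scope stored memories."""
    if not identity or not identity.strip():
        return fallback
    # Keep letters, digits, dot, underscore, and hyphen; replace runs of anything else.
    cleaned = re.sub(r"[^A-Za-z0-9._-]+", "_", identity.strip())
    # Fall back if nothing usable is left after trimming underscores.
    return cleaned.strip("_") or fallback
```

Inside the entrypoint, the result could be used in place of USER_ID, for example memory_user_id(participant.identity), assuming the participant object exposes an identity string.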

2. Memory Enrichment Function

async def _enrich_with_memory(agent: VoicePipelineAgent, chat_ctx: llm.ChatContext):
    """Add memories and Augment chat context with relevant memories"""
    if not chat_ctx.messages:
        return
    
    # Store user message in Zentry
    user_msg = chat_ctx.messages[-1]
    await Zentry.add(
        [{"role": "user", "content": user_msg.content}], 
        user_id=USER_ID
    )
    
    # Search for relevant memories
    results = await Zentry.search(
        user_msg.content, 
        user_id=USER_ID,
    )
    
    # Augment context with retrieved memories
    if results:
        memories = ' '.join([result["memory"] for result in results])
        logger.info(f"Enriching with memory: {memories}")
        
        rag_msg = llm.ChatMessage.create(
            text=f"Relevant Memory: {memories}\n",
            role="assistant",
        )
        
        # Modify chat context with retrieved memories
        chat_ctx.messages[-1] = rag_msg
        chat_ctx.messages.append(user_msg)

This function:

Stores user messages in Zentry
Performs semantic search for relevant memories
Augments the chat context with retrieved memories
Enables contextually aware responses
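The last two assignments in the function are easy to misread: the memory note takes the place of the final message, and the user message is then appended again, so the note ends up immediately before the user's latest turn. A small sketch of the same reordering with plain dictionaries follows; splice_memories and the dictionary message shape are illustrative stand-ins, not LiveKit types.

```python
from typing import Any, Dict, List

Message = Dict[str, str]


def splice_memories(messages: List[Message], results: List[Dict[str, Any]]) -> List[Message]:
    """Return a new message list with a memory note placed before the last message."""
    if not messages or not results:
        return list(messages)
    memories = " ".join(result["memory"] for result in results)
    memory_note = {"role": "assistant", "content": f"Relevant Memory: {memories}\n"}
    # Same effect as overwriting the last slot with the note and re-appending the user message.
    return messages[:-1] + [memory_note, messages[-1]]


history = [
    {"role": "system", "content": "You are a travel guide."},
    {"role": "user", "content": "Where should I go in June?"},
]
enriched = splice_memories(history, [{"memory": "Prefers beaches"}, {"memory": "Is vegetarian"}])
print([m["role"] for m in enriched])  # ['system', 'assistant', 'user']
```

Keeping the user message last means the model still answers the most recent turn, with the retrieved memories directly ahead of it.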

3. Prewarm and Entrypoint Functions

def prewarm_process(proc: JobProcess):
    # Preload silero VAD in memory to speed up session start
    proc.userdata["vad"] = silero.VAD.load()

async def entrypoint(ctx: JobContext):
    # Connect to LiveKit room
    await ctx.connect(auto_subscribe=AutoSubscribe.AUDIO_ONLY)
    
    # Wait for participant
    participant = await ctx.wait_for_participant()
    
    # Initialize Zentry client
    Zentry = AsyncMemoryClient()

    # Define initial system context
    initial_ctx = llm.ChatContext().append(
        role="system",
        text=(
            """
            You are a helpful voice assistant.
            You are a travel guide named George and will help the user to plan a travel trip of their dreams. 
            You should help the user plan for various adventures like work retreats, family vacations or solo backpacking trips. 
            You should be careful to not suggest anything that would be dangerous, illegal or inappropriate.
            You can remember past interactions and use them to inform your answers.
            Use semantic memory retrieval to provide contextually relevant responses. 
            """
        ),
    )

    # Create VoicePipelineAgent with memory capabilities
    agent = VoicePipelineAgent(
        chat_ctx=initial_ctx,
        vad=ctx.proc.userdata["vad"],  # reuse the VAD preloaded in prewarm_process
        stt=deepgram.STT(),
        llm=openai.LLM(model="gpt-4o-mini"),
        tts=openai.TTS(),
        before_llm_cb=_enrich_with_memory,
    )

    # Start agent and initial greeting
    agent.start(ctx.room, participant)
    await agent.say(
        "Hello! I'm George. Can I help you plan an upcoming trip? ",
        allow_interruptions=True
    )

# Run the application
if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint, prewarm_fnc=prewarm_process))

The entrypoint function:

Connects to LiveKit room
Initializes Zentry memory client
Sets up initial system context
Creates a VoicePipelineAgent with memory enrichment
Starts the agent with an initial greeting

Create a Memory-Enabled Voice Agent

Now that we’ve explained each component, here’s the complete implementation that combines the LiveKit Agents SDK for voice with Zentry’s memory capabilities:

import asyncio
import logging
import os
from typing import List, Dict, Any, Annotated

import aiohttp
from dotenv import load_dotenv
from livekit.agents import (
    AutoSubscribe,
    JobContext,
    JobProcess,
    WorkerOptions,
    cli,
    llm,
    metrics,
)
from livekit import rtc, api
from livekit.agents.pipeline import VoicePipelineAgent
from livekit.plugins import deepgram, openai, silero
from Zentry import AsyncMemoryClient

# Load environment variables
load_dotenv()

# Configure logging
logger = logging.getLogger("memory-assistant")
logger.setLevel(logging.INFO)

# Define a global user ID for simplicity
USER_ID = "voice_user"

# Initialize Zentry memory client
Zentry = AsyncMemoryClient()

def prewarm_process(proc: JobProcess):
    # Preload silero VAD in memory to speed up session start
    proc.userdata["vad"] = silero.VAD.load()

async def entrypoint(ctx: JobContext):
    # Connect to LiveKit room
    await ctx.connect(auto_subscribe=AutoSubscribe.AUDIO_ONLY)
    
    # Wait for participant
    participant = await ctx.wait_for_participant()
    
    async def _enrich_with_memory(agent: VoicePipelineAgent, chat_ctx: llm.ChatContext):
        """Add memories and Augment chat context with relevant memories"""
        if not chat_ctx.messages:
            return
        
        # Store user message in Zentry
        user_msg = chat_ctx.messages[-1]
        await Zentry.add(
            [{"role": "user", "content": user_msg.content}], 
            user_id=USER_ID
        )
        
        # Search for relevant memories
        results = await Zentry.search(
            user_msg.content, 
            user_id=USER_ID,
        )
        
        # Augment context with retrieved memories
        if results:
            memories = ' '.join([result["memory"] for result in results])
            logger.info(f"Enriching with memory: {memories}")
            
            rag_msg = llm.ChatMessage.create(
                text=f"Relevant Memory: {memories}\n",
                role="assistant",
            )
            
            # Modify chat context with retrieved memories
            chat_ctx.messages[-1] = rag_msg
            chat_ctx.messages.append(user_msg)

    # Define initial system context
    initial_ctx = llm.ChatContext().append(
        role="system",
        text=(
            """
            You are a helpful voice assistant.
            You are a travel guide named George and will help the user to plan a travel trip of their dreams. 
            You should help the user plan for various adventures like work retreats, family vacations or solo backpacking trips. 
            You should be careful to not suggest anything that would be dangerous, illegal or inappropriate.
            You can remember past interactions and use them to inform your answers.
            Use semantic memory retrieval to provide contextually relevant responses. 
            """
        ),
    )

    # Create VoicePipelineAgent with memory capabilities
    agent = VoicePipelineAgent(
        chat_ctx=initial_ctx,
        vad=ctx.proc.userdata["vad"],  # reuse the VAD preloaded in prewarm_process
        stt=deepgram.STT(),
        llm=openai.LLM(model="gpt-4o-mini"),
        tts=openai.TTS(),
        before_llm_cb=_enrich_with_memory,
    )

    # Start agent and initial greeting
    agent.start(ctx.room, participant)
    await agent.say(
        "Hello! I'm George. Can I help you plan an upcoming trip? ",
        allow_interruptions=True
    )

# Run the application
if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint, prewarm_fnc=prewarm_process))

Key Features of This Implementation

Semantic Memory Retrieval: Uses Zentry to store and retrieve contextually relevant memories
Voice Interaction: Leverages LiveKit for voice communication
Intelligent Context Management: Augments conversations with past interactions
Travel Planning Specialization: Focused on creating a helpful travel guide assistant

Running the Example

To run this example:

Install all required dependencies
Set up your .env file with the necessary API keys
Ensure your microphone and audio setup are configured
Run the script with Python 3.11 or newer, using the following command:

python Zentry-livekit-voice-agent.py start

After the script starts, connect to the agent through LiveKit’s Agent Platform to start a conversation.

Best Practices for Voice Agents with Memory

Context Preservation: Store enough context with each memory for effective retrieval
Privacy Considerations: Implement secure memory management
Relevant Memory Filtering: Use semantic search to retrieve only the most pertinent memories
Error Handling: Implement robust error handling for memory operations
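The filtering and error-handling points can be sketched as two small helpers. This is one possible approach under stated assumptions, not the method used by the code in this guide: safe_memory_call and top_memories are hypothetical names, the timeout and threshold values are arbitrary, and the "score" field is assumed to be present on search results (the example above only relies on "memory").

```python
import asyncio
import logging
from typing import Any, Awaitable, Dict, List

logger = logging.getLogger("memory-assistant")


async def safe_memory_call(call: Awaitable[Any], timeout_s: float = 2.0, default: Any = None) -> Any:
    """Await a memory operation, returning `default` on timeout or error."""
    try:
        return await asyncio.wait_for(call, timeout=timeout_s)
    except asyncio.TimeoutError:
        logger.warning("Memory operation timed out after %.1fs", timeout_s)
    except Exception:
        logger.exception("Memory operation failed")
    return default


def top_memories(results: List[Dict[str, Any]], min_score: float = 0.3, limit: int = 3) -> List[str]:
    """Keep the highest-scoring memories at or above min_score.

    Results without a "score" field (an assumed field) are treated as score 0.0.
    """
    kept = [r for r in results if r.get("score", 0.0) >= min_score]
    kept.sort(key=lambda r: r.get("score", 0.0), reverse=True)
    return [r["memory"] for r in kept[:limit]]
```

In the enrichment callback, the search could then be wrapped as safe_memory_call(Zentry.search(user_msg.content, user_id=USER_ID), default=[]), so that a slow or failing memory backend delays the spoken reply by at most the timeout instead of breaking the turn.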

Debugging Function Tools

To run the script in debug mode, start the assistant in dev mode:

python Zentry-livekit-voice-agent.py dev

When working with memory-enabled voice agents, use Python’s logging module for effective debugging:

import logging

# Set up logging
logging.basicConfig(
    level=logging.DEBUG,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger("memory_voice_agent")
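Because memory calls sit on the path between the user's speech and the agent's reply, their latency is often the first thing worth logging. Here is a minimal sketch of a timing wrapper; timed is a hypothetical helper, and the logger name simply matches the one configured above.

```python
import asyncio
import logging
import time
from typing import Any, Awaitable

logger = logging.getLogger("memory_voice_agent")


async def timed(label: str, call: Awaitable[Any]) -> Any:
    """Await `call` and log how long it took at DEBUG level."""
    start = time.perf_counter()
    try:
        return await call
    finally:
        # Runs whether the call succeeded or raised, so failures are timed too.
        elapsed_ms = (time.perf_counter() - start) * 1000
        logger.debug("%s took %.1f ms", label, elapsed_ms)
```

For example, a search could be wrapped as await timed("memory search", Zentry.search(user_msg.content, user_id=USER_ID)), which logs the duration without changing the returned results.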
