POC to PROD: Hard Lessons from 200+ Enterprise GenAI Deployments - Randall Hunt, Caylent

Introduction & Company Background 00:03

  • Caylent builds custom solutions for clients ranging from startups to Fortune 500 companies, tackling challenges from application development to database migration
  • The team consists of passionate autodidacts with a bias toward rapid prototyping across a diverse range of products
  • Generative AI is viewed as a powerful but not all-encompassing solution; misconceptions often arise about its capabilities

Enterprise GenAI Use Cases & Customer Examples 01:07

  • Developed an agent for Brainbox AI to optimize HVAC systems across tens of thousands of buildings, resulting in significant greenhouse gas reductions
  • Built AI-driven solutions for Simmons (water management), Pipes AI, Virtual Moving Technologies, and Z5 Inventory
  • Demonstrated a multimodal search system for Nature Footage, indexing an extensive stock-video collection and making it searchable via vector embeddings

Technical Deep Dives on Multimodal Search & Video Understanding 02:49

  • For Nature Footage, built pooled multimodal embeddings from sampled video frames combined with text to power advanced search (see the first sketch after this list)
  • For a sports video customer, used audio amplitude spectrography to identify highlights by tracking crowd cheering, combined audio and video embeddings for event detection, and sent notifications on detected plays (a toy cheer detector is sketched after this list)
  • Simple video annotation, such as overlaying the three-point line, drastically improves model performance for event detection
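
The talk doesn't include code, but a minimal sketch of the frame-sampling-plus-pooling idea might look like the following. Assumptions not stated in the talk: Amazon Titan Multimodal Embeddings on Bedrock as the embedding model, OpenCV for frame sampling, and simple mean pooling over frame vectors.

```python
import base64
import json

import boto3
import cv2  # OpenCV, used here for frame sampling
import numpy as np

bedrock = boto3.client("bedrock-runtime")

def embed_frame(jpeg_bytes: bytes, caption: str | None = None) -> np.ndarray:
    """Embed one frame (optionally with text) via Titan Multimodal Embeddings."""
    body = {"inputImage": base64.b64encode(jpeg_bytes).decode()}
    if caption:
        body["inputText"] = caption
    resp = bedrock.invoke_model(
        modelId="amazon.titan-embed-image-v1",  # assumed model choice
        body=json.dumps(body),
    )
    return np.array(json.loads(resp["body"].read())["embedding"])

def embed_video(path: str, caption: str, every_n_frames: int = 30) -> np.ndarray:
    """Sample every Nth frame, embed each, and mean-pool into one vector."""
    cap = cv2.VideoCapture(path)
    vectors, i = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if i % every_n_frames == 0:
            ok_enc, jpeg = cv2.imencode(".jpg", frame)
            if ok_enc:
                vectors.append(embed_frame(jpeg.tobytes(), caption))
        i += 1
    cap.release()
    pooled = np.mean(vectors, axis=0)
    return pooled / np.linalg.norm(pooled)  # normalize for cosine search
```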
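
Likewise, a toy version of the crowd-cheer detector. The talk mentions amplitude spectrography; this sketch substitutes a simpler per-window RMS-energy z-score, and assumes the audio track has already been extracted to a WAV file.

```python
import numpy as np
from scipy.io import wavfile  # assumes audio was extracted to WAV

def cheer_highlights(wav_path: str, window_s: float = 1.0, z_thresh: float = 2.5):
    """Flag moments where short-window loudness spikes well above the mean,
    a rough proxy for crowd cheering."""
    rate, samples = wavfile.read(wav_path)
    if samples.ndim > 1:                      # mix stereo down to mono
        samples = samples.mean(axis=1)
    win = int(rate * window_s)
    n = len(samples) // win
    rms = np.sqrt(np.mean(
        samples[: n * win].astype(np.float64).reshape(n, win) ** 2, axis=1))
    z = (rms - rms.mean()) / rms.std()        # standardize loudness per window
    return [i * window_s for i in np.flatnonzero(z > z_thresh)]  # seconds

# e.g. cheer_highlights("game_audio.wav") -> [431.0, 432.0, 1088.0, ...]
```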

Architecture & Model Selection 04:39

  • Utilizes storage and search solutions such as Postgres pgvector and OpenSearch for efficient vector search
  • Prefers pgvector for vector storage (see the sketch after this list), but also leverages Redis and AWS MemoryDB where RAM-fast search is required, mindful of cost and scalability
  • Runs workloads on AWS services like Bedrock and SageMaker, and explores custom silicon (Trainium, Inferentia) for price/performance advantages (around 60% better than Nvidia GPUs in specific scenarios)
  • Model use includes proprietary (Claude, Nova), open-source (Llama, DeepSeek), and embeddings tuned for business needs
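
A minimal pgvector setup along these lines (table, HNSW index, and cosine-distance query); the schema, vector dimension, and connection string are illustrative, not from the talk.

```python
import psycopg  # psycopg 3; assumes the pgvector extension is available

DDL = [
    "CREATE EXTENSION IF NOT EXISTS vector",
    """CREATE TABLE IF NOT EXISTS clips (
           id        bigserial PRIMARY KEY,
           title     text,
           embedding vector(1024))""",  # dimension must match your model
    """CREATE INDEX IF NOT EXISTS clips_embedding_idx
           ON clips USING hnsw (embedding vector_cosine_ops)""",
]

def search(conn, query_vec, limit=10):
    # <=> is pgvector's cosine-distance operator; smaller means more similar
    return conn.execute(
        "SELECT id, title FROM clips ORDER BY embedding <=> %s::vector LIMIT %s",
        (str(query_vec), limit),
    ).fetchall()

with psycopg.connect("dbname=demo") as conn:  # hypothetical DSN
    for stmt in DDL:
        conn.execute(stmt)
    rows = search(conn, [0.01] * 1024)
```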

Building Robust GenAI Systems & Business Moats 06:07

  • Enterprises often require custom fine-tuning or applications layered on top of self-service tools
  • Key differentiator is leveraging context; richer user context leads to smarter, more relevant LLM-powered applications (a context-injection sketch follows this list)
  • Tracking and administering third-party tool usage is a recurring challenge, often requiring network-level monitoring
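
A sketch of that context-injection idea using the Bedrock Converse API; the user-profile fields and model ID are assumptions for illustration.

```python
import boto3

bedrock = boto3.client("bedrock-runtime")

def answer(question: str, user: dict) -> str:
    """Prepend user-specific context so the model can tailor its response.
    The profile fields here are illustrative, not from the talk."""
    context = (
        f"User role: {user['role']}. Team: {user['team']}. "
        f"Recent activity: {', '.join(user['recent_docs'])}."
    )
    resp = bedrock.converse(
        modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",  # assumed model
        system=[{"text": f"Answer using this user context: {context}"}],
        messages=[{"role": "user", "content": [{"text": question}]}],
    )
    return resp["output"]["message"]["content"][0]["text"]
```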

Lessons Learned from 200+ Deployments 12:02

  • Successful systems require more than embeddings and evals; understanding real user access patterns is crucial
  • Faceted search and filters built atop embeddings yield more usable results (a faceted-search sketch follows this list); speed is critical, since slow models risk user abandonment
  • Good UX can compensate for some inference latency through strategic design (e.g., loading spinners)
  • With modern models, prompt engineering is often more effective than fine-tuning, and prompts need fewer ongoing fixes as models improve
  • Automations like prompt/context management, eval layers, and cost tracking are vital to long-term maintainability and scalability
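
Building on the pgvector table sketched earlier, faceted search might combine structured filters with vector ranking in a single query, so users narrow by metadata before semantic similarity takes over. Column names here are hypothetical.

```python
def faceted_search(conn, query_vec, species=None, min_duration=None, limit=10):
    """Apply facet filters (WHERE) before vector ranking (ORDER BY),
    so structured narrowing and semantic similarity compose in one query."""
    where, params = [], []
    if species:
        where.append("species = %s")
        params.append(species)
    if min_duration:
        where.append("duration_s >= %s")
        params.append(min_duration)
    sql = "SELECT id, title FROM clips"
    if where:
        sql += " WHERE " + " AND ".join(where)
    sql += " ORDER BY embedding <=> %s::vector LIMIT %s"
    params += [str(query_vec), limit]
    return conn.execute(sql, params).fetchall()
```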

Model Version, Prompt Engineering, and Cost Management 13:45

  • Well-engineered prompts keep paying off as models improve (e.g., moving from Claude 3.5 to Claude 4 brought marked improvements without prompt regressions)
  • The economics of inference must be considered; high-end models can be costly (“Is this inference going to bankrupt my company?”)
  • Effective caching and prompt design can optimize both cost and reliability (a prompt-caching sketch follows this list)
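
A sketch of prompt caching via the Bedrock Converse API's cachePoint block, which lets repeat calls reuse a large static prefix at reduced cost. The model ID and context file are assumptions, and caching requires a model that supports it.

```python
import boto3

bedrock = boto3.client("bedrock-runtime")

LONG_STATIC_CONTEXT = open("product_docs.txt").read()  # reused across calls

def ask(question: str) -> str:
    """Mark the large, stable prefix as cacheable so repeat calls only pay
    full price for the new question."""
    resp = bedrock.converse(
        modelId="anthropic.claude-3-5-sonnet-20241022-v2:0",  # assumed model
        system=[
            {"text": LONG_STATIC_CONTEXT},
            {"cachePoint": {"type": "default"}},  # cache everything above
        ],
        messages=[{"role": "user", "content": [{"text": question}]}],
        inferenceConfig={"maxTokens": 512},  # cap output tokens for cost
    )
    return resp["output"]["message"]["content"][0]["text"]
```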

Evaluation, UX, and Personalization in Production 15:00

  • Evaluation suites start with “vibe checks” and evolve into binary or scored metrics for continuous improvement (a minimal harness is sketched after this list)
  • UX orchestration, prompt versioning, and generative UI enable dynamic, user-personalized responses (e.g., just-in-time React components for dashboards)
  • Production features include adaptive UI per user, efficient document delivery for bandwidth-limited users, and channel selection (chat vs. voice) informed by actual user workflows
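
A minimal binary eval harness of the kind described: hand-written pass/fail checks that turn vibe checks into a regression gate run on every prompt or model change. The cases and the `generate` callable are placeholders.

```python
# Hypothetical eval cases: each pairs a prompt with a binary check.
EVAL_CASES = [
    {"prompt": "Summarize: ...", "check": lambda out: len(out) < 500},
    {"prompt": "What is our refund window?", "check": lambda out: "30 days" in out},
]

def run_evals(generate) -> float:
    """Score each case pass/fail and report the overall pass rate."""
    passed = sum(
        1 for case in EVAL_CASES if case["check"](generate(case["prompt"]))
    )
    return passed / len(EVAL_CASES)

# e.g. print(f"pass rate: {run_evals(my_model):.0%}")
```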

Final Recommendations and Contact 18:44

  • Delegate computation tasks appropriately; don’t use LLMs for math operations when native code suffices (see the tool-use sketch at the end of these notes)
  • Manage output tokens carefully to control inference costs
  • Take advantage of batch inference and prompt caching to further reduce costs
  • Continually refine context input for efficient, accurate LLM responses: strip irrelevant info and add user-specific context for optimal results
  • Speaker invites discussion of new use cases and collaboration opportunities
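
A sketch of delegating arithmetic to native code via Bedrock Converse tool use, with maxTokens capped to bound output cost; the tool definition and model ID are illustrative, not from the talk.

```python
import boto3

bedrock = boto3.client("bedrock-runtime")

# Hypothetical calculator tool: the model plans, native code does the math.
TOOLS = {"tools": [{"toolSpec": {
    "name": "add",
    "description": "Add two numbers exactly.",
    "inputSchema": {"json": {
        "type": "object",
        "properties": {"a": {"type": "number"}, "b": {"type": "number"}},
        "required": ["a", "b"],
    }},
}]}}

resp = bedrock.converse(
    modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",  # assumed model
    messages=[{"role": "user", "content": [{"text": "What is 81927 + 4411?"}]}],
    toolConfig=TOOLS,
    inferenceConfig={"maxTokens": 256},  # cap output tokens to bound cost
)

if resp["stopReason"] == "tool_use":
    for block in resp["output"]["message"]["content"]:
        if "toolUse" in block:
            args = block["toolUse"]["input"]
            print(args["a"] + args["b"])  # exact arithmetic in native code
```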