Building Agentic Applications w/ Heroku Managed Inference and Agents — Julián Duque & Anush Dsouza

Introduction & Workshop Setup 00:00

  • Heroku announced "Heroku Managed Inference and Agents" as part of its AI offering, focusing on building agentic applications.
  • The session is a hands-on workshop requiring only a browser, with all content and access to the Heroku platform available for attendees.
  • Participants need to sign up for a Heroku account (no credit card required) to access the services.
  • Attendees should join the "workshop Heroku AI" Slack channel for slides and links, and use a QR code for the workshop site.
  • The setup involves deploying a "Heroku Jupyter template" application to Heroku, ensuring a unique name, and selecting the "AI Engineer World's Fair" team for free access.
  • A password environment variable must be set for the Jupyter notebook, which runs on a Heroku dyno and uses Heroku Postgres for persistence.

Overview of Heroku AI 03:15

  • The current era is seen as an exciting time for building with AI, similar to past technological inflection points like the internet and cloud.
  • Heroku aims to simplify AI development, much like it did for web app deployment with "git push heroku main," making every software engineer an AI engineer.
  • Heroku addresses "day two" AI challenges, including operations, scaling, model selection, and tool safety, beyond just initial deployment.
  • They offer a curated set of models deeply integrated into an agentic control loop that runs on Heroku, providing access to tools like code execution and data under Heroku's trust layer.
  • The Model Context Protocol (MCP) is used for extending agents, with Heroku positioning itself as "the Heroku of AI."
  • Heroku's opinionated approach with sensible defaults lets users attach AI to their apps with a single CLI command: heroku ai:models:create.

Heroku AI Capabilities and Tools 06:26

  • Heroku AI provides three main primitives for building agentic applications:
    1. Inference: Access to curated models, including Anthropic Claude (3.5, 3.7, and 4) for text-to-text, Cohere Embed for embeddings, and Stable Image Ultra for image generation.
    2. Model Context Protocol (MCP): Enables building remote and stdio MCP servers that run in Heroku's trusted compute, scaling to zero when not in use to save costs.
    3. pgvector: A PostgreSQL extension for storing and querying vector embeddings.
  • Heroku's trusted compute layer (dynos) runs first-party tools like code execution, which stream data back to the user.
  • Future first-party tools are planned, including web search for grounding and memory capabilities.
  • Users can also bring their own tools using MCP, which will run on Heroku's compute.

Provisioning Managed Inference 13:03

  • Managed inference allows AI models to run within the same Heroku infrastructure as the application, ensuring data remains within the network.
  • To provision, users add the "Heroku Managed Inference and Agents" add-on from the application's "Resources" page in the Heroku dashboard.
  • Users then select their desired model (e.g., Claude 4).
  • This process automatically sets environment variables (API URL, API key, and model ID) on the application, enabling access to the AI services via SDKs that follow the OpenAI API specification or via direct HTTP requests.
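In an app with the add-on attached, those config vars can be read directly. A minimal sketch (the variable names match those the add-on sets; the inline notes are illustrative):

```python
import os

# Config vars set automatically when the add-on is attached to the app
inference_url = os.environ["INFERENCE_URL"]    # base URL of the inference endpoint
inference_key = os.environ["INFERENCE_KEY"]    # bearer token for API requests
model_id = os.environ["INFERENCE_MODEL_ID"]    # e.g. a Claude model identifier
```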

Managed Inference Demonstrations 20:18

  • The workshop's first part sets up the environment by loading the INFERENCE_URL, INFERENCE_KEY, and INFERENCE_MODEL_ID environment variables, plus a target application name.
  • The target application name grants tools permission to run commands on the specified Heroku app.
  • A basic chat completions demo shows an HTTP request to the endpoint, asking to explain "managed inference in one sentence," receiving a concise definition in return.
  • A streaming chat completions demo illustrates how the service returns chunks of the answer in real time, providing immediate feedback as the response is generated and rendered as markdown.
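A hedged sketch of both demos using Python's requests library, assuming the OpenAI-style /v1/chat/completions path and server-sent-event chunk format described in the session:

```python
import json
import os

import requests

headers = {
    "Authorization": f"Bearer {os.environ['INFERENCE_KEY']}",
    "Content-Type": "application/json",
}
payload = {
    "model": os.environ["INFERENCE_MODEL_ID"],
    "messages": [{"role": "user", "content": "Explain managed inference in one sentence."}],
}
base = os.environ["INFERENCE_URL"]

# Basic (non-streaming) chat completion
resp = requests.post(f"{base}/v1/chat/completions", headers=headers, json=payload)
print(resp.json()["choices"][0]["message"]["content"])

# Streaming: the service emits server-sent events, one JSON chunk per data line
with requests.post(
    f"{base}/v1/chat/completions",
    headers=headers,
    json={**payload, "stream": True},
    stream=True,
) as stream:
    for line in stream.iter_lines():
        if line.startswith(b"data: ") and line != b"data: [DONE]":
            chunk = json.loads(line[len(b"data: "):])
            delta = chunk["choices"][0]["delta"].get("content", "")
            print(delta, end="", flush=True)
```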

Introduction to Heroku Agents and Tools 24:48

  • Heroku provides an agents endpoint that supports native tools, which are themselves implemented as MCPs.
  • Agent responses are always streamed, since tool execution takes time.
  • Users may need to grant specific access permissions for tools to interact with databases or applications.
  • Supported Heroku tools:
    • dyno_run_command: Executes Unix commands or pre-deployed scripts on a Heroku dyno; suitable for trusted, predictable code.
    • postgres_get_schema and postgres_run_query: Let the LLM inspect a database schema, then generate and execute SQL queries for data retrieval.
    • html_to_markdown and pdf_to_markdown: Extract text content from URLs for inference.
    • code_exec_python, code_exec_node, code_exec_ruby, and code_exec_go: The LLM generates code that runs on one-off Heroku dynos, with support for installing dependencies. These dynos scale to zero, so compute is charged only during execution.

Dyno Run Command Example 28:25

  • A demonstration of the dyno_run_command tool shows how an LLM can obtain real-time information it cannot know inherently, such as the current date and time on the server.
  • The payload specifies the heroku_tool type, the dyno_run_command tool name, the target application name, and the date command; a sketch follows below.
  • The command runs on the designated Heroku application, and the date/time is returned, followed by the LLM's inference response.
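A sketch of such a payload. The heroku_tool type and the runtime_params/tool_params keys mirror the workshop's description, but treat the exact field names as illustrative and confirm them against the Managed Inference and Agents docs; TARGET_APP_NAME is a workshop-style placeholder for the target application's name.

```python
import os

# Hypothetical request body for the agents endpoint (responses arrive as SSE)
payload = {
    "model": os.environ["INFERENCE_MODEL_ID"],
    "messages": [
        {"role": "user", "content": "What are the current date and time on the server?"}
    ],
    "tools": [
        {
            "type": "heroku_tool",
            "name": "dyno_run_command",
            "runtime_params": {
                "target_app_name": os.environ["TARGET_APP_NAME"],  # app the tool may act on
                "tool_params": {
                    "cmd": "date",  # the Unix command to run on the dyno
                    "description": "Returns the server's current date and time",
                },
            },
        }
    ],
}
```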

Code Execution Example 30:54

  • The code_exec_node tool is demonstrated by asking the LLM to calculate the 30th Fibonacci number.
  • The LLM generates JavaScript code for the Fibonacci algorithm, which is then executed on a Heroku dyno, returning the result (832,040 under the usual 1-indexed convention) and an explanation.
  • The example is repeated using code_exec_go, showcasing the LLM's ability to generate and execute Go code for the same task.
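As a sanity check on the expected value, a minimal Python version of the same computation (with fib(1) = fib(2) = 1, the 30th Fibonacci number is 832,040):

```python
def fib(n: int) -> int:
    """Return the n-th Fibonacci number, with fib(1) == fib(2) == 1."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

print(fib(30))  # 832040
```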

Chaining Multiple Tools 32:21

  • A more complex example demonstrates chaining the html_to_markdown and code_exec_python tools.
  • The prompt asks the agent to fetch a Python snippet of the Euclidean algorithm from Wikipedia and use it to calculate the greatest common divisor (GCD) of 252 and 105.
  • The agent first uses html_to_markdown to retrieve and read the Wikipedia page content.
  • It then generates Python code based on the extracted algorithm and executes it via code_exec_python.
  • The process takes longer because of the multiple tool calls; the final response includes the Python implementation, the calculated GCD (21), and an explanation.
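The code the agent derives is essentially the textbook Euclidean algorithm; a minimal Python reference version:

```python
def gcd(a: int, b: int) -> int:
    """Euclidean algorithm: repeatedly replace (a, b) with (b, a mod b)."""
    while b:
        a, b = b, a % b
    return a

print(gcd(252, 105))  # 21
```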

Database Access for Agents 35:54

  • To grant agents access to a database, it must be attached as a "follower" (read-only) for security, preventing LLMs from having write access to production databases.
  • This involves going to the Heroku dashboard, selecting the database application, and attaching the target Jupyter notebook application to the follower database.
  • The database attachment's name (e.g., HEROKU_POSTGRESQL_AQUA) is then referenced in the code.
  • The postgres_get_schema and postgres_run_query tools are enabled for the application.
  • A demo uses a solar energy company database to answer "how much energy has been saved in the last 30 days." The agent first gets the database schema, then generates and executes an SQL query to retrieve the data, finally providing a report including key metrics and a breakdown by system.
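A sketch of the corresponding tool configuration, reusing the hypothetical field names from the dyno_run_command example (db_attachment names the attached follower database; adjust to your own attachment):

```python
import os

# Both postgres tools point at the read-only follower attachment
db_tools = [
    {
        "type": "heroku_tool",
        "name": name,
        "runtime_params": {
            "target_app_name": os.environ["TARGET_APP_NAME"],
            "tool_params": {"db_attachment": "HEROKU_POSTGRESQL_AQUA"},
        },
    }
    for name in ("postgres_get_schema", "postgres_run_query")
]

payload = {
    "model": os.environ["INFERENCE_MODEL_ID"],
    "messages": [
        {"role": "user", "content": "How much energy has been saved in the last 30 days?"}
    ],
    "tools": db_tools,
}
```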

Model Context Protocol (MCP) Support 43:04

  • Heroku's agents endpoint supports custom MCP servers in addition to its native tools.
  • Users can attach an MCP server (e.g., "MCP Brave") to their application via the Heroku dashboard, similar to attaching a database.
  • A demonstration uses the Brave web search tool (an MCP server) to find "the most recent news about AI agents," which requires a Brave API key.
  • The MCP runs on a Heroku dyno, scales to zero after execution, and returns search results for the inference service to process.
  • Users can deploy their own MCPs to Heroku by defining an mcp process type in their Procfile, with "Click to Deploy" buttons simplifying the process.
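A minimal Procfile sketch for a self-deployed MCP server; the script name mcp_server.py is hypothetical:

```
mcp: python mcp_server.py
```

Per the session, declaring the mcp process type is what registers the server so it can run in Heroku's trusted compute.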

Using MCPs Externally 48:10

  • Deployed MCPs on Heroku are also accessible remotely, outside of Heroku agents (e.g., for use with other platforms like Cursor or Claude Desktop).
  • The "toolkit integration" page on the Heroku management dashboard provides an endpoint for external access, authenticated by a bearer token (with OAuth support planned).
  • A Python example demonstrates using Anthropic's MCP package to create an MCP client, connecting to the Heroku endpoint to list and execute the available tools (e.g., Brave local/web search, Perplexity ask) remotely; a sketch follows.
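A hedged sketch of such a client using the mcp Python package, assuming the remote endpoint speaks the SSE transport; MCP_URL and MCP_TOKEN are hypothetical variables holding the endpoint and bearer token from the toolkit integration page, and the tool name is illustrative:

```python
import asyncio
import os

from mcp import ClientSession
from mcp.client.sse import sse_client

async def main() -> None:
    url = os.environ["MCP_URL"]      # endpoint from the toolkit integration page
    token = os.environ["MCP_TOKEN"]  # bearer token for authentication

    async with sse_client(url, headers={"Authorization": f"Bearer {token}"}) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()
            print([tool.name for tool in tools.tools])  # discover the available tools

            result = await session.call_tool(
                "brave_web_search", {"query": "most recent news about AI agents"}
            )
            print(result.content)

asyncio.run(main())
```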

OpenAI SDK Compatibility and Wrap-up 50:50

  • Heroku's chat completion endpoint is 95% compatible with the OpenAI API specification, allowing users to leverage the OpenAI SDK.
  • A final demonstration shows a basic inference operation performed using the OpenAI SDK with Heroku's API key and URL.
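A minimal sketch with the openai package, assuming the /v1 base path used in the chat completions demos:

```python
import os

from openai import OpenAI

# Point the OpenAI SDK at Heroku's inference endpoint
client = OpenAI(
    api_key=os.environ["INFERENCE_KEY"],
    base_url=f"{os.environ['INFERENCE_URL']}/v1",
)

response = client.chat.completions.create(
    model=os.environ["INFERENCE_MODEL_ID"],
    messages=[{"role": "user", "content": "Explain managed inference in one sentence."}],
)
print(response.choices[0].message.content)
```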
  • The workshop concludes, with continued access to the Heroku platform extended until the weekend for participants to experiment further, as a free tier is not currently available.
  • Resources such as the Heroku dev center documentation, the Heroku AI website, and the Heroku community on X (Twitter) are provided for further learning and engagement.