Building AI-Powered APIs with LLMs as Backend Logic
At work, my team recently built a distributed system that uses AI to perform backend logic. This is one of the more practical archetypes of AI applications: a large language model (LLM) embedded directly into the business logic layer of a service.
Rather than exposing the model through a chat UI like ChatGPT, this architecture integrates the LLM as a service component. The interaction pattern, however, remains similar: we send a conversation history to the model as context and receive the next response in return.
Treating LLMs as Programmable Business Logic
In this paradigm, the chat history serves as a structured prompt, functioning almost like a programming template: you supply the history, and the model produces the next response.
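To make that concrete, here is a minimal sketch of how such a request can be assembled in C#, assuming the Azure.AI.OpenAI SDK (v2.x). The endpoint, environment variable, system prompt, and conversation content are illustrative assumptions; the deployment name matches the Terraform configuration shown later in this post.

using System.ClientModel;
using Azure.AI.OpenAI;
using OpenAI.Chat;

// A sketch, not production code: endpoint and key handling are illustrative.
var client = new AzureOpenAIClient(
    new Uri("https://oai-myapp-dev.openai.azure.com/"),                                  // hypothetical endpoint
    new ApiKeyCredential(Environment.GetEnvironmentVariable("AZURE_OPENAI_API_KEY")!));

// The deployment name corresponds to the model deployment provisioned via Terraform below.
ChatClient chat = client.GetChatClient("gpt-35-turbo");

// The "template": a fixed system prompt plus the running conversation history.
var messages = new List<ChatMessage>
{
    new SystemChatMessage(
        "You are an order-triage service. Reply only with JSON matching " +
        "{\"priority\": string, \"reason\": string}."),                        // illustrative instructions
    new UserChatMessage("A customer reports that their order arrived damaged.") // illustrative history
};

ChatCompletion completion = chat.CompleteChat(messages);
string responseText = completion.Content[0].Text;   // the model's next response, ideally JSON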
The model interprets this history and returns a response, which, in many use cases, should be in a machine-readable format. I’ve found that asking the LLM to return its response as JSON provides the best results. This approach allows me to deserialize the output into a native object, which can then be passed to other components within the system.
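From there, turning the text into a native object is ordinary System.Text.Json work. The OrderTriageResult record below is a hypothetical shape matching the prompt in the previous sketch.

using System.Text.Json;

// responseText is the raw model output from the previous snippet.
var jsonOptions = new JsonSerializerOptions { PropertyNameCaseInsensitive = true };

OrderTriageResult result =
    JsonSerializer.Deserialize<OrderTriageResult>(responseText, jsonOptions)
    ?? throw new InvalidOperationException("Model returned empty JSON.");

// Hypothetical shape the system prompt asks the model to produce.
public record OrderTriageResult(string Priority, string Reason);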
For example, the deserialized object might become part of an HTTP response returned to a client, an event published to a messaging system like Azure Service Bus or Kafka, or simply an input to another class method in a C# application. Treating the LLM’s output as part of a well-defined data flow keeps the AI component predictable and easier to integrate.
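As a small illustration of that flow, publishing the deserialized object to Service Bus is only a few lines with the Azure.Messaging.ServiceBus SDK; the queue name and connection-string setting below are assumptions.

using Azure.Messaging.ServiceBus;

// result is the OrderTriageResult deserialized in the previous snippet.
await using var busClient = new ServiceBusClient(
    Environment.GetEnvironmentVariable("SERVICE_BUS_CONNECTION_STRING"));   // hypothetical setting
ServiceBusSender sender = busClient.CreateSender("order-triage");           // hypothetical queue name

// The LLM's output now travels through the system like any other well-defined message.
await sender.SendMessageAsync(new ServiceBusMessage(BinaryData.FromObjectAsJson(result)));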
The Challenge of Reliable JSON Responses
Of course, asking an LLM to produce structured output isn’t a deterministic process. It’s a balancing act between crafting the right prompt and selecting the right model. Despite improvements in LLM reliability, the AI doesn’t always return perfect JSON, and even minor inconsistencies can break the downstream systems that rely on parsing that output.
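In practice I guard the parsing step rather than trusting the model. A minimal sketch of that defensive handling follows; the retry budget and the GetModelResponseAsync helper (which wraps the chat call shown earlier) are assumptions for illustration.

using System.Text.Json;

OrderTriageResult? result = null;
const int maxAttempts = 3;   // illustrative retry budget

for (int attempt = 1; attempt <= maxAttempts && result is null; attempt++)
{
    // GetModelResponseAsync is a hypothetical helper wrapping the chat call shown earlier.
    string responseText = await GetModelResponseAsync(messages);

    try
    {
        result = JsonSerializer.Deserialize<OrderTriageResult>(
            responseText,
            new JsonSerializerOptions { PropertyNameCaseInsensitive = true });
    }
    catch (JsonException)
    {
        // Malformed JSON: ask the model again rather than letting the error
        // propagate into downstream consumers.
    }
}

if (result is null)
{
    throw new InvalidOperationException(
        $"Model did not return valid JSON after {maxAttempts} attempts.");
}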
For that reason, being able to experiment with different prompts and quickly deploy different model configurations is critical. I need to be able to spin up new models on demand and adjust their capacity based on my testing needs. This flexibility is especially important during the prototyping phase, when I’m iterating on prompt design or evaluating different output structures.
Provisioning Azure OpenAI Models for API Logic
To support this development pattern, I initially provisioned an Azure OpenAI service using Terraform:
resource "azurerm_cognitive_account" "openai" {
name = "oai-${var.application_name}-${var.environment_name}"
location = azurerm_resource_group.main.location
resource_group_name = azurerm_resource_group.main.name
kind = "OpenAI"
sku_name = "S0"
public_network_access_enabled = true
}
Once the cognitive account was provisioned, I deployed a specific model:
resource "azurerm_cognitive_deployment" "deployment_gpt_35_turbo" {
name = "gpt-35-turbo"
cognitive_account_id = azurerm_cognitive_account.openai.id
sku {
name = "Standard"
capacity = 10
}
model {
format = "OpenAI"
name = "gpt-35-turbo"
version = "0125"
}
}
This setup gives me full control over which models are available in each environment and how much throughput capacity is allocated to each. That control is invaluable when testing and optimizing system behavior.
Model Selection: Cost and Capability Trade-offs
When designing an application that uses LLMs as part of its internal logic, choosing the right model is just as important as designing the prompt. Different models produce different responses — sometimes subtly, sometimes dramatically. Their performance, reliability, and even their ability to consistently return well-formed JSON can vary.
Being able to test against multiple models helps ensure you’re choosing the right one — not just in terms of output quality, but also in cost. For instance, if GPT-3.5 Turbo meets the functional requirements of the system, there’s little reason to incur the additional cost of running GPT-4 or GPT-4o in production.
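One small pattern that keeps this comparison cheap is to avoid hard-coding the deployment name. In the sketches above, the model could be selected at runtime like this (the environment variable is an assumption), so moving between deployments is a configuration change rather than a code change.

// Select the deployment at runtime so each environment or experiment can point at a
// different Terraform-provisioned model without touching application code.
string deploymentName = Environment.GetEnvironmentVariable("OPENAI_DEPLOYMENT_NAME")   // hypothetical setting
    ?? "gpt-35-turbo";   // default matches the deployment provisioned above

ChatClient chat = client.GetChatClient(deploymentName);   // client from the earlier sketch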
This model flexibility makes the difference between a proof-of-concept that only works under ideal conditions and a scalable, maintainable service that integrates AI into a larger distributed architecture.
Conclusion
Building APIs that use LLMs as backend logic components unlocks a new pattern for system design — one where AI isn’t just a user-facing feature but a programmable part of your service architecture. While challenges exist in ensuring reliable, structured output from non-deterministic models, those can be managed with thoughtful prompt engineering, robust testing, and flexible model provisioning.
With infrastructure like Azure OpenAI and tooling like Terraform, it’s easier than ever to build, test, and deploy AI-powered APIs that act as intelligent, adaptable backend services.
