Your security team approved the Azure subscription. Your compliance team signed off on the data handling agreement. Your legal team reviewed the vendor contract. And now someone in a Tuesday afternoon standup is asking when you can just “start sending prompts” to the model. This is the moment where enterprise AI deployments either get done right or get done fast—and those are rarely the same.
Deploying Azure OpenAI Service in a corporate environment is not like standing up a SaaS tool. The data flowing through it—employee queries, customer inputs, internal documents fed into retrieval pipelines—can be sensitive, regulated, or both. Getting the architecture wrong at the start means retrofitting security controls later, which is expensive, painful, and the kind of thing that gets documented in incident post-mortems.
Here’s how to do it right from day one.
Lock Down the Network First
The default Azure OpenAI configuration exposes a public endpoint. That means anyone with a valid API key can call your model from anywhere on the internet. For a corporate deployment, that’s not acceptable.
The fix is Azure Private Link, which provisions a private endpoint inside your Virtual Network (VNet). Traffic to the model never leaves the Microsoft backbone—no public internet transit, no exposure to external scanning. Once the private endpoint is in place, explicitly disable public network access in the Azure OpenAI networking settings. Not “restrict it.” Disable it. The distinction matters when an auditor asks questions.
Pro Tip: Don’t enable Private Link and leave public access on “selected networks” as a fallback. That’s not a security control—that’s a door you left unlocked because you couldn’t find the key.
Most enterprises deploying AI at scale are already running a Hub-and-Spoke network topology. Place the Azure OpenAI resource in a Spoke VNet, with your central Hub VNet handling firewall inspection, DNS resolution, and egress filtering through Azure Firewall. VNet peering connects them. Every request to the model gets routed through the Hub for inspection before it ever reaches the resource.
If you also need specific Azure services—like an Azure AI Search instance for a RAG pipeline—to reach the OpenAI endpoint, use Resource Instance Rules. These allow named Azure resources to bypass the firewall restriction while still keeping the endpoint private. It’s tighter than IP allowlisting and survives infrastructure changes without manual updates.
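A locked-down configuration is also something you can verify automatically, for example in a CI gate that inspects the resource's exported ARM properties before release. The sketch below is illustrative: the property names (`publicNetworkAccess`, `networkAcls`, `ipRules`) follow the common Azure resource-provider shape, but verify them against the current Cognitive Services ARM schema before relying on this check.

```python
def network_config_is_locked_down(props: dict) -> list[str]:
    """Return a list of violations for an Azure OpenAI resource's exported
    ARM properties. Property names are assumptions; check them against the
    current Cognitive Services ARM schema."""
    violations = []
    # Public access must be fully Disabled, not merely restricted.
    if props.get("publicNetworkAccess") != "Disabled":
        violations.append("publicNetworkAccess is not 'Disabled'")
    acls = props.get("networkAcls", {})
    # Even with public access off, the network ACL should deny by default.
    if acls.get("defaultAction") != "Deny":
        violations.append("networkAcls.defaultAction is not 'Deny'")
    # IP allowlists defeat the point of Private Link; named resource
    # instance rules are the sanctioned exception.
    if acls.get("ipRules"):
        violations.append("ipRules present; prefer resource instance rules")
    return violations

# A compliant configuration (resource ID shortened for illustration):
good = {
    "publicNetworkAccess": "Disabled",
    "networkAcls": {
        "defaultAction": "Deny",
        "ipRules": [],
        "resourceAccessRules": [
            {"resourceId": "/subscriptions/.../searchServices/rag-search"}
        ],
    },
}
print(network_config_is_locked_down(good))  # []
```

An empty violation list means the resource matches the posture described above; anything else fails the pipeline.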
Replace API Keys with Managed Identities
API keys are convenient for getting started. They’re also easy to accidentally commit to a repository, hard to audit, and tedious to rotate. At enterprise scale, they become a security liability before the project even ships.
The correct approach is Managed Identities. Your application—whether it’s running on Azure App Service, Azure Kubernetes Service, or an Azure Function—gets a system-assigned or user-assigned identity backed by Microsoft Entra ID (formerly Azure AD). No credentials to manage, no secrets in environment variables, no one pasting API keys into Slack.
Once the identity is established, assign the Cognitive Services OpenAI User role via Role-Based Access Control (RBAC). That role grants exactly what the application needs to make inference calls—nothing more. Avoid assigning Cognitive Services Contributor or Owner to runtime identities. Sound familiar? It should. Least privilege applies here the same way it does everywhere else.
Understand What Microsoft Does (and Doesn’t) Do With Your Data
This comes up in every compliance review, and the answer is actually reassuring: Microsoft does not use customer prompts or completions to train or improve the underlying OpenAI models. That’s contractually established and documented in Azure’s data privacy documentation.
What you control is encryption. By default, data at rest is encrypted with Microsoft-managed keys. For financial services, healthcare, or other regulated industries, that’s often not sufficient. You’ll need Customer-Managed Keys (CMK) stored in Azure Key Vault—and when you configure that Key Vault, enable soft-delete and purge protection. If a key is accidentally deleted and you can’t recover it, you lose access to your data. That conversation with the CISO is not one you want to have.
Azure OpenAI also supports Infrastructure Encryption, which adds a second layer of encryption at the physical disk level. It’s an opt-in feature, but if you’re building for an industry that requires encryption-in-depth, enable it during provisioning. Retrofitting it later requires redeployment.
| Compliance Framework | Azure OpenAI Coverage |
|---|---|
| SOC 1/2/3 | Included in Azure certification scope |
| ISO 27001 | Included in Azure certification scope |
| HIPAA | Business Associate Agreement available |
| PCI DSS | Included in Azure certification scope |
| FedRAMP High | Supported in designated regions |
| GDPR / ISO 27018 | Data processing terms available |
These are inherited from the Azure platform, not specific to the OpenAI service. Verify the current scope in the Azure compliance documentation before committing to a compliance roadmap—certifications can vary by region and service tier.
Configure Content Filtering Before Anyone Sends a Prompt
Azure OpenAI includes Azure AI Content Safety as a built-in layer. By default, it screens for four categories: hate, sexual content, violence, and self-harm. Each category has configurable severity thresholds—Safe, Low, Medium, High—and you decide what gets blocked.
For a customer-facing application, “block anything above Low” is a reasonable starting point. For internal tools used by security analysts or medical professionals, you might need to adjust those thresholds so legitimate queries aren’t flagged. The point is that you configure this before deployment, not after the first incident.
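Azure enforces these rules server-side, but the threshold model is easy to misread, so here is a minimal sketch of the logic using the category and severity names from the filtering policy. The function is purely illustrative; it is not how Azure exposes the configuration.

```python
# Ordered severity levels, per Azure AI Content Safety.
SEVERITIES = ["safe", "low", "medium", "high"]

def blocked(detected: dict[str, str], thresholds: dict[str, str]) -> list[str]:
    """Return the categories whose detected severity exceeds the configured
    threshold. Illustrative only; Azure applies this logic server-side."""
    hits = []
    for category, severity in detected.items():
        allowed_up_to = thresholds.get(category, "safe")
        if SEVERITIES.index(severity) > SEVERITIES.index(allowed_up_to):
            hits.append(category)
    return hits

# Customer-facing policy: block anything above Low in every category.
policy = {c: "low" for c in ("hate", "sexual", "violence", "self_harm")}
print(blocked({"hate": "safe", "violence": "medium"}, policy))  # ['violence']
```

An internal tool for security analysts might raise the `violence` threshold to `medium` so that threat-intel queries pass, while leaving the other categories strict; the point is that the thresholds are a deliberate per-category decision, not a single global dial.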
Warning: The default content filtering policy is not “off”—but it’s also not configured for your specific use case. Review and customize the policy before going to production. “We used the defaults” is not a defensible answer when someone asks why the model responded the way it did.
You can also upload custom blocklists to prevent the model from discussing specific topics, mentioning competitors, or referencing sensitive internal information. This is particularly useful for external-facing assistants where brand and legal risk intersect.
Build the Audit Logging Architecture Around APIM
Here’s the problem with standard Azure diagnostic logging for AI services: it captures metadata—request counts, latency, error rates—but not the actual content of prompts and completions. That’s by design, for privacy reasons. For an enterprise that needs to demonstrate what the model said to a specific user on a specific date, that’s insufficient.
The solution is placing Azure API Management (APIM) in front of the Azure OpenAI endpoint. APIM has a native “Log LLM messages” capability that captures full request and response payloads and routes them to a Log Analytics Workspace. You get the conversation content, the token counts, the model parameters—everything you need for audit reconstruction.
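Once those payloads land in the workspace, audit reconstruction is a query. The sketch below is a KQL fragment under stated assumptions: the table name (`ApiManagementGatewayLlmLog`) and column names come from APIM's LLM logging feature but may differ by API version, so verify the exact schema in your own Log Analytics workspace before building reports on it.

```kusto
// Reconstruct model traffic for a given window.
// Table and column names assumed from APIM LLM logging; verify the
// actual schema in your Log Analytics workspace.
ApiManagementGatewayLlmLog
| where TimeGenerated between (datetime(2024-06-01) .. datetime(2024-06-02))
| project TimeGenerated, CorrelationId, DeploymentName,
          PromptTokens, CompletionTokens
| order by TimeGenerated asc
```

Joining on the correlation ID ties a specific completion back to the gateway request, which is the link an auditor will ask you to demonstrate.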
Enable diagnostic settings on the Azure OpenAI resource itself for operational monitoring: latency by model, request success rates, throttling events. And enable the Azure Activity Log to track administrative actions—who changed the content filtering policy, who regenerated an API key, who modified the network rules. That’s your change management audit trail.
Key Insight: The Azure Activity Log and the APIM prompt logs solve different problems. Activity Logs answer “who changed the system.” APIM logs answer “what did the system say.” You need both.
Plan for Production Load Before You Need To
If you’re building a production RAG system—document search, internal knowledge bases, customer-facing assistants—the Pay-As-You-Go (PAYG) model will eventually create problems. Token-based billing with shared capacity means latency spikes under load, and those spikes tend to happen at exactly the wrong time.
Provisioned Throughput Units (PTU) give you dedicated capacity with guaranteed throughput and consistent latency. You reserve compute per region per model, and that capacity is yours. The tradeoff is commitment—you pay for the provisioned capacity whether or not you fully use it, with discounts for longer-term reservations, rather than paying per token consumed.
For most enterprise deployments, the right answer is a hybrid model: PTUs for your baseline predictable workload, PAYG for handling spikes. Your PTU allocation should cover the steady-state load you can forecast; PAYG handles overflow without requiring you to overprovision the reserved capacity. Work with your Azure account team to size this before committing—the PTU purchase is not trivially reversible.
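Sizing the split is straightforward arithmetic once you have a load forecast. The numbers below are deliberately made up: throughput per PTU and the minimum purchase increment vary by model and region, so get the real figures from your Azure account team before committing.

```python
import math

def size_ptu_baseline(steady_tpm: int, tpm_per_ptu: int,
                      min_increment: int) -> int:
    """PTUs needed to cover the steady-state load, rounded up to the
    purchasable increment. Throughput per PTU is model-specific;
    the values used below are illustrative placeholders."""
    raw = math.ceil(steady_tpm / tpm_per_ptu)
    return math.ceil(raw / min_increment) * min_increment

# Hypothetical forecast: 120k tokens/min steady state, peaks to 200k.
steady_tpm, peak_tpm = 120_000, 200_000
ptus = size_ptu_baseline(steady_tpm, tpm_per_ptu=2_500, min_increment=15)
spillover_tpm = peak_tpm - steady_tpm  # overflow served on PAYG

print(ptus, spillover_tpm)  # 60 PTUs reserved, 80000 TPM on PAYG
```

Rounding up to the purchase increment means the reserved tier usually carries a little headroom above the forecast; everything beyond that rides on consumption billing instead of inflating the commitment.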
The Architecture Isn’t Optional
The conversation about AI security tends to happen in one of two ways: before deployment, when there’s still time to build it correctly, or after an incident, when there isn’t. Private endpoints, Managed Identities, content filtering policies, APIM-based logging, CMK encryption—these aren’t features you add when the project matures. They’re the foundation the project runs on.
The organizations that get this right are the ones that treated enterprise AI governance as an infrastructure problem from day one, not a compliance checkbox they’d get to eventually. You now have the architecture to do that. The standup meeting can wait.