Azure Storage + Document Intelligence + Function App + Cosmos DB
Costa Rica
Last updated: 2025-06-06
Important
This example is based on a public network site and is intended for demonstration purposes only. It showcases how several Azure resources can work together to achieve the desired result. Consider the Important Considerations for Production Environment section below. Please note that these demos are intended as a guide and are based on my personal experience. For official guidance, support, or more detailed information, please refer to Microsoft's official documentation or contact Microsoft directly: Microsoft Sales and Support
List of References
- Azure AI Document Intelligence documentation
- Get started with the Document Intelligence Sample Labeling tool
- Document Intelligence Sample Labeling tool
- Assign an Azure role for access to blob data
- Build and train a custom extraction model
- Compose custom models - Document Intelligence
- Deploy the Sample Labeling tool
- Train a custom model using the Sample Labeling tool
- Train models with the sample-labeling tool
- Azure Cosmos DB - Database for the AI Era
- Consistency levels in Azure Cosmos DB
- Azure Cosmos DB SQL API client library for Python
- CosmosClient class documentation
- Cosmos AAD Authentication
- Cosmos python examples
- Use control plane role-based access control with Azure Cosmos DB for NoSQL
- Use data plane role-based access control with Azure Cosmos DB for NoSQL
- Create or update Azure custom roles using Azure CLI
How to extract layout elements from PDFs stored in an Azure Storage Account, process them using Azure Document Intelligence, and store the results in Cosmos DB for further analysis.
- Upload your PDFs to an Azure Blob Storage container.
- An Azure Function is triggered by the upload, which calls the Azure Document Intelligence Layout API to analyze the document structure.
- The extracted layout data (such as tables, checkboxes, and text) is parsed and subsequently stored in a Cosmos DB database, ensuring a seamless and automated workflow from document upload to data storage.
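The tail end of that flow can be sketched as a small helper that turns extracted fields into a Cosmos DB item. This is a hypothetical sketch: the field names and the `source` value are illustrative assumptions, not the demo's exact schema.

```python
import uuid

# Hypothetical helper: flatten fields extracted by Document Intelligence into a
# JSON document for Cosmos DB. Cosmos requires an 'id' property on every item;
# the field names below are illustrative, not the demo's exact schema.
def build_invoice_item(fields: dict) -> dict:
    return {
        "id": str(uuid.uuid4()),
        "customer_name": fields.get("CustomerName"),
        "invoice_total": fields.get("InvoiceTotal"),
        "source": "pdfinvoices",  # the blob container the PDF came from
    }

item = build_invoice_item({"CustomerName": "Contoso", "InvoiceTotal": "120.50"})
```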
Note
Advantages of Document Intelligence for organizations handling large volumes of documents:
- Utilizes natural language processing, computer vision, deep learning, and machine learning.
- Handles structured, semi-structured, and unstructured documents.
- Automates the extraction and transformation of layout data into usable formats like JSON or CSV.
Private Network Configuration
For enhanced security, consider configuring your Azure resources to operate within a private network. This can be achieved using Azure Virtual Network (VNet) to isolate your resources and control inbound and outbound traffic. Implementing private endpoints for services like Azure Blob Storage and Azure Functions can further secure your data by restricting access to your VNet.
Security
Ensure that you implement appropriate security measures when deploying this solution in a production environment. This includes:
- Securing Access: Use Microsoft Entra ID (formerly Azure Active Directory, or Azure AD) for authentication and role-based access control (RBAC) to manage permissions.
- Managing Secrets: Store sensitive information such as connection strings and API keys in Azure Key Vault.
- Data Encryption: Enable encryption for data at rest and in transit to protect sensitive information.
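As a sketch of the Managing Secrets point, secrets can be resolved from Key Vault at startup instead of living in app settings. The vault and secret names here are assumptions, and the Azure SDK imports are deferred into the function so the sketch reads standalone (it requires azure-identity and azure-keyvault-secrets when actually called).

```python
# Assumed vault/secret names, for illustration only.
def vault_url(vault_name: str) -> str:
    # Key Vault endpoints follow this fixed URL pattern
    return f"https://{vault_name}.vault.azure.net"

def get_secret(vault_name: str, secret_name: str) -> str:
    # Deferred imports: only needed when a secret is actually fetched
    from azure.identity import DefaultAzureCredential
    from azure.keyvault.secrets import SecretClient
    client = SecretClient(vault_url=vault_url(vault_name),
                          credential=DefaultAzureCredential())
    return client.get_secret(secret_name).value

# e.g. cosmos_key = get_secret("contoso-kv", "COSMOS-DB-KEY")
```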
Scalability
While this example provides a basic setup, you may need to scale the resources based on your specific requirements. Azure services offer various scaling options to handle increased workloads. Consider using:
- Auto-scaling: Configure auto-scaling for Azure Functions and other services to automatically adjust based on demand.
- Load Balancing: Use Azure Load Balancer or Application Gateway to distribute traffic and ensure high availability.
Cost Management
Monitor and manage the costs associated with your Azure resources. Use Azure Cost Management and Billing to track usage and optimize resource allocation.
Compliance
Ensure that your deployment complies with relevant regulations and standards. Use Azure Policy to enforce compliance and governance policies across your resources.
Disaster Recovery
Implement a disaster recovery plan to ensure business continuity in case of failures. Use Azure Site Recovery and backup solutions to protect your data and applications.
- An Azure subscription is required. All other resources, including instructions for creating a Resource Group, are provided in this workshop. You need the Contributor role, or any custom role that allows you to manage all resources and deploy resources within the subscription.
- If you choose to use the Terraform approach, please ensure that:
  - Terraform is installed on your local machine.
  - The Azure CLI is installed, to work with both Terraform and Azure commands.
  - You follow the Terraform guide to deploy the necessary Azure resources for the workshop.
  - Since this method skips the manual creation of each resource, proceed directly with the configuration from Configure/Validate the Environment variables.
Important
Regarding networking, this example covers a public access configuration and a system-assigned managed identity. However, please ensure you review your privacy requirements and adjust network and access settings as necessary for your specific case.
Using Cosmos DB provides you with a flexible, scalable, and globally distributed database solution that can handle both structured and semi-structured data efficiently.
- Azure Blob Storage: Store the PDF invoices.
- Azure Functions: Trigger on new PDF uploads, extract data, and process it.
- Azure SQL Database or Cosmos DB: Store the extracted data for querying and analytics.
| Resource | Recommendation |
|---|---|
| Azure Blob Storage | Use for storing the PDF files. This keeps your file storage separate from your data storage, which is a common best practice. |
| Azure SQL Database | Use if your data is highly structured and you need complex queries and transactions. |
| Azure Cosmos DB | Use if you need a globally distributed database with low latency and the ability to handle semi-structured data. |
In the context of Azure Function Apps, a hosting option refers to the plan you choose to run your function app. This choice affects how your function app is scaled, the resources available to each function app instance, and the support for advanced functionalities like virtual network connectivity and container support.
| Plan | Scale to Zero | Scale Behavior | Virtual Networking | Dedicated Compute & Reserved Cold Start | Max Scale Out (Instances) | Example AI Use Cases |
|---|---|---|---|---|---|---|
| Flex Consumption | Yes | Fast event-driven | Optional | Optional (Always Ready) | 1000 | Real-time data processing for AI models, high-traffic AI-powered APIs, event-driven AI microservices. Use for applications needing to process large volumes of data in real time, such as AI models for fraud detection or real-time recommendation systems. Ideal for deploying APIs that serve AI models, such as natural language processing (NLP) or computer vision services, which require rapid scaling based on demand. |
| Consumption | Yes | Event-driven | Optional | No | 200 | Lightweight AI APIs, scheduled AI tasks, low-traffic AI event processing. Suitable for deploying lightweight AI services, such as sentiment analysis or simple image recognition, which do not require extensive resources. Perfect for running periodic AI tasks, like batch processing of data for machine learning model training or scheduled data analysis. |
| Functions Premium | No | Event-driven with premium options | Yes | Yes | 100 | Enterprise AI applications, AI services requiring VNet integration, low-latency AI APIs. Use for mission-critical AI applications that require high availability, low latency, and integration with virtual networks, such as AI-driven customer support systems or advanced analytics platforms. Ideal for AI services that need to securely connect to on-premises resources or other Azure services within a virtual network. |
| App Service | No | Dedicated VMs | Yes | Yes | Varies | AI-powered web applications with integrated functions, AI applications needing dedicated resources. Great for web applications that incorporate AI functionalities, such as personalized content delivery, chatbots, or interactive AI features. Suitable for AI applications that require dedicated compute resources for consistent performance, such as intensive data processing or complex AI model inference. |
| Container Apps Env. | No | Containerized microservices environment | Yes | Yes | Varies | AI microservices architecture, containerized AI workloads, complex AI event-driven workflows. Perfect for building a microservices architecture where each service can be independently scaled and managed, such as a suite of AI services for different tasks (e.g., image processing, text analysis). Ideal for deploying containerized AI workloads that need to run in a managed environment, such as machine learning model training and deployment pipelines. Suitable for orchestrating complex workflows involving multiple AI services and event-driven processes, such as automated data pipelines and real-time analytics. |
Note
This example uses a system-assigned managed identity to assign RBAC (role-based access control) roles.
- Under Settings, go to Environment variables, and + Add the following variables:
  - COSMOS_DB_ENDPOINT: Your Cosmos DB account endpoint 🡢 Review the existence of this; if not, create it.
  - COSMOS_DB_KEY: Your Cosmos DB account key 🡢 Review the existence of this; if not, create it.
  - COSMOS_DB_CONNECTION_STRING: Your Cosmos DB connection string 🡢 Review the existence of this; if not, create it.
  - invoicecontosostorage_STORAGE: Your Storage Account connection string 🡢 Review the existence of this; if not, create it.
  - FORM_RECOGNIZER_ENDPOINT: For example: https://<your-form-recognizer-endpoint>.cognitiveservices.azure.com/ 🡢 Review the existence of this; if not, create it.
  - FORM_RECOGNIZER_KEY: Your Document Intelligence (Form Recognizer) key.
  - FUNCTIONS_EXTENSION_VERSION: ~4 🡢 Review the existence of this; if not, create it.
  - WEBSITE_RUN_FROM_PACKAGE: 1 🡢 Review the existence of this; if not, create it.
  - FUNCTIONS_WORKER_RUNTIME: python 🡢 Review the existence of this; if not, create it.
  - FUNCTIONS_NODE_BLOCK_ON_ENTRY_POINT_ERROR: true (this setting ensures that all entry point errors are visible in your Application Insights logs) 🡢 Review the existence of this; if not, create it.
- Click on Apply to save your configuration.
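Inside the function, these settings surface as environment variables. A minimal sketch of reading them (in Azure they come from the Function App's Environment variables; locally, from local.settings.json):

```python
import os

# Read the app settings configured above from the process environment.
def load_settings() -> dict:
    return {
        "cosmos_endpoint": os.environ.get("COSMOS_DB_ENDPOINT"),
        "cosmos_key": os.environ.get("COSMOS_DB_KEY"),
        "form_recognizer_endpoint": os.environ.get("FORM_RECOGNIZER_ENDPOINT"),
        "form_recognizer_key": os.environ.get("FORM_RECOGNIZER_KEY"),
    }
```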
- Install VS Code.
- Install Python from the Microsoft Store.
- Open VS Code, and install some extensions: Python and Azure Tools.
- Click on the Azure icon, and sign in to your account. Allow the Azure Resources extension to sign in using Microsoft; it will open a browser window. After doing so, you will be able to see your subscription and resources.
- Under Workspace, click on Create Function Project, and choose a path on your local computer to develop your function.
- Choose the language; in this case it is python.
- Select the model version; for this example let's use v2.
- For the Python interpreter, let's use the one installed via the Microsoft Store.
- Choose a template (e.g., Blob trigger) and configure it to trigger on new PDF uploads in your Blob container.
- Provide a function name, like BlobTriggerContosoPDFInvoicesDocIntelligence.
- Next, it will prompt you for the path of the blob container where you expect the function to be triggered after a file is uploaded. In this case it is pdfinvoices, as was previously created.
- Click on Create new local app settings, and then choose your subscription.
- Choose Azure Storage Account for remote storage, and select one. I'll be using the invoicecontosostorage.
- Then click on Open in the current window. You will see something like this:
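With the project scaffolded, the trigger declaration in function_app.py (v2 decorator model) looks roughly like this. The path and connection values match the pdfinvoices container and the invoicecontosostorage_STORAGE app setting from earlier; the rest is a sketch, with the import guarded so the snippet reads outside the function project's environment.

```python
TRIGGER_PATH = "pdfinvoices/{name}"  # container created earlier + blob name token

try:
    import logging
    import azure.functions as func

    app = func.FunctionApp()

    @app.blob_trigger(arg_name="myblob",
                      path=TRIGGER_PATH,
                      connection="invoicecontosostorage_STORAGE")
    def BlobTriggerContosoPDFInvoicesDocIntelligence(myblob: func.InputStream):
        # Fires for every new blob in the pdfinvoices container
        logging.info("Processing blob: %s", myblob.name)
except ImportError:
    # azure-functions is only installed in the function project's virtual env
    pass
```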
- Now we need to update the function code to extract data from PDFs and store it in Cosmos DB. Use this as an example:
- PDF Upload: A PDF is uploaded to the Azure Blob Storage container.
- Trigger Azure Function: The upload triggers the Azure Function BlobTriggerContosoPDFInvoicesDocIntelligence.
- Initialize Clients: Sets up connections to Document Intelligence and Cosmos DB.
  - The function initializes the DocumentAnalysisClient to interact with Azure Document Intelligence.
  - It also initializes the CosmosClient to interact with Cosmos DB.
- Read PDF from Blob Storage: The function reads the PDF content from Blob Storage into a byte stream.
- Analyze PDF: Uses Document Intelligence to extract data.
  - The function calls the begin_analyze_document method of the DocumentAnalysisClient using the prebuilt invoice model to analyze the PDF.
  - It waits for the analysis to complete and retrieves the results.
- Extract Data: Structures the extracted data.
  - The function extracts relevant fields from the analysis result, such as customer name, email, address, company name, phone, and rental details.
  - It structures this extracted data into a dictionary (invoice_data).
- Save Data to Cosmos DB: Inserts the data into Cosmos DB.
  - The function calls save_invoice_data_to_cosmos to save the structured data into Cosmos DB.
  - It ensures the database and container exist, then inserts the extracted data.
- Logging (process and errors): Throughout the process, the function logs each step and any errors encountered for debugging and monitoring purposes.
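A condensed sketch of those steps (the linked demo code is the authoritative version; the ContosoDB/Invoices names are assumptions, and the SDK imports are deferred so the flow reads without the packages installed):

```python
def analyze_invoice(endpoint: str, key: str, pdf_bytes: bytes):
    # Analyze PDF: prebuilt invoice model via Document Intelligence
    from azure.ai.formrecognizer import DocumentAnalysisClient
    from azure.core.credentials import AzureKeyCredential
    client = DocumentAnalysisClient(endpoint, AzureKeyCredential(key))
    poller = client.begin_analyze_document("prebuilt-invoice", document=pdf_bytes)
    return poller.result()  # blocks until analysis completes

def extract_fields(document) -> dict:
    # Extract Data: an analyzed document's .fields maps field names to
    # objects carrying a .value attribute
    def value(name):
        field = document.fields.get(name)
        return field.value if field else None
    return {"customer_name": value("CustomerName"),
            "invoice_total": value("InvoiceTotal")}

def save_invoice_data_to_cosmos(endpoint: str, key: str, item: dict):
    # Save Data to Cosmos DB: ensure database/container exist, then insert
    from azure.cosmos import CosmosClient, PartitionKey
    db = CosmosClient(endpoint, key).create_database_if_not_exists("ContosoDB")
    container = db.create_container_if_not_exists(
        id="Invoices", partition_key=PartitionKey(path="/id"))
    container.upsert_item(item)
```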
- Update the function_app.py; for example, see the code used in this demo: Template Blob Trigger Function Code.
- Now, let's update the requirements.txt; see the code used in this demo:

  ```
  azure-functions
  azure-ai-formrecognizer
  azure-core
  azure-cosmos==4.3.0
  azure-identity==1.7.0
  ```

- Since this function has already been tested, you can deploy your code to the function app in your subscription. If you want to test first, you can run your function locally.
Important
If you need further assistance with the code, please click here to view all the function code.
Note
Please ensure that all specified roles are assigned to the Function App. The provided example used a system-assigned managed identity for the Function App to facilitate the role assignment.
Important
Please ensure that the user/system admin responsible for uploading the PDFs to the blob container has the necessary permissions. The error below illustrates what might occur if these roles are missing.
In that case, go to Access Control (IAM), click on + Add, and select Add role assignment.
Search for Storage Blob Data Contributor, and click Next.
Then, click on Select members, search for your user/system admin, and finally click Review + assign.
Upload sample PDF invoices to the Blob container and verify that data is correctly ingested and stored in Cosmos DB.
- Click on Upload, then select Browse for files and choose your PDF invoices to be stored in the blob container, which will trigger the function app to parse them.
- Check the logs and traces from your function with Application Insights:
- Under Investigate, click on Performance. Filter by time range, and drill into the samples. Sort the results by date (if you have many, like in my case) and click on the last one.
- Click on View all:
- Check all the logs and traces generated. Also review the parsed information:
- Validate that the information was uploaded to Cosmos DB. Under Data Explorer, check your Database.
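Ingestion can also be validated programmatically. A sketch that queries the most recent items (ContosoDB/Invoices and the customer_name field are assumptions matching the earlier sketches, and the SDK import is deferred):

```python
# Cosmos SQL: newest items first, using the system _ts (epoch seconds) property
QUERY = "SELECT TOP 5 c.id, c.customer_name FROM c ORDER BY c._ts DESC"

def latest_invoices(endpoint: str, key: str):
    from azure.cosmos import CosmosClient  # deferred: needs azure-cosmos
    container = (CosmosClient(endpoint, key)
                 .get_database_client("ContosoDB")
                 .get_container_client("Invoices"))
    return list(container.query_items(query=QUERY,
                                      enable_cross_partition_query=True))
```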