Skip to content

Programmatically Downloading and Storing Azure Billing Data:

Introduction

Automating the retrieval and storage of Azure billing data is essential for organizations seeking cost transparency and operational efficiency. This guide details a robust, production-grade approach to programmatically obtaining Azure billing data using Python, authenticating securely with certificates, and efficiently storing the results in a SQL Server database.


1. Secure Authentication: Acquiring an Azure Access Token with Certificates

The first step is to authenticate with Azure Active Directory (Azure AD) using certificate-based authentication. This is more secure than using client secrets and is recommended for automation and service-to-service scenarios. For a deep dive into certificate-based authentication setup, see the dedicated article: Certificate Based Authorization for Azure AD.

Python Example:

from msal import ConfidentialClientApplication

def get_access_token(client_id, authority, tenant_id, resource_scopes, cert_thumbprint, cert_key_path):
    """
    Acquire an Azure AD access token using certificate-based authentication.
    """
    with open(cert_key_path, "r") as key_file:
        private_key = key_file.read()
    app = ConfidentialClientApplication(
        client_id=client_id,
        authority=f"{authority}{tenant_id}",
        client_credential={
            "thumbprint": cert_thumbprint,
            "private_key": private_key,
        },
    )
    result = app.acquire_token_for_client(scopes=resource_scopes)
    if "access_token" not in result:
        raise Exception(f"Token acquisition failed: {result}")
    return result["access_token"]
  • Why certificates? They are more secure, support longer lifecycles, and are recommended for automation.
  • MSAL Library: The Microsoft Authentication Library (MSAL) is used for token acquisition, providing flexibility and support for advanced scenarios.

2. Generating the Azure Cost Report via REST API

Once authenticated, you can use the Azure Cost Management API to request a cost details report for your subscription. This involves making a POST request to the appropriate endpoint and polling until the report is ready.

Python Example:

import requests
import time
import json

def generate_azure_cost_report(subscription_id, access_token, start_date, end_date, api_version="2022-05-01"):
    url = f"https://management.azure.com/subscriptions/{subscription_id}/providers/Microsoft.CostManagement/generateCostDetailsReport?api-version={api_version}"
    payload = json.dumps({"metric": "ActualCost", "timePeriod": {"start": start_date, "end": end_date}})
    headers = {'Authorization': f'Bearer {access_token}', 'Content-Type': 'application/json'}
    response = requests.post(url, headers=headers, data=payload)
    # Poll until the report is ready
    while response.status_code == 202:
        location_url = response.headers.get('Location')
        retry_after = int(response.headers.get('Retry-After', 30))
        time.sleep(retry_after)
        response = requests.get(url=location_url, headers=headers)
    if response.status_code != 200:
        raise Exception(f"Failed to generate cost report: {response.status_code} - {response.text}")
    return response.json()
  • Polling: The API may return a 202 status, indicating the report is being generated. Poll the Location header until a 200 response is received.
  • Error Handling: Always check for non-200 responses and handle errors appropriately.

3. Downloading the Cost Report Data

The response from the cost report API includes a manifest with one or more blob URLs. Download these blobs to obtain the actual cost data, typically in CSV format.

Python Example:

import urllib3

def download_cost_report_blobs(manifest, output_path):
    http = urllib3.PoolManager()
    for blob in manifest['blobs']:
        blob_url = blob['blobLink']
        with open(output_path, 'wb') as out_file:
            blob_response = http.request('GET', blob_url, preload_content=False)
            out_file.write(blob_response.data)
  • Blob Download: Use a robust HTTP client (e.g., urllib3) to download the report data.
  • Output: Save the CSV file to a secure, accessible location for further processing.

4. Loading the Cost Data into SQL Server Efficiently

After downloading the cost report, the next step is to load the data into a SQL Server table. For large datasets, use a fast, batch insert method to optimize performance.

Python Example:

import pyodbc
import csv

def load_csv_to_sql_server(csv_path, connection_string, table_name):
    conn = pyodbc.connect(connection_string)
    cursor = conn.cursor()
    with open(csv_path, 'r', encoding='utf-8-sig') as csvfile:
        reader = csv.reader(csvfile)
        columns = next(reader)  # Header row
        insert_query = f"INSERT INTO {table_name} ({', '.join(columns)}) VALUES ({', '.join(['?' for _ in columns])})"
        data = list(reader)
        cursor.fast_executemany = True
        cursor.executemany(insert_query, data)
        conn.commit()
    cursor.close()
    conn.close()
  • Fast Insert: The fast_executemany flag in pyodbc enables high-performance bulk inserts.
  • Schema Alignment: Ensure the CSV columns match the SQL table schema.

5. Orchestrating the End-to-End Process

A typical workflow to automate Azure billing data retrieval and storage:

def fetch_and_update_azure_billing_data():
    # Step 1: Get access token
    access_token = get_access_token(
        client_id=..., authority=..., tenant_id=..., resource_scopes=..., cert_thumbprint=..., cert_key_path=...
    )
    # Step 2: Generate cost report
    report = generate_azure_cost_report(
        subscription_id=..., access_token=access_token, start_date=..., end_date=...
    )
    # Step 3: Download report blob(s)
    download_cost_report_blobs(report['manifest'], output_path="azure_billing.csv")
    # Step 4: Load into SQL Server
    load_csv_to_sql_server(
        csv_path="azure_billing.csv", connection_string=..., table_name="AzureBilling"
    )

6. Additional Considerations

  • Permissions: The Azure AD application must have the required API permissions (e.g., Cost Management Reader) and access to the subscription.
  • Certificate Security: Store private keys securely and never commit them to source control.
  • Error Handling: Implement robust error handling and logging for production use.
  • Scheduling: Use a scheduler (e.g., cron, Azure Automation) to run the process regularly.

Conclusion

By following this approach, you can securely and efficiently automate the retrieval and storage of Azure billing data using Python. This enables advanced reporting, cost analysis, and integration with enterprise data platforms.


References