
Automating Disaster Recovery with Persistent Volume Claim (PVC) Sync in Azure Kubernetes Service

Umang Patel
December 12, 2024

Introduction

Disaster Recovery (DR) is an essential strategy in modern Kubernetes-based architectures to ensure business continuity and safeguard data integrity. This implementation demonstrates how to synchronize 100 Persistent Volume Claims (PVCs), distributed across namespaces in an Azure Kubernetes Service (AKS) cluster, from the primary region (UK South) to the secondary region (UK West). Leveraging Azure DevOps (ADO) pipelines, Azure Storage, and automation scripts, we achieve a scalable, secure, and efficient DR solution.

This guide explores the architecture, implementation, challenges encountered, and the strategies used to overcome them using standard DevOps and cloud-native practices.



Challenges and Solutions

1. Scaling DR for Multiple Namespaces

Challenge: Managing 100 clients, each deployed in its own namespace, with unique PVCs and data requirements.

Solution:

  • Implemented a mapping file (pvc_mappings.yaml) to create a one-to-one relationship between source and target PVCs.
  • Ensured flexibility to add new namespaces and PVCs dynamically by updating the mappings.

2. Automating Regular Data Syncs

Challenge: Ensuring consistent replication of PVC data to the secondary region with minimal latency and overhead.

Solution:

  • Designed a cron-based Azure DevOps pipeline that triggers the sync process every 15 minutes, minimizing Recovery Point Objective (RPO).
  • Used Azure Blob Storage with Read-Access Geo-Zone-Redundant Storage (RA-GZRS) for backend durability and performance.

3. Ensuring Secure Data Transfers

Challenge: Protecting sensitive data during replication across regions.

Solution:

  • Retrieved Shared Access Signature (SAS) tokens securely from Azure Key Vault within the pipeline.
  • Scoped SAS tokens with least privilege, ensuring secure access for specific operations (a sketch follows below).
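For illustration, a least-privilege account SAS scoped to the file service can be generated and stored in the Key Vault secret the pipeline reads (the storage account name and expiry below are placeholders):

bash
# Generate an account SAS limited to the file service, with only the permissions the sync needs
sas=$(az storage account generate-sas \
  --account-name <primary-storage-account> \
  --services f \
  --resource-types sco \
  --permissions rwdlc \
  --expiry 2025-01-31T00:00Z \
  --https-only \
  --output tsv)
# Store it in Key Vault so the pipeline can fetch it at run time
az keyvault secret set --vault-name kv-Livcast-prod --name pvc-sas-token --value "$sas"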

4. Reliable and Scalable Sync Mechanism

Challenge: Avoiding data discrepancies, incomplete transfers, or failures during sync.

Solution:

  • Utilized AzCopy, a high-performance data transfer tool, to synchronize PVC data efficiently.
  • Incorporated robust error handling and retries in the sync script (a minimal retry pattern is sketched below).
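For example, a minimal retry wrapper around AzCopy (the three-attempt limit and 30-second pause are arbitrary) could look like this:

bash
# Retry an azcopy sync up to 3 times before giving up
sync_with_retry() {
  local src="$1" dst="$2" attempt=1
  until azcopy sync "$src" "$dst" --recursive; do
    if [ "$attempt" -ge 3 ]; then
      echo "ERROR: Sync failed after $attempt attempts: $src -> $dst"
      return 1
    fi
    echo "Sync failed (attempt $attempt), retrying in 30 seconds..."
    attempt=$((attempt + 1))
    sleep 30
  done
}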

Architecture and Workflow

1. High-Level Architecture

The DR setup involves:

  • Primary Region (UK South): Hosts the production AKS cluster with PVCs backed by Azure Storage.
  • Secondary Region (UK West): Hosts the DR AKS cluster with PVCs mapped to the secondary storage account.
  • Azure DevOps Pipeline: Automates data sync between primary and secondary PVCs.

2. Workflow

  1. Pipeline Triggers: The pipeline runs on:
    • Changes pushed to the main branch.
    • A cron schedule (every 15 minutes).
  2. SAS Token Retrieval: The pipeline fetches a SAS token from Azure Key Vault.
  3. PVC Mapping: Reads the PVC mapping file (pvc_mappings.yaml) for source and target PVCs.
  4. Data Sync: Executes azcopy sync commands to replicate data between storage accounts.
  5. Monitoring: Logs sync status and retries failed operations if necessary.

Implementation Details

1. Pipeline Configuration

The Azure DevOps pipeline (azure-pipelines.yml) automates the sync process with the following key sections (the complete pipeline is listed at the end of this post):

  • Triggers and Scheduling: Defines branch-based triggers and a 15-minute sync interval using cron (first snippet below).

  • Secure SAS Token Retrieval: Uses the Azure Key Vault task to fetch the SAS token securely (second snippet below).

  • Sync Execution: Executes the sync-pvcs.sh script, passing the SAS token as an argument (third snippet below).
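Triggers and scheduling:

yaml
trigger:
  branches:
    include:
      - main
schedules:
  - cron: "*/15 * * * *" # Run every 15 minutes
    displayName: "15-Min PVC Sync"
    branches:
      include:
        - main
    always: true

Secure SAS token retrieval (the service connection, vault, and secret names here are generic placeholders; the full pipeline at the end of the post uses the real ones):

yaml
- task: AzureKeyVault@2
  inputs:
    azureSubscription: 'AzureServiceConnection'
    KeyVaultName: 'MyKeyVault'
    SecretsFilter: 'SASToken'
    RunAsPreJob: true

Sync execution:

yaml
- script: |
    chmod +x sync-pvcs.sh
    ./sync-pvcs.sh $(SASToken)
  displayName: "Run PVC Sync Script"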

2. PVC Mapping File

The pvc_mappings.yaml serves as a configuration layer to map source PVCs (primary region) to target PVCs (secondary region).

Example YAML Structure:
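yaml
pvcMappings:
  pvc-11a239c8-3a14-434b-b9bb-e9f6945f01bf: pvc-517fae97-3c99-42a4-9c08-85d886e0bd5d
  # Add the rest of your 100 PVC mappings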

Scalability: New mappings can be added without modifying the script logic.

3. Sync Script

The sync-pvcs.sh script orchestrates the data sync process.

Script Logic (simplified outline; the complete script, which parses the YAML with yq and adds fuller logging, is listed at the end of this post):
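bash
#!/bin/bash
# SAS token is passed in by the pipeline
sas_token=$1
mappings_file="pvc_mappings.yaml"
echo "Reading PVC mappings from $mappings_file..."
if [[ ! -f "$mappings_file" ]]; then
  echo "ERROR: PVC mappings file not found!"
  exit 1
fi
# Each mapping line has the form "<source-pvc>: <target-pvc>"
grep -v "^#" "$mappings_file" | grep "pvc-" | while read -r mapping; do
  source_pvc=$(echo "$mapping" | cut -d':' -f1 | xargs)
  target_pvc=$(echo "$mapping" | cut -d':' -f2 | xargs)
  echo "Syncing $source_pvc to $target_pvc..."
  azcopy sync \
    "https://primary.blob.core.windows.net/$source_pvc?$sas_token" \
    "https://secondary.blob.core.windows.net/$target_pvc?$sas_token" \
    --recursive
  if [ $? -ne 0 ]; then
    echo "ERROR: Sync failed for $source_pvc to $target_pvc"
  fi
done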

Validation and Monitoring

1. Testing the DR Setup

  • Deploy test PVCs with sample data in the primary cluster (a minimal test manifest is shown below).
  • Manually trigger the pipeline and validate data replication in the secondary region.
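A minimal test manifest (namespace, names, and storage class are illustrative; adjust them to your Azure Files setup):

yaml
# Test PVC backed by Azure Files, plus a pod that writes a timestamp into it
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: dr-test-pvc
  namespace: dr-test
spec:
  accessModes: ["ReadWriteMany"]
  storageClassName: azurefile-csi
  resources:
    requests:
      storage: 1Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: dr-test-writer
  namespace: dr-test
spec:
  containers:
    - name: writer
      image: busybox
      command: ["sh", "-c", "date > /data/dr-test.txt && sleep 3600"]
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: dr-test-pvc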

2. Monitoring Tools

  • Pipeline Logs: Track sync job status and resolve failures.
  • Azure Monitor: Set up alerts for sync anomalies or storage health issues.

3. Data Consistency Checks

  • Use checksum or hash comparisons to validate data integrity between source and target PVCs, for example:
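As a sketch, assuming the source and target shares are mounted (for example from a debug pod in each cluster) at hypothetical paths /mnt/source-pvc and /mnt/target-pvc:

bash
# Compare per-file MD5 checksums between the two mounted PVC paths
diff \
  <(cd /mnt/source-pvc && find . -type f -exec md5sum {} + | sort) \
  <(cd /mnt/target-pvc && find . -type f -exec md5sum {} + | sort) \
  && echo "PVCs are in sync" || echo "PVCs differ"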

Technical Highlights

Infrastructure as Code (IaC)

All configurations (pipeline, PVC mappings) are defined declaratively.

Cloud-Native Tools

  • Azure Storage RA-GZRS: Robust geo- and zone-redundancy.
  • AzCopy: High-performance blob data transfers.

Security Best Practices

  • SAS token scoped with minimal permissions.
  • Tokens fetched securely using Azure Key Vault.

Conclusion

This solution demonstrates a scalable and robust DR mechanism for Kubernetes workloads using AKS and Azure DevOps. With automated PVC syncing, secure data handling, and regular monitoring, it achieves minimal downtime and data loss in the event of a regional outage. This architecture can be further extended for larger-scale Kubernetes deployments or integrated with other cloud services for enhanced DR capabilities.

For reference, the complete Azure DevOps pipeline (azure-pipelines.yml) used for the sync:

yaml
# Trigger pipeline on push to 'main' branch or on schedule
trigger:
  branches:
    include:
      - main
# Schedule the pipeline to run every 15 minutes
schedules:
  - cron: "*/15 * * * *" # Sync every 15 minutes
    displayName: "15-Min PVC Sync"
    branches:
      include:
        - main
    always: true
pool:
  name: Livcast-prod # Self-hosted agent pool; adjust to your environment
steps:
  # Step 1: Fetch the SAS token from Azure Key Vault
  - task: AzureKeyVault@1
    displayName: 'Fetch the token'
    inputs:
      azureSubscription: 'ADO-TF-SP-Livcast-PROD'
      KeyVaultName: 'kv-Livcast-prod'
      SecretsFilter: 'pvc-sas-token'
      RunAsPreJob: true
  # Step 2: Check if AzCopy is installed, else install it
  - script: |
      if ! command -v azcopy &> /dev/null
      then
        echo "AzCopy not found. Installing AzCopy..."
        wget -O azcopy.tar https://aka.ms/downloadazcopy-v10-linux
        tar -xf azcopy.tar --strip-components=1
        sudo mv ./azcopy /usr/bin/azcopy
      else
        echo "AzCopy already installed."
      fi
    displayName: 'Ensure AzCopy is Installed'
  # Step 3: Check if yq is installed, else install it
  - script: |
      if ! command -v yq &> /dev/null
      then
        echo "yq not found. Installing yq..."
        sudo wget https://github.com/mikefarah/yq/releases/download/v4.9.8/yq_linux_amd64 -O /usr/bin/yq
        sudo chmod +x /usr/bin/yq
      else
        echo "yq already installed."
      fi
    displayName: 'Ensure yq is Installed'
  # Step 4: Convert line endings of sync-pvcs.sh
  - script: |
      echo "Converting line endings of sync-pvcs.sh..."
      sed -i 's/\r$//' sync-pvcs.sh
    displayName: 'Convert Line Endings to Unix'
  # Step 5: Run the sync script with the SAS token fetched from Key Vault
  - script: |
      echo "Fetching SAS token from Key Vault..."
      sas_token=$PVC_SAS_TOKEN # The secret exposed by the Key Vault task above
      echo "Using SAS Token: $sas_token" # Azure DevOps masks secret values in the logs
      echo "Starting PVC sync process for 100 clients..."
      set -e # Exit script on error
      # Run the sync script, passing the SAS token as an argument
      bash ./sync-pvcs.sh "$sas_token"
      # Check the exit status and log the result
      if [ $? -eq 0 ]; then
        echo "PVC sync process completed successfully."
      else
        echo "ERROR: PVC sync process failed!" >&2
        exit 1
      fi
    displayName: 'Sync PVC Data with SAS from Key Vault'
    env:
      PVC_SAS_TOKEN: $(pvc-sas-token) # Pass the Key Vault secret as an environment variable
    continueOnError: false # Pipeline should fail on error

And the complete sync-pvcs.sh script:

bash
#!/bin/bash
# Accept the SAS token as an argument (expected to include the leading '?')
sas_token=$1
# Load the PVC mappings from the YAML file
mappings_file="pvc_mappings.yaml"
echo "Reading PVC mappings from $mappings_file..."
# Check if the mappings file exists
if [[ ! -f "$mappings_file" ]]; then
  echo "ERROR: PVC mappings file not found!"
  exit 1
fi
# Read the PVC mappings from the YAML file under the 'pvcMappings' key
declare -A pvc_mappings
while IFS= read -r line; do
  if [[ "$line" =~ ^[[:space:]]*pvc- ]]; then
    primary_pvc=$(echo $line | cut -d ":" -f1 | xargs)   # Get the primary PVC ID
    secondary_pvc=$(echo $line | cut -d ":" -f2 | xargs) # Get the secondary PVC ID
    pvc_mappings[$primary_pvc]=$secondary_pvc
  fi
done < <(yq eval '.pvcMappings' "$mappings_file")
echo "Loaded PVC mappings:"
for primary_pvc in "${!pvc_mappings[@]}"; do
  echo "$primary_pvc -> ${pvc_mappings[$primary_pvc]}"
done
# Debugging: Check if mappings are loaded
if [ ${#pvc_mappings[@]} -eq 0 ]; then
  echo "No PVC mappings loaded! Please check the format of pvc_mappings.yaml."
  exit 1
fi
# Loop through all primary PVCs and sync to their corresponding secondary PVCs
for primary_pvc in "${!pvc_mappings[@]}"; do
  secondary_pvc=${pvc_mappings[$primary_pvc]}
  echo "Syncing data from primary PVC ($primary_pvc) to secondary PVC ($secondary_pvc)..."
  # Copy data from the primary file share to the corresponding secondary file share
  azcopy copy "https://primary.file.core.windows.net/$primary_pvc$sas_token" \
    "https://secondary.file.core.windows.net/$secondary_pvc$sas_token" \
    --recursive \
    --overwrite=true \
    --force-if-read-only=true \
    --log-level INFO \
    --output-type text
  if [ $? -eq 0 ]; then
    echo "Sync successful for $primary_pvc to $secondary_pvc"
  else
    echo "ERROR: Sync failed for $primary_pvc to $secondary_pvc"
    exit 1
  fi
done
echo "PVC sync process completed."



Contact Us

Thank you for reading our comprehensive guide on "Automating Disaster Recovery with Persistent Volume Claim (PVC) Sync in Azure Kubernetes Service". We hope you found it insightful and valuable. If you have any questions, need further assistance, or are looking for expert support in setting up and managing your Azure infrastructure, our team is here to help!

Reach out to us for Your Azure Infrastructure Needs:

🌐 Website: https://blog.prometheanz.com

📧 Email: [email protected]

Happy automating!


Copyright © 2024 PrometheanTech. All Rights Reserved.