Introducing Azure Chaos Studio

Date published: September 24, 2023

What is Chaos Engineering

Chaos Engineering is the practice of intentionally introducing faults that cause system components to fail, in order to improve resilience and availability. Alongside DevOps and SRE practices, Chaos Engineering helps achieve consistent reliability by hardening services against failure.

Improves system resilience to failures and outages.
Reduces downtime.
Answers the "what if?" questions.
Uses fault injection.
"Bombs" production systems to make them more reliable.
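To make "fault injection" concrete, here is a minimal sketch in plain Python, unrelated to any Azure API. All names (`InjectedFault`, `inject_faults`, `call_with_retry`, `healthy_dependency`) are hypothetical and exist only for illustration: a wrapper deliberately fails the first few calls to a dependency, and a retry loop is the resilience mechanism being tested.

```python
from typing import Callable


class InjectedFault(Exception):
    """Raised in place of a real dependency failure."""


def inject_faults(call: Callable[[], str], failures: int) -> Callable[[], str]:
    """Wrap `call` so that its first `failures` invocations raise InjectedFault."""
    remaining = failures

    def wrapped() -> str:
        nonlocal remaining
        if remaining > 0:
            remaining -= 1
            raise InjectedFault("simulated dependency outage")
        return call()

    return wrapped


def call_with_retry(call: Callable[[], str], attempts: int) -> str:
    """Retry `call` up to `attempts` times; re-raise the fault if every attempt fails."""
    for attempt in range(1, attempts + 1):
        try:
            return call()
        except InjectedFault:
            if attempt == attempts:
                raise
    raise ValueError("attempts must be at least 1")


def healthy_dependency() -> str:
    """Stand-in for a real downstream service call."""
    return "ok"
```

With two injected failures and three attempts the call still succeeds. With three injected failures it does not, which is exactly the kind of weakness an experiment is meant to surface before a real outage does.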

When to apply Chaos Engineering

Development Stage - Identify potential problems before going into production.

Production Stage - Test resilience and identify problems not found in staging.

Failure Analysis - Reproduce and diagnose problems that have already occurred.

According to the Principles of Chaos Engineering (principlesofchaos.org), experiments should ideally be run against a production environment.

Chaos Engineering Tools

Chaos Toolkit - Azure, AWS and Kubernetes; simulates various types of failures and faults, including in your Kubernetes clusters
Gremlin - Cross-platform
Chaos Blade - App, infrastructure and networking
Chaos Mesh and Chaos Kube - Kubernetes
Kubeinvader - Production testing
Chaos Monkey - Terminates instances in production. Netflix later built Chaos Kong, which takes out an entire region.

Introducing Azure Chaos Studio

Currently in Public Preview, Azure Chaos Studio is a tool designed for testing and improving the resilience of cloud workloads. It is a fully managed service that helps you identify and fix issues before they impact your customers. It integrates with Azure DevOps and with infrastructure as code tools such as Bicep and Terraform. Azure Chaos Studio can be used to test your application in a production environment without impacting your customers, and it pairs well with Azure Load Testing. See: https://learn.microsoft.com/en-us/azure/architecture/guide/testing/mission-critical-deployment-testing

Getting Started with Azure Chaos Studio

Step 1: Create a Profile - Define the types of failures and faults that you want to inject into your application.

Step 2: Set Targets - Before we can start creating chaos, we need to onboard resources to Chaos Studio.

Step 3: Create Experiments - Define an experiment to test a specific aspect of your workload, such as network latency or server capacity.

Step 4: Execute & Analyse - Execute the experiment, analyse the results, and use them to optimize your workload configuration.

How Does Azure Chaos Studio work

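Service-direct faults run directly against an Azure resource, with no agent installed on it. This covers most resources that are not the inside of a VM, e.g. App Service and Cosmos DB.

Agent-based faults run inside VMs or virtual machine scale sets to cause in-guest failures. They require the Chaos Studio agent on a Windows or Linux VM, e.g. the VMSS behind an AKS node pool.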
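Both fault types are enabled by creating a Chaos Studio target as a child resource of the Azure resource being onboarded. The sketch below shows how that child resource ID is composed. The helper `chaos_target_id` and the two lookup tables are hypothetical names, and the table covers only a few resource types. Treat the target names as assumptions to check against the current fault provider documentation.

```python
# Service-direct target names, keyed by lower-cased Azure resource type.
# Partial list for illustration; verify names against the current docs.
SERVICE_DIRECT_TARGETS = {
    "microsoft.documentdb/databaseaccounts": "Microsoft-CosmosDB",
    "microsoft.web/sites": "Microsoft-AppService",
    "microsoft.compute/virtualmachines": "Microsoft-VirtualMachine",
    "microsoft.containerservice/managedclusters": "Microsoft-AzureKubernetesServiceChaosMesh",
}

# Agent-based faults apply only to compute resources that can host the agent.
AGENT_CAPABLE_TYPES = {
    "microsoft.compute/virtualmachines",
    "microsoft.compute/virtualmachinescalesets",
}
AGENT_TARGET = "Microsoft-Agent"


def chaos_target_id(resource_id: str, agent_based: bool = False) -> str:
    """Return the resource ID of the Chaos Studio target for `resource_id`."""
    # The resource type is the "<namespace>/<type>" pair after the last "/providers/".
    _, _, tail = resource_id.rpartition("/providers/")
    namespace, resource_type = tail.split("/")[:2]
    full_type = f"{namespace}/{resource_type}".lower()

    if agent_based:
        if full_type not in AGENT_CAPABLE_TYPES:
            raise ValueError(f"{full_type} cannot host the Chaos Studio agent")
        target = AGENT_TARGET
    else:
        target = SERVICE_DIRECT_TARGETS.get(full_type)
        if target is None:
            raise ValueError(f"no service-direct target known for {full_type}")

    return f"{resource_id}/providers/Microsoft.Chaos/targets/{target}"
```

A single VM can carry both kinds of target: a service-direct one for faults such as shutting the VM down, and an agent one for in-guest faults such as CPU pressure.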

An experiment, which is itself an Azure resource, is structured as:

A collection of Steps, running sequentially
Containing Branches, running in parallel within a step
Containing Actions, performing the fault injections
Targeting Selectors, which point at your actual Azure resources
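The hierarchy above maps fairly directly onto the JSON body of an experiment resource. Here is a minimal sketch, assuming the general shape of a `Microsoft.Chaos/experiments` definition. The function `build_experiment` and the names `Selector1`, `Step1` and `Branch1` are illustrative, and the field names should be checked against the API version you deploy with.

```python
import json
from typing import Any


def build_experiment(location: str, target_id: str, fault_urn: str,
                     duration: str, parameters: dict[str, str]) -> dict[str, Any]:
    """Build a one-step, one-branch, one-action experiment definition."""
    selector_id = "Selector1"  # illustrative name; actions refer to it by ID
    action = {
        "type": "continuous",          # a fault that runs for a set duration
        "name": fault_urn,             # which fault to inject
        "duration": duration,          # ISO 8601 duration, e.g. "PT10M"
        "selectorId": selector_id,     # which selector the action targets
        "parameters": [{"key": k, "value": v} for k, v in parameters.items()],
    }
    return {
        "location": location,
        "identity": {"type": "SystemAssigned"},
        "properties": {
            # Selectors list the onboarded targets (your actual Azure resources).
            "selectors": [{
                "type": "List",
                "id": selector_id,
                "targets": [{"type": "ChaosTarget", "id": target_id}],
            }],
            # Steps run sequentially; branches within a step run in parallel.
            "steps": [{
                "name": "Step1",
                "branches": [{"name": "Branch1", "actions": [action]}],
            }],
        },
    }


def to_json(experiment: dict[str, Any]) -> str:
    """Serialize the definition for use in a template or REST request body."""
    return json.dumps(experiment, indent=2)
```

Adding a second entry to `steps` would run it after the first finishes, while adding a second entry to `branches` would run it alongside the first.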

The experiment's managed identity needs the appropriate permissions on each resource it targets. More details can be found here: https://learn.microsoft.com/en-us/azure/chaos-studio/chaos-studio-fault-providers

Using Chaos Studio to detect pod failure

Here’s an example of how you can use Chaos Engineering with Chaos Studio to detect pod failure in Kubernetes:


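apiVersion: chaos-mesh.org/v1alpha1
kind: PodChaos
metadata:
  name: pod-failure-example
  namespace: chaos-testing
spec:
  action: pod-failure   # make the selected pods unavailable
  mode: all             # apply to every pod matched by the selector
  duration: '600s'      # keep the fault active for 10 minutes
  selector:
    namespaces:
      - voting-app      # only pods in this namespace are targeted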
Conclusion

In this article, we looked at what Azure Chaos Studio is and how it can be used to test your application in a production environment without impacting your customers. We also covered how to get started with Azure Chaos Studio and how it works. Finally, we used Azure Chaos Studio with a Chaos Mesh fault to detect pod failure in Kubernetes.