Introducing Azure Chaos Studio
What is Chaos Engineering
Chaos Engineering is the practice of intentionally introducing faults that cause system components to fail, in order to improve resilience and availability. Compared with DevOps and SRE, Chaos Engineering helps achieve consistent reliability by hardening services. In practice, it:
- Improves system resilience to failures and outages.
- Reduces downtime.
- Answers the "what if" questions.
- Uses fault injection.
- Deliberately disrupts production systems to make them more reliable.
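To make fault injection concrete, here is a minimal sketch in Python. It is not how Azure Chaos Studio works internally; it only illustrates the idea of injecting failures into a dependency and measuring whether the caller copes. The names `with_fault_injection`, `call_with_retries` and `get_price` are hypothetical and chosen for this example only.

```python
import random


class InjectedFault(Exception):
    """Raised in place of a real dependency failure."""


def with_fault_injection(func, failure_rate, rng):
    """Wrap func so that roughly `failure_rate` of calls raise InjectedFault."""
    def wrapper(*args, **kwargs):
        if rng.random() < failure_rate:
            raise InjectedFault(f"injected failure in {func.__name__}")
        return func(*args, **kwargs)
    return wrapper


def call_with_retries(func, attempts):
    """Call func up to `attempts` times; return None if every attempt fails."""
    for _ in range(attempts):
        try:
            return func()
        except InjectedFault:
            continue
    return None


def get_price():
    """Hypothetical downstream dependency used only for illustration."""
    return 42


def success_ratio(attempts, calls=1000, failure_rate=0.3, seed=7):
    """Fraction of calls that still succeed while faults are being injected."""
    rng = random.Random(seed)  # seeded so the experiment is repeatable
    flaky = with_fault_injection(get_price, failure_rate, rng)
    ok = sum(call_with_retries(flaky, attempts) is not None for _ in range(calls))
    return ok / calls


if __name__ == "__main__":
    print("no retries:", success_ratio(attempts=1))
    print("3 attempts:", success_ratio(attempts=3))
```

Comparing the two ratios answers one "what if" question: what happens to callers if this dependency fails 30% of the time, and does a simple retry policy harden the service enough?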
When to apply Chaos Engineering
Development Stage - Identify potential problems before going into production.
Production Stage - Test resilience and identify problems not found in staging.
Failure Analysis - Reproduce and diagnose problems that have already occurred.
The Principles of Chaos Engineering (principlesofchaos.org) go further and strongly prefer running experiments directly against a production environment.
Chaos Engineering Tools
- Azure, AWS and Kubernetes - Chaos Toolkit, which simulates various types of failures and faults, for example in your Kubernetes clusters
- Cross-platform - Gremlin
- App, infrastructure and networking - ChaosBlade
- Kubernetes - Chaos Mesh and chaoskube
- Production testing - KubeInvaders
- Terminating instances in production - Chaos Monkey. Netflix later built Chaos Kong, which simulates the loss of an entire region
Introducing Azure Chaos Studio
Currently in Public Preview, Azure Chaos Studio is a fully managed service for testing and improving the resilience and performance of cloud workloads. It helps you identify and fix issues before they impact your customers. It integrates with Azure DevOps and with infrastructure as code tools such as Bicep and Terraform. Azure Chaos Studio can be used to test your application in a production environment without impacting your customers, and it pairs well with Azure Load Testing: https://learn.microsoft.com/en-us/azure/architecture/guide/testing/mission-critical-deployment-testing
Getting Started with Azure Chaos Studio
Step 1: Create a Profile - Define the types of failures and faults that you want to inject into your application.
Step 2: Set Targets - Before we can start creating chaos, we need to onboard resources to Chaos Studio.
Step 3: Create Experiments - Define an experiment to test a specific aspect of your workload, such as network latency or server capacity.
Step 4: Execute & Analyse - Execute the experiment, analyse the results, and use them to optimize your workload configuration.
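Steps 2 and 4 can also be driven through the Azure Resource Manager REST API instead of the portal. The sketch below only builds the resource IDs and the URL involved; it makes no network calls. The API version shown is an assumption and should be checked against the current Microsoft.Chaos REST reference, and the subscription, resource group and resource names are placeholders. The helper function names are hypothetical.

```python
from urllib.parse import urlencode

ARM_ENDPOINT = "https://management.azure.com"
API_VERSION = "2023-11-01"  # assumption: verify against the Microsoft.Chaos REST reference


def target_id(resource_id, target_type):
    """Step 2: ID of the Chaos Studio target that onboards an existing resource.

    target_type is e.g. "Microsoft-VirtualMachine" (service-direct)
    or "Microsoft-Agent" (agent-based).
    """
    return f"{resource_id}/providers/Microsoft.Chaos/targets/{target_type}"


def experiment_id(subscription_id, resource_group, name):
    """Resource ID of a Chaos Studio experiment."""
    return (
        f"/subscriptions/{subscription_id}/resourceGroups/{resource_group}"
        f"/providers/Microsoft.Chaos/experiments/{name}"
    )


def start_url(exp_id, api_version=API_VERSION):
    """Step 4: URL to send a POST request to in order to start the experiment."""
    query = urlencode({"api-version": api_version})
    return f"{ARM_ENDPOINT}{exp_id}/start?{query}"


if __name__ == "__main__":
    # Placeholder identifiers, not real resources.
    vm = ("/subscriptions/00000000-0000-0000-0000-000000000000"
          "/resourceGroups/demo-rg/providers/Microsoft.Compute/virtualMachines/demo-vm")
    print(target_id(vm, "Microsoft-VirtualMachine"))
    exp = experiment_id("00000000-0000-0000-0000-000000000000", "demo-rg", "demo-experiment")
    print(start_url(exp))
```

The printed URL could then be passed to an authenticated client, for example `az rest --method post --uri <url>`.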
How Does Azure Chaos Studio work
Service-direct faults run directly against an Azure resource, with no agent to install. They are typically used for anything that is not a VM, e.g. App Service and Cosmos DB.
Agent-based faults run inside VMs or virtual machine scale sets to cause in-guest failures. They apply to Windows or Linux VMs, e.g. the VMSS behind an AKS cluster.
An experiment is organized as follows:
- A collection of steps, running sequentially
- Each step contains branches, running in parallel within that step
- Each branch contains actions, performing the fault injections
- Each action targets a selector, which groups your actual Azure resources (the experiment itself is also an Azure resource)
You need to have the appropriate permissions over the resource you want to target with the experiment. More details can be found here: https://learn.microsoft.com/en-us/azure/chaos-studio/chaos-studio-fault-providers
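The step, branch, action and selector hierarchy can be sketched as a small Python data model that serializes to JSON. The field names are modeled on the Chaos Studio experiment schema as I understand it and should be treated as assumptions to verify against the official reference; the resource IDs are placeholders and `build_experiment` is a hypothetical helper.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Action:
    name: str        # fault URN
    type: str        # "continuous" or "discrete"
    selectorId: str  # which selector (group of targets) the fault applies to
    duration: str    # ISO 8601 duration, e.g. "PT10M"
    parameters: list = field(default_factory=list)


@dataclass
class Branch:
    name: str
    actions: list    # actions within a branch


@dataclass
class Step:
    name: str
    branches: list   # branches run in parallel within the step


@dataclass
class Selector:
    id: str
    targets: list    # the onboarded Azure resources
    type: str = "List"


def build_experiment(steps, selectors):
    """Return the experiment properties, checking every action has a selector."""
    known = {selector.id for selector in selectors}
    for step in steps:
        for branch in step.branches:
            for action in branch.actions:
                if action.selectorId not in known:
                    raise ValueError(f"unknown selector: {action.selectorId}")
    return {
        "properties": {
            "steps": [asdict(step) for step in steps],  # steps run sequentially
            "selectors": [asdict(selector) for selector in selectors],
        }
    }


# Placeholder target ID, not a real resource.
vm_target = {
    "type": "ChaosTarget",
    "id": "/subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.Compute"
          "/virtualMachines/<vm>/providers/Microsoft.Chaos/targets/Microsoft-VirtualMachine",
}
shutdown = Action(
    name="urn:csci:microsoft:virtualMachine:shutdown/1.0",
    type="continuous",
    selectorId="selector1",
    duration="PT10M",
    parameters=[{"key": "abruptShutdown", "value": "true"}],
)
experiment = build_experiment(
    steps=[Step("step1", [Branch("branch1", [shutdown])])],
    selectors=[Selector("selector1", [vm_target])],
)

if __name__ == "__main__":
    print(json.dumps(experiment, indent=2))
```

Validating selector references before submitting the experiment catches a common authoring mistake early, since an action pointing at a missing selector has nothing to inject faults into.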
Using Chaos Studio to detect pod failure
Here’s an example of a Chaos Mesh PodChaos definition that you can use with Chaos Studio to inject and detect pod failure in Kubernetes:
```yaml
apiVersion: chaos-mesh.org/v1alpha1
kind: PodChaos
metadata:
  name: pod-failure-example
  namespace: chaos-testing
spec:
  action: pod-failure
  mode: all
  duration: '600s'
  selector:
    namespaces:
      - voting-app
```
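When this fault is run through Chaos Studio rather than applied directly with Chaos Mesh, the contents of `spec` are passed to the AKS Chaos Mesh fault as a JSON string. The following Python sketch shows that conversion under stated assumptions: the `jsonSpec` parameter name and the fault URN (including its version number) reflect my understanding of the AKS Chaos Mesh fault and should be verified against the Chaos Studio fault library, and `to_chaos_studio_action` is a hypothetical helper.

```python
import json

# The `spec` section of the PodChaos definition above, as a Python dict.
pod_chaos_spec = {
    "action": "pod-failure",
    "mode": "all",
    "duration": "600s",
    "selector": {"namespaces": ["voting-app"]},
}

# Assumption: verify the URN and its version against the fault library.
POD_CHAOS_URN = "urn:csci:microsoft:azureKubernetesServiceChaosMesh:podChaos/2.1"


def to_chaos_studio_action(spec, selector_id, duration):
    """Wrap a Chaos Mesh spec as a Chaos Studio experiment action."""
    return {
        "type": "continuous",
        "selectorId": selector_id,
        "duration": duration,  # ISO 8601, e.g. PT10M matches the 600s above
        "parameters": [
            # The spec is serialized to a compact JSON string.
            {"key": "jsonSpec", "value": json.dumps(spec, separators=(",", ":"))}
        ],
        "name": POD_CHAOS_URN,
    }


action = to_chaos_studio_action(pod_chaos_spec, "selector1", "PT10M")

if __name__ == "__main__":
    print(json.dumps(action, indent=2))
```

The resulting dictionary can be placed in the actions list of an experiment branch whose selector points at the onboarded AKS cluster.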
Conclusion
In this article, we have looked at what Azure Chaos Studio is and how it can be used to test your application in a production environment without impacting your customers. We also looked at how to get started with Azure Chaos Studio and how it works. Finally, we looked at how to use Azure Chaos Studio to detect pod failure in Kubernetes.