· One min read

A very brief blog article today: I have created an AWESOME-Azure-Architecture list, which is hosted on GitHub and located here:

This list is a curated list of AWESOME blogs, videos, tutorials, code, tools & scripts, related to the design and implementation of solutions in Microsoft Azure.

This list contains anything that can help with your Microsoft Azure architecture and quickly get you up and running when designing, planning, and implementing services that empower organisations around the planet to achieve more.

· 9 min read

You may be working remotely or only have a few devices needing access to your resources in Azure; one solution that can be deployed is a Point to Site connection straight into your Microsoft Azure network.

This functionality allows your computer to connect privately to resources over a secure tunnel using your internet connection. Using an Azure Virtual Network gateway, you can seamlessly connect to resources without opening them up to the internet or having to whitelist your (or a third-party vendor's) IP address, which may change daily.

Because authentication uses Microsoft Entra ID, you know that only your specified users can access your Azure resources.

You can have a site to site and point to site VPN running on the same Gateway today. We will set up a Point to Site VPN using Windows 11.

Azure Point to Site

The number of concurrent connections and the throughput you are allowed depend on the SKU of your Virtual Network Gateway. Because we are using Microsoft Entra ID and the OpenVPN protocol, I will be selecting Generation 1, VpnGw1, supporting a max of 250 connections (you can double the throughput and number of connections if you are running in Active/Active and have a second gateway, or select a higher SKU).

Microsoft Entra ID (formerly Azure AD) authentication is supported for OpenVPN® protocol connections only and requires the Azure VPN client.

A note about Gateway SKUs (apart from Basic): you can resize within the same generation (i.e. Generation 1 VpnGw1 to VpnGw3), but you can't go from Generation 1 VpnGw1 to Generation 2 VpnGw5. To change generation, you have to delete and recreate the Gateway, so keep this in mind when deciding on the SKU of your resources.

You can read more about the Virtual Network Gateways and VPN SKUs at the official Microsoft documentation here; your Gateway SKU may differ depending on your requirements.

Create Azure Point to Site VPN using Microsoft Entra ID authentication

Prerequisites

  • An Azure subscription (that you have at least contributor rights to and the ability to create Users and Groups)
  • An endpoint device running Windows 10 or 11 that you can install the Azure VPN client onto

Create Virtual Network

First things first, let's create a Virtual Network.

  1. Log in to the Azure Portal
  2. Click on + Create a resource
  3. Search for: Virtual Network and click on it
  4. Click Create
  5. Select or create the Resource Group that you want your network resource to sit in (I recommend the Virtual Network and the gateway resources sit in their own Resource Group, away from other resources, so that they can be protected by resource locks and RBAC; they are usually classified as shared resources).
  6. Azure Virtual Network
  7. Click Next: IP Addresses
  8. Now we need to define the Address space and subnets; I will leave the Address space as 10.0.0.0/16 but remove the Default subnet (select the checkbox next to the Subnet and select Delete)
  9. Click +Add Subnet, and add a new subnet with the name of GatewaySubnet with an IP range of: 10.0.1.0/27 (this Subnet will be used by our Virtual Network Gateway, and the name needs to be exactly GatewaySubnet).
  10. Now I will add a subnet named: AppServers, where the Virtual Machines I need to connect to will be placed.
  11. Azure Virtual Network
  12. Click Next: Security
  13. Leave everything (BastionHost, DDoS Protection Standard, Firewall) as Disabled.
  14. Click Next: Tags
  15. Enter in any tags and click Review + Create
  16. Review your configuration and click Create
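
Before clicking Create, it can help to sanity-check the address plan. The following is a minimal Python sketch (standard library only) that checks each subnet sits inside the 10.0.0.0/16 address space and that no two subnets overlap. The 10.0.2.0/24 range for the AppServers subnet is an assumption for illustration only; the walkthrough doesn't state which range was used.

```python
import ipaddress

# Address plan from the walkthrough. The AppServers range is an assumed
# example value; only the GatewaySubnet range is given in the article.
ADDRESS_SPACE = ipaddress.ip_network("10.0.0.0/16")
SUBNETS = {
    "GatewaySubnet": ipaddress.ip_network("10.0.1.0/27"),
    "AppServers": ipaddress.ip_network("10.0.2.0/24"),  # assumption
}


def check_address_plan(address_space, subnets):
    """Return a list of human-readable problems; an empty list means the plan is consistent."""
    problems = []
    # Every subnet must sit inside the virtual network's address space.
    for name, net in subnets.items():
        if not net.subnet_of(address_space):
            problems.append(f"{name} ({net}) is outside {address_space}")
    # No two subnets may share addresses.
    names = list(subnets)
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if subnets[first].overlaps(subnets[second]):
                problems.append(f"{first} overlaps {second}")
    return problems


if __name__ == "__main__":
    issues = check_address_plan(ADDRESS_SPACE, SUBNETS)
    print("Address plan OK" if not issues else "\n".join(issues))
```

The check runs locally and doesn't talk to Azure, so it only catches planning mistakes, not anything already deployed.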

Create Virtual Network Gateway

Now that we have the foundation of our setup - an Azure Virtual Network, it is time to provision the Gateway itself; just a note before we continue, the Gateway can take 30-60 minutes to provision.

  1. Log in to the Azure Portal
  2. Click on + Create a resource
  3. Type in and search for: Virtual Network Gateway
  4. Click Create
  5. Type in the name of your Azure Virtual Network Gateway
  6. Select the region (it must be the same region as your virtual network)
  7. Select the Gateway Type: VPN
  8. The VPN type is: Route-based
  9. Select the SKU, in this example - I will be going with VpnGw1
  10. Select the Generation of the Virtual Network Gateway; I am going with: Generation 1
  11. Select the Virtual Network that you created earlier, and it will automatically find and assign the Gateway to the Subnet named: GatewaySubnet
  12. Select Standard public IP address SKU
  13. Select Public IP address and select: Create new
  14. Type in your public IP name
  15. Leave 'Enable active-active mode' and 'Configure BGP' as Disabled.
  16. Click Review + Create
  17. Azure Virtual Network Gateway
  18. Verify the configuration is correct and click Create
  19. It can take up to 30-60 minutes for the Virtual Network Gateway to be created.

Setup Microsoft Entra ID authentication on the Virtual Network Gateway

Now that the Virtual Network Gateway has been created, we can set up Microsoft Entra ID authentication.

Collect Microsoft Entra ID Tenant ID

First, we need to collect the Microsoft Entra ID Tenant ID.

  1. Log in to the Azure Portal
  2. Click on Microsoft Entra ID
  3. In the Overview pane, copy the Tenant ID and save this for the next step.

Grant Azure VPN Client permissions

Now we need to grant the Azure VPN application permissions.

  1. Log in to the Azure Portal

  2. Open a new window and type in and press Enter:

    https://login.microsoftonline.com/common/oauth2/authorize?client_id=41b23e61-6c1e-4545-b367-cd054e0ed4b4&response_type=code&redirect_uri=https://portal.azure.com&nonce=1234&prompt=admin_consent
  3. If you get an error about external identity, then replace /common/ with your tenant ID.

  4. Azure VPN Permissions

  5. Click Accept

  6. Navigate back to Microsoft Entra ID

  7. Select Enterprise Applications

  8. Select Azure VPN

  9. Copy the Application ID of the Azure VPN enterprise application (you will need both Application ID and tenant ID for the next steps)

  10. Azure VPN
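
If you need to hand the consent link to another administrator, or swap /common/ for a tenant ID, building the URL in code avoids copy-and-paste mistakes. This is a small Python sketch using only the values from the URL above; the function name `admin_consent_url` is made up for this example.

```python
from urllib.parse import urlencode

# Application (client) ID of the Azure VPN enterprise application, as used
# in the consent URL in the steps above.
AZURE_VPN_CLIENT_ID = "41b23e61-6c1e-4545-b367-cd054e0ed4b4"


def admin_consent_url(tenant="common"):
    """Build the admin-consent URL; pass a tenant ID to replace /common/."""
    query = urlencode(
        {
            "client_id": AZURE_VPN_CLIENT_ID,
            "response_type": "code",
            "redirect_uri": "https://portal.azure.com",
            "nonce": "1234",
            "prompt": "admin_consent",
        },
        safe=":/",  # leave the redirect URI unescaped, as in the original URL
    )
    return f"https://login.microsoftonline.com/{tenant}/oauth2/authorize?{query}"


if __name__ == "__main__":
    print(admin_consent_url())
```

Calling `admin_consent_url("<your tenant ID>")` gives the variant to use if you hit the external identity error.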

Configure Point to Site Connection

Now it's time to configure the Virtual Network Gateway.

  1. Log in to the Azure Portal
  2. Navigate to the Virtual Network Gateway you created earlier
  3. Click on Point-to-site configuration
  4. Click Configure now
  5. Enter in your address pool (this is the address pool of the VPN clients; make sure it doesn't overlap with any other IP range you use. I will go with: 172.0.0.0/16, although note that this range sits outside the RFC 1918 private address space, so a private range such as 172.16.0.0/24 is the safer choice)
  6. Make sure the Tunnel type is: OpenVPN (SSL)
  7. Select Microsoft Entra ID for the Authentication type
  8. For Tenant ID, enter in: https://login.microsoftonline.com/**TENANTID**/ (replacing TENANTID with your own Tenant ID).
  9. For the Audience (this identifies the Azure VPN enterprise application that your users and groups are assigned to), put in the Application ID of the Azure VPN application.
  10. For the Issuer, enter in: https://sts.windows.net/**TENANTID**/
  11. Azure Virtual Network Gateway
  12. Click Save
  13. It may take 1-5 minutes to save the configuration
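
The values entered above all derive from three inputs: your tenant ID, the Azure VPN Application ID and the client address pool. As a rough Python sketch (the function name `p2s_settings` and the dictionary keys are made up for illustration, and this does not call Azure), you could assemble and sanity-check them like this:

```python
import ipaddress


def p2s_settings(tenant_id, audience_app_id, address_pool, vnet_space="10.0.0.0/16"):
    """Assemble the point-to-site values entered in the portal."""
    pool = ipaddress.ip_network(address_pool)
    # The client pool must not overlap the virtual network's address space.
    if pool.overlaps(ipaddress.ip_network(vnet_space)):
        raise ValueError(f"Address pool {pool} overlaps the virtual network {vnet_space}")
    return {
        "address_pool": str(pool),
        "tunnel_type": "OpenVPN (SSL)",
        "tenant": f"https://login.microsoftonline.com/{tenant_id}/",
        "audience": audience_app_id,
        "issuer": f"https://sts.windows.net/{tenant_id}/",
        # Informational: True when Python's ipaddress module classifies
        # the whole pool as private address space.
        "pool_is_private": pool.is_private,
    }


if __name__ == "__main__":
    settings = p2s_settings(
        tenant_id="00000000-0000-0000-0000-000000000000",  # placeholder tenant ID
        audience_app_id="41b23e61-6c1e-4545-b367-cd054e0ed4b4",
        address_pool="172.0.0.0/16",
    )
    for key, value in settings.items():
        print(f"{key}: {value}")
```

Note the trailing slash on both the tenant and issuer URLs, matching the values shown in the steps.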

Install and connect using the Azure VPN client

Now that the Point to Site VPN has been configured it's time to connect!

  1. Click on Download VPN client (if it is greyed out, then navigate to the Overview pane, then back to the Point-to-site configuration).
  2. Extract the zip file, you will need these files
  3. Download the Azure VPN Client to your computer.
  4. Azure VPN Client
  5. Once downloaded, click Open.
  6. Click the + sign (lower left)
  7. Click Import
  8. Navigate to the: azurevpnconfig.xml file that you downloaded earlier and click Open
  9. You can change the Connection Name to something more user friendly (you can also edit the file directly for when you look at pushing out this to multiple users, but make sure you have a backup of the file)
  10. Click Save
  11. Azure VPN Connection
  12. Click Connect
  13. Enter in your Microsoft Entra ID credentials (you may be prompted for MFA, depending on your Conditional Access rules; you can target the Azure VPN application in Conditional Access)
  14. Azure VPN Connection
  15. You should now be connected to the Azure network through a point to site VPN!
  16. If I run 'ipconfig /all' on my device, I can see a PPP adapter that is connected and on the VPN address range created earlier: 172.0.0.2
  17. Azure Point to Site Connections
  18. If I navigate back to the Point-to-site configuration in the Azure Portal, I can see that my connection has been allocated:
  19. Azure Point to Site Connections
  20. I can now use Remote Desktop to connect to a Virtual Machine running in my AppServers subnet, without the need for a Public IP or bastion/jump host:
  21. Azure Point to Site VPN

Note: I don't have a DNS service running in Azure, but the Azure VPN agent will take DNS from the Virtual Network if you have this configured to point towards a DNS server (Active Directory, or another DNS forwarder pointing towards the Azure DNS IP: 168.63.129.16, such as Azure Firewall DNS proxy). You can set custom DNS servers by modifying your DNS configuration, or add entries into the hosts file of the computers.

You can set your custom DNS settings (remember to add the DNS suffix if needed) and configure the VPN to automatically connect by following the details on the OpenVPN Azure AD Client page.

Using Intune, you can also push this configuration to your Windows 10 and 11 clients.

· 2 min read

Pretty simple article today regarding 'My website setup'.

I've had a few people ask what CMS (Content Management System) my website runs on - and no it's not running on an Azure App Service!

I am using:

  • GitHub Pages (running Jekyll, which is built on Ruby)
  • Cloudflare as my DNS and CDN (which also allows me to set HTTPS and cache the website across the planet)

Because the pages are in a git repository, I have version control across my pages, can roll back or make any changes easily and allow others to submit pull requests for changes, or issues natively.

The pages are created using Markdown. I usually have a OneNote page with an idea or blurb, then use Forestry to do the initial post, and then manually edit the files to verify the syntax is correct, add tables into the page and fix any issues that may have been caused (Forestry doesn't support markdown tables and can make some content look a bit weird and unstructured, but it's usually an easy fix editing the markdown manually).

Having it on GitHub Pages helped me learn a lot more about using git, source control and versioning methodologies.

Then for comments, I use Disqus and for analytics, Google Analytics and Bing Webmaster Tools.

All in all - I just have to pay for the domain, everything else is free and because it's stateless, caching content is a lot easier and I don't have to worry about keeping a CMS up to date/patched or a database tuned!

If you're wondering why it's not running on an Azure App Service: I wanted something cheap that I could be challenged by and learn from. At the end of the day, I wanted a stateless website (static websites in a Storage account weren't available when I set this up), and I wanted to reserve my limited Azure credits to be able to actually learn and play more. I have no regrets about putting it on GitHub Pages and, depending on your requirements, recommend you try it out!

· 13 min read

The Microsoft Azure ecosystem offers a lot of capabilities that empower individuals and businesses; one capability that is often overlooked is DNS (Domain Name System).

Azure DNS allows you to host your DNS domain in Azure, so you can manage your DNS records using the same credentials, billing, and support contract as your other Azure services. Zones can be either public or private, where Private DNS Zones (in Managed Preview) are only visible to VMs that are in your virtual network.

You can configure Azure DNS to resolve hostnames in your public domain. For example, if you purchased the contoso.xyz domain name from a domain name registrar, you can configure Azure DNS to host the contoso.xyz domain and resolve www.contoso.xyz to the IP address of your web server or web app.

In this article, we are going to focus on Azure Public DNS.

I had my external DNS under source control using Terraform and the Cloudflare provider a few years ago. I wanted to see if I could use source control and continuous integration to do the same thing using Azure DNS and Azure Bicep.

My theory was that I could make a change to a file, commit it, and have the Azure DNS records created or modified automatically. This would allow changes to DNS to be gated, approved, scheduled and audited, and make changes and rollbacks a lot easier, without having to give people access to create DNS records with no auditability. It turns out you can!

Using an Azure DevOps pipeline and repository and Azure Bicep, we will automatically deploy an Azure Public DNS zone, and any records, to a resource group on a successful commit.

Azure Bicep - Pipeline High Level

Create Azure Public DNS as Code

Prerequisites

  • An Azure DevOps account and permissions to create a service endpoint
  • An Azure subscription that you have at least contributor rights to
  • A git repository (I am going to use the repository in Azure DevOps, but you could use a nested repository from GitHub)
  • The latest Azure PowerShell modules and Azure Bicep/Azure CLI for local editing
  • A domain name and rights to change the nameservers to point towards Azure DNS

In this article, I will be using an Azure subscription, an Azure DevOps (free) subscription that I have access to, and a custom domain I registered named 'badasscloud.com'.

I will assume that you have nothing set up but feel free to skip the sections that aren't relevant.

Now that we have the prerequisites sorted, let's set it up...

Create Azure DevOps Repository

  1. Sign in to Azure DevOps

  2. Select + New Project

  3. Give your project a name (i.e., I am going with: DNSAsCode)

  4. Azure DevOps - Create New Project

  5. Click Create (your project will now be created)

  6. Click on Repos

  7. Click on Files

  8. Find the 'Initialize Main branch with a README or gitignore' section and click Initialize.

  9. Azure DevOps - Create New Project

  10. You should now have an empty git repository!

Create Azure DevOps Service Connection

For Azure DevOps to connect to Microsoft Azure, we need to set up a service principal. You can create the service connection in Azure DevOps; however, it usually generates a service principal with a name that could be unrecognizable in the future in Azure. I prefer to create them according to a naming convention, so that I can look at one and instantly recognize its use case. To do that, we will create it using the Azure CLI.

    1. Open PowerShell

    2. Run the following commands to connect to Azure and create your Service Principal with Contributor access to Azure:

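      #Connects to Microsoft Azure
      az.cmd login
      #Set SPN name
      $AppRegName = 'SPN.AzureSubscription.Contributor'
      #Gets the ID of the current subscription, used as the scope of the role assignment
      $subscriptionId = az.cmd account show --query id --output tsv
      #Creates SPN and sets SPN as Contributor to the subscription (recent Azure CLI versions require --scopes alongside --role)
      $spn = az.cmd ad sp create-for-rbac --name $AppRegName --role 'Contributor' --scopes "/subscriptions/$subscriptionId"
      #Exports Password, Tenant & App ID for better readability - required for Azure DevOps setup
      $spn | ConvertFrom-Json | Select-Object -Property password, tenant, appId
      #Outputs the subscription ID and name - also required for Azure DevOps setup
      $subscriptionId
      az.cmd account show --query name --output tsv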
    3. Make sure you record the password, application ID and the subscription ID/name; you will need this for the next step - you won't be able to view it anywhere else; if you lose it, you can rerun the sp create command to generate a new password. Now that we have the SPN, we need to add the details into Azure DevOps.

    4. Sign in to Azure DevOps

    5. Navigate to the DNS As Code project you created earlier

    6. Click on Project Settings (bottom left-hand side of the window)

    7. Click on Service connections

    8. Click on: Create a service connection

    9. Select Azure Resource Manager

    10. Click Next

    11. Click on: Service Principal (Manual) and click Next

    12. Enter in the following details that we exported earlier from the creation of the service principal:

      • Subscription ID
      • Subscription Name
      • Service Principal ID (the appId)
      • Service principal key (password)
      • Tenant ID
    13. Click Verify to verify that Azure DevOps can connect to Azure; you should hopefully see a Verification succeeded.

    14. Give the Service connection a name (this is the display name that is visible in Azure DevOps)

    15. Add a description (i.e. created by, created on, created for)

    16. Click on Verify and save

    17. You now have a new Service connection!

    18. Azure DevOps - Service Connection

Note: The password for the service principal is valid for one year, so when it expires, you can come into the Azure DevOps service connection and update it.
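
To make the mapping between the CLI output and the service connection form explicit, here is a small Python sketch. The JSON below only mimics the shape of what `az ad sp create-for-rbac` prints; every value in it is a made-up placeholder, and the function name `service_connection_fields` is hypothetical.

```python
import json

# Example shape of the JSON printed by `az ad sp create-for-rbac`.
# All values here are made-up placeholders, not real credentials.
SAMPLE_OUTPUT = """
{
  "appId": "11111111-1111-1111-1111-111111111111",
  "displayName": "SPN.AzureSubscription.Contributor",
  "password": "placeholder-secret",
  "tenant": "22222222-2222-2222-2222-222222222222"
}
"""


def service_connection_fields(spn_json, subscription_id, subscription_name):
    """Map the CLI output onto the fields of the 'Service Principal (Manual)' form."""
    spn = json.loads(spn_json)
    return {
        "Subscription ID": subscription_id,
        "Subscription Name": subscription_name,
        "Service Principal ID": spn["appId"],
        "Service principal key": spn["password"],
        "Tenant ID": spn["tenant"],
    }


if __name__ == "__main__":
    fields = service_connection_fields(
        SAMPLE_OUTPUT,
        subscription_id="33333333-3333-3333-3333-333333333333",  # placeholder
        subscription_name="Example Subscription",  # placeholder
    )
    for label in fields:
        # Print only the labels so a real secret never ends up in a console log.
        print(label)
```

Treat the password like any other secret: keep it out of source control and out of pipeline logs.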

Add Azure Bicep to Repository

Now that Azure DevOps has the delegated rights to create resources in Microsoft Azure, we need to add the Azure Bicep for Azure DNS Zone.

I have created the below Azure Bicep file named: Deploy-PublicDNS.bicep

Don't edit the file yet. You can add your DNS records later - after we add some variables into the Azure Pipeline.

This file will:

  • Create a new public Azure DNS zone, if it doesn't exist
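  • Add and modify any records (the pipeline below uses a default, incremental deployment, which does not delete records from the zone when they are removed from the file)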

I have added CNAME, A Record and TXT Records as a base.

Deploy-PublicDNS.bicep
///Variables - Edit, these variables can be set in the script or implemented as part of Azure DevOps variables.
//Set the Domain Name Zone:
param PrimaryDNSZone string = ''
//Azure DNS zones are a global resource, so the location is 'Global' rather than the region of your resource group.
var location = 'Global'
//Variable array for your A records. Add, remove and amend as needed, any new record needs to be included in {}.
var arecords = [
  {
    name: '@'
    ipv4Address: '8.8.8.8'
  }
  {
    name: 'webmail'
    ipv4Address: '8.8.8.8'
  }
]
//Variable array for your CNAME records. Add, remove and amend as needed, any new record needs to be included in {}.
var cnamerecords = [
  {
    name: 'blog'
    value: 'luke.geek.nz'
  }
]
//Variable array for your TXT records. Add, remove and amend as needed, any new record needs to be included in {}.
var txtrecords = [
  {
    name: '@'
    value: 'v=spf1 include:spf.protection.outlook.com -all'
  }
]

///Deploys your infrastructure below.

//Deploys your DNS Zone.
resource DNSZone 'Microsoft.Network/dnsZones@2018-05-01' = {
  name: toLower(PrimaryDNSZone)
  location: location
  properties: {
    zoneType: 'Public'
  }
}

//Deploys your A records that are listed in the arecords variable array above.
resource DNSARecords 'Microsoft.Network/dnsZones/A@2018-05-01' = [for arecord in arecords: {
  name: toLower(arecord.name)
  parent: DNSZone
  properties: {
    TTL: 3600
    ARecords: [
      {
        ipv4Address: arecord.ipv4Address
      }
    ]
    targetResource: {}
  }
}]

//Deploys your CNAME records that are listed in the cnamerecords variable array above.
resource CNAMErecords 'Microsoft.Network/dnsZones/CNAME@2018-05-01' = [for cnamerecord in cnamerecords: {
  name: toLower(cnamerecord.name)
  parent: DNSZone
  properties: {
    'TTL': 3600
    CNAMERecord: {
      cname: cnamerecord.value
    }
    targetResource: {}
  }
}]

//Deploys your TXT records that are listed in the txtrecords variable array above.
resource TXTrecords 'Microsoft.Network/dnsZones/TXT@2018-05-01' = [for txtrecord in txtrecords: {
  name: toLower(txtrecord.name)
  parent: DNSZone
  properties: {
    'TTL': 3600
    TXTRecords: [
      {
        value: [
          txtrecord.value
        ]
      }
    ]
  }
}]

//Outputs the first CNAME and A record values, as a quick check after deployment.
output cnamerecords string = CNAMErecords[0].properties.CNAMERecord.cname
output arecords string = arecords[0].ipv4Address

To add the Azure Bicep file into Azure DevOps, you can commit it into the git repository; see a previous post on 'Git using Github Desktop on Windows for SysAdmins' to help get started. However, at this stage, I will create it manually in the portal.

  1. Sign in to Azure DevOps
  2. Navigate to the DNS As Code project you created earlier
  3. Click on Repos
  4. Click on Files
  5. Click on the Ellipsis on the right-hand side
  6. Click New
  7. Click File
  8. Azure DevOps - New File
  9. Type in the name of your file (including the bicep extension), i.e. Deploy-PublicDNS.bicep
  10. Click Create
  11. Copy the contents of the Azure Bicep file supplied above and paste them into the Contents of Deploy-PublicDNS.bicep in Azure DevOps
  12. Azure DevOps - Azure Bicep
  13. Click Commit
  14. Click Commit again
  15. While we are here, let's delete the README.md file (as it will cause issues with the pipeline later on), click on the README.md file.
  16. Click on the Ellipsis on the right-hand side
  17. Click Delete
  18. Click Commit
  19. You should now only have your: Deploy-PublicDNS.bicep in the repository.

Create Azure DevOps Pipeline

Now that we have the initial Azure Bicep file, it's time to create our pipeline that will do the heavy lifting. I have created the base pipeline that you can download, and we will import it into Azure DevOps.

azure-pipelines.yml
# Variable 'location' was defined in the Variables tab
# Variable 'PrimaryDNSZone' was defined in the Variables tab
# Variable 'ResourceGroupName' was defined in the Variables tab
# Variable 'SPN' is defined in the Variables tab
trigger:
  branches:
    include:
    - refs/heads/main
jobs:
- job: Job_1
  displayName: Agent job 1
  pool:
    vmImage: ubuntu-latest
  steps:
  - checkout: self
  - task: AzureCLI@2
    displayName: 'Azure CLI '
    inputs:
      connectedServiceNameARM: $(SPN)
      scriptType: pscore
      scriptLocation: inlineScript
      # A literal block (|) keeps each command on its own line
      inlineScript: |
        az group create --name $(ResourceGroupName) --location $(location)
        az deployment group create `
          --template-file $(Build.SourcesDirectory)/Deploy-PublicDNS.bicep `
          --resource-group $(ResourceGroupName) `
          --parameters PrimaryDNSZone=$(PrimaryDNSZone)
      powerShellErrorActionPreference: continue

This pipeline will run through the following steps:

  • Spin up an Azure-hosted agent running Ubuntu (it already has the Azure CLI and PowerShell setup)
  • Create the Azure resource group to place your DNS zone into (if it doesn't already exist)
  • Finally, do the actual Azure Bicep deployment and create your Primary DNS zone resource, and, if necessary, modify any resources.
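
The `$(name)` placeholders in the inline script are pipeline variables in macro syntax, which Azure DevOps substitutes as text before the script runs. The Python sketch below imitates that substitution so you can preview the commands; it is a simplification (for instance, real variable names are case-insensitive, while this sketch is case-sensitive), and `expand_macros` is a made-up name. The variable values are the examples used in this article.

```python
import re

# Matches $(name), where name may contain dots (e.g. Build.SourcesDirectory).
MACRO = re.compile(r"\$\(([A-Za-z_][\w.]*)\)")

# Example values for the pipeline variables used in this article.
PIPELINE_VARIABLES = {
    "ResourceGroupName": "DNS-PRD-RG",
    "location": "Australia East",
    "PrimaryDNSZone": "badasscloud.com",
}


def expand_macros(script, variables):
    """Replace $(name) with its value; unknown names are left untouched."""
    def substitute(match):
        return variables.get(match.group(1), match.group(0))
    return MACRO.sub(substitute, script)


if __name__ == "__main__":
    line = "az group create --name $(ResourceGroupName) --location $(location)"
    print(expand_macros(line, PIPELINE_VARIABLES))
```

Because the substitution is purely textual, a value containing a space (such as Australia East) lands in the command unquoted, so you may want to quote the placeholder in the script or use a region name without spaces (for example, australiaeast).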

Copy the contents of the YAML pipeline above, and let's import it to Azure DevOps.

  1. Sign in to Azure DevOps
  2. Navigate to the DNS As Code project you created earlier
  3. Click on Pipelines
  4. Click on the Create Pipeline
  5. Select Azure Repos Git (YAML)
  6. Select your DNSAsCode repository
  7. Select Starter pipeline
  8. Overwrite the contents of the starter pipeline with the YAML file supplied
  9. Azure DevOps - YAML
  10. Click on the arrow next to Save and Run and select Save
  11. Select Commit directly to the main branch
  12. Click Save
  13. You may get an error about the trigger. You can ignore it - we will need to set the variables and trigger now.
  14. Click on Pipelines, select your newly created pipeline
  15. Select Edit
  16. Click Variables
  17. Click on New Variable
  18. We need to add four variables. To make the deployment more environment-specific, add the following variables into Azure DevOps (these variables will be accessible by this pipeline only).
| Variable | Note |
| --- | --- |
| location | Location where you want to deploy the resource into, i.e. 'Australia East' |
| PrimaryDNSZone | The name of the domain you want the public zone to be, i.e. badasscloud.com |
| ResourceGroupName | The name of the Resource Group that the DNS Zone resource will be deployed into, i.e. DNS-PRD-RG |
| SPN | The name of the Service Connection that we created earlier to connect Azure DevOps to Azure, i.e. SPN.AzureDNSCode |
  19. Azure DevOps Variables
  20. Click Save

Test & final approval of Azure DevOps Pipeline

Now that the Azure Pipeline has been created and the variables set, it's time to test. Warning: this will run an actual deployment to your Azure subscription!

We will run a one-off deployment to grant the pipeline access to the service connection created earlier and verify that it works.

  1. Sign in to Azure DevOps
  2. Navigate to the DNS As Code project you created earlier
  3. Click on Pipelines
  4. Click on your Pipeline
  5. Select Run pipeline
  6. Click Run
  7. Click on Agent job 1
  8. You will see a message: This pipeline needs permission to access a resource before this run can continue
  9. Click View
  10. Azure DevOps - badasscloud.com DNS deployment
  11. Click Permit
  12. Click Permit again, to authorise your pipeline to use the SPN service connection for all future runs
  13. Your pipeline will be added to the queue and, once an agent becomes available, will start to run.

As seen below, there were no resources before my deployment and the Azure Pipeline agent kicked off and created the resources in the Azure portal.

Note: You can expand the Agent Job to see the steps of the job, I hid it as it revealed subscription ID information etc during the deployment.

Azure DevOps - badasscloud.com DNS deployment

Remember to update your nameserver records for your domain to point towards the nameserver entries in the Azure DNS zone resource, to use Azure DNS!

Edit the Bicep file

Now that you have successfully deployed your Azure Bicep file, you can go into the Bicep file and update the A, CNAME and TXT records to match your own environment. Any new change to this repository will automatically trigger Continuous Integration and deployment. You can override this behaviour by editing the Pipeline, clicking Edit Trigger and unselecting 'Enable continuous integration'.

Each variable (cnamerecords, arecords, txtrecords) is an array enclosed in square brackets, which allows you to add multiple records. For example, if I wanted to add another CNAME record, it would look like this:

//Variable array for your CNAME records. Add, remove and amend as needed, any new record needs to be included in {}.
var cnamerecords = [
  {
    name: 'blog'
    value: 'luke.geek.nz'
  }
  {
    name: 'fancierblog'
    value: 'azure.com'
  }
]

Simply add another object under the first, as long as it is inside the square brackets. Upon deployment, Azure Bicep will loop through the variable array and create or modify a DNS record for each entry, so you only ever need to edit the content of the variable without touching the actual resource deployment.
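
If the loop is hard to picture, the following Python sketch mimics what the Bicep `for` expression does with the CNAME array: one resource description per entry, with the name lowercased and a fixed TTL. It is an illustration of the idea only, not how Bicep is implemented, and `expand_cname_records` is a made-up name.

```python
# The same two records as the cnamerecords array above.
cname_records = [
    {"name": "blog", "value": "luke.geek.nz"},
    {"name": "fancierblog", "value": "azure.com"},
]


def expand_cname_records(zone, records, ttl=3600):
    """Produce one resource description per record, like the Bicep for-loop."""
    return [
        {
            "type": "Microsoft.Network/dnsZones/CNAME",
            # Child resources are named <zone>/<record>, both lowercased here.
            "name": f"{zone.lower()}/{record['name'].lower()}",
            "properties": {"TTL": ttl, "CNAMERecord": {"cname": record["value"]}},
        }
        for record in records
    ]


if __name__ == "__main__":
    for resource in expand_cname_records("badasscloud.com", cname_records):
        print(resource["name"], "->", resource["properties"]["CNAMERecord"]["cname"])
```

Adding a third dictionary to `cname_records` yields a third resource without touching the function, which is the same property the Bicep file relies on.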

As records are added and removed over time, you will develop a commit history and with the power of Azure DevOps, can implement scheduling changes at certain times and approval!

Hopefully, this article helps you achieve Infrastructure as Code for your Azure DNS resource. The same concept can be applied to other resources using Azure Bicep as well.

· 13 min read

Chaos engineering has been around for a while; Netflix runs their own famous Chaos Monkey, supposedly running 24/7, taking down their resources and pushing them to the limit continuously; it almost sounds counter-intuitive – but it's not.

Chaos engineering is defined as “the discipline of experimenting on a system in order to build confidence in the system’s capability to withstand turbulent conditions in production” (Principles of Chaos Engineering, http://principlesofchaos.org/). In other words, it’s a software testing method focusing on finding evidence of problems before they are experienced by users.

Chaos engineering is a methodology that helps developers attain consistent reliability by hardening services against failures in production. Another way to think about chaos engineering is that it's about embracing the inherent chaos in complex systems and, through experimentation, growing confidence in your solution's ability to handle it.

A common way to introduce chaos is to deliberately inject faults that cause system components to fail. The goal is to observe, monitor, respond to, and improve your system's reliability under adverse circumstances. For example, taking dependencies offline (stopping API apps, shutting down VMs, etc.), restricting access (enabling firewall rules, changing connection strings, etc.), or forcing failover (database level, Front Door, etc.), is a good way to validate that the application is able to handle faults gracefully.

Introducing controlled chaos with tools such as Chaos Monkey, and now Azure Chaos Studio, allows you to put pressure on, and in some cases take down, your services to learn how they will react under strain and to identify areas of improvement, such as resiliency and scalability.

Chaos

Azure Chaos Studio (currently in Preview and only supported in several regions for now) is an enabler for 'controlled Chaos' in the Microsoft Azure ecosystem. You can use the same tool that Microsoft uses to test and improve its services!

Chaos Studio works by creating Experiments (i.e., Faults/Capabilities) that run against Targets (your resources, whether they are agent-based or service-direct).

There are two types of methods you can use to target your resources:

  • Service-direct
  • Agent-based

Service-direct is tied into the Azure fabric and puts pressure on your resources from outside them (i.e., it is supported on most resources that don't need an agent, such as PaaS resources and Network Security Groups). For example, a service-direct capability may be to add or remove a security rule from your network security group for fault finding.

Agent-based relies on an agent installed; these are targeted at resources such as Virtual Machine and Virtual Machine scale sets; agent-based targets use a user-assigned managed identity to manage an agent on your virtual machines and wreak havoc by running capabilities such as stopping services and putting memory and disk pressure on your workloads.
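
As a way of keeping the two target types straight, here is a tiny Python lookup covering only the resource types used in this walkthrough. The mapping reflects how targets are enabled later in this article and is an assumption for illustration, not an exhaustive or official list; `target_kinds` is a made-up name.

```python
# Target types per resource type, limited to the resources in this article.
# This is an illustrative assumption, not an official support matrix.
TARGET_KINDS = {
    "Network Security Group": ("service-direct",),
    "Virtual Machine": ("service-direct", "agent-based"),
    "Virtual Machine Scale Set": ("service-direct", "agent-based"),
}


def target_kinds(resource_type):
    """Return the target types to enable; an empty tuple for unlisted resource types."""
    return TARGET_KINDS.get(resource_type, ())


if __name__ == "__main__":
    for resource_type in TARGET_KINDS:
        print(f"{resource_type}: {', '.join(target_kinds(resource_type))}")
```

For anything outside this list, check the supported resource types documentation rather than extending the table by guesswork.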

Just a word of warning before you allow Chaos to reign in your environment: make sure it is done out of hours or, better yet, against development or test resources. Also make sure that autoscaling is disabled on any resources that support it, or you might suddenly find ten more instances of that resource you were running (unless, of course, you're testing that autoscaling is working)! 😊

In my test setup, I have the following already pre-created that I will be running my experiments against:

  • Virtual Machine Scale set (running Windows with two instances)
  • Single Virtual Machine (running Windows) to test shutdown against

The currently supported resource types of Azure Chaos Studio can be found 'here'.

Setup Azure Chaos Studio

Create Managed Identity

Because we will use Agent-based capabilities to generate our Faults, I needed to create a Managed Identity to give Chaos Studio the ability to wreak havoc on my resources!

  1. In the Azure Portal, search for Managed Identities
  2. Click on Create
  3. Select the Subscription containing the resources that you want to test against
  4. Select your Resource Group to place the managed identity in (I suggest creating a new Resource Group, as your Chaos experiments may have a different lifecycle than your resources, but it's just a preference; I will be placing mine in the Chaos Studio resource group so I can quickly delete it later).
  5. Select the Region your resources are in
  6. Type in a name (this will be the identity that you will see in logs running these experiments, so make sure it's something you can identify)
  7. Azure Portal - Create User Management Identity
  8. Click Next: Tags
  9. Make sure you enter appropriate tags to make sure that the resource can be identified and tracked, and click Review + Create
  10. Azure Portal Tags
  11. Verify that everything looks good and click Create to create your User Assigned Managed identity.

Create Application Insights

Now, it's time to create an Application Insights resource. Application Insights is where the logs of the experiments go, so you can see the faults and their behaviours.

  1. In the Azure Portal, search for Application Insights
  2. Click on Create
  3. Select the Subscription containing the resources that you want to test against
  4. Select your Resource Group to place the Application Insights resource into (I suggest creating a new Resource Group, as your Chaos experiments may have a different lifecycle than your resources, but it's just a preference, I will be placing mine in the Chaos Studio resource group so I can easily delete it later).
  5. Select the Region the resources are in
  6. Type in a name
  7. Select your Log Analytics workspace you want to link Application Insights to (if you don't have a Log Analytics workspace, you can create one 'here').
  8. Azure Portal - Application Insights
  9. Click Tags
  10. Make sure you enter appropriate tags to make sure that the resource can be identified and tracked, and click Review + Create
  11. Verify that everything looks good and click Create to create your Application Insights.

Setup Chaos Studio Targets

It is now time to add the resource targets to Chaos Studio.

  1. In the Azure Portal, search for Chaos Studio
  2. On the left-hand side Blade, select Targets
  3. Azure Chaos Studio
  4. As you can see, I have a Virtual Machine Scale Set and a front-end Network Security Group.
  5. Select the checkbox next to Name to select all the Resources
  6. Select Enable Targets
  7. Azure Chaos Studio
  8. Select Enable service-direct targets (All resources)
  9. Enabling the service-direct targets will then add the capabilities supported by Service-direct targets into Chaos Studio for you to use.
  10. Once completed, I will select the scale set and click Enable Target
  11. Then finally, Enable agent-based targets (VM, VMSS)
  12. This is where you link the user-managed identity, and Application Insights created earlier
  13. Select your Subscription
  14. Select your managed identity
  15. Select Enabled for Application Insights and select your Application Insights account. The instrumentation key should be filled in automatically.
  16. Azure Chaos Studio - Enable targets
  17. If your instrumentation key isn't filled in, you can find it on the Overview pane of the Application Insights resource.
  18. Click Review + Enable
  19. Review the resources you want to enable Chaos Studio to target and select Enable
  20. Finally, you should now be back at the Targets pane. Select Manage actions, make sure that all actions are ticked, and click Save
  21. Azure Chaos Studio Capabilities

Configure and run Azure Chaos Studio

Action exclusions

There may be actions that you don't want to be run against specific resources; an example might be you don't want anyone to kill any processes on a Virtual Machine.

  1. In the Target pane of Chaos Studio, select Actions next to the resource
  2. Unselect the capability you don't want to run on that resource
  3. Select Save
  4. Azure Chaos Studio Actions

Configure Experiments

An experiment is a collection of capabilities to create faults, put pressure on your resources, and cause Chaos that will run against your target resources. These experiments are saved so you can run them multiple times and edit them later, although currently, you cannot reassign the same experiments to other resources.

Note: If you name an Experiment the same as another experiment, it will replace the older Experiment with your new one and retain the previous history.

  1. In the Azure Portal, search for Chaos Studio.
  2. On the left-hand side Blade, select Experiments
  3. Click + Create
  4. Select your Subscription
  5. Select your Resource Group to save the Experiment into
  6. Type in a name for your Experiment that makes sense; in this case, we will put some Memory pressure on the VM scale set.
  7. Select your Region
  8. Click Next: Experiment Designer
  9. Using Experiment Designer, you can design your Faults; you can have multiple capabilities hit a resource with expected delays, i.e., you can have Memory pressure on a VM for 10 minutes, then CPU pressure, then shutdown.
  10. We are going to select Add Action
  11. Then Add Fault
  12. I am going to select Physical Memory pressure
  13. Leave the duration to 10 minutes
  14. Because this will go against my VM scale set, I will add in the instances I want to target (if you aren't targeting a VM Scale set, you can leave this blank; to find the instance ID, go to your VM Scale set, click on Instances, then click on the VM instance you want to target, and you should see the Instance ID in the Overview pane)
  15. Azure Chaos Studio - Add fault
  16. Select Next: Target resources
  17. Select your resources (you will notice as this is an Agent-based capability, only agent supported resources are listed)
  18. Select Add
  19. I am then going to Add delay for 5 Minutes
  20. Then add an abrupt VM shutdown for 10 minutes (Chaos Studio will automatically restart the VM after the 10-minute duration).
  21. Azure Chaos Studio create experiment
  22. As you can see with the Branches (items that will run in parallel) and actions, you can have multiple faults running at once in parallel by using branches or one after the other sequentially.
  23. Now that we are ready with our faults, we are going to click Review + Create
  24. Click Create
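To make the designer's timing rules a little more concrete, here is a minimal Python sketch of how I picture the experiment above. It is not the Chaos Studio API; the `Action`, `Branch` and `Step` classes are names I made up for illustration. It only encodes the behaviour described above: actions in a branch run one after the other, and branches run in parallel.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Action:
    """A single fault or delay (hypothetical model, not the Azure API)."""
    name: str
    minutes: int


@dataclass
class Branch:
    actions: List[Action] = field(default_factory=list)

    def minutes(self) -> int:
        # Actions inside a branch run sequentially, so their durations add up.
        return sum(action.minutes for action in self.actions)


@dataclass
class Step:
    branches: List[Branch] = field(default_factory=list)

    def minutes(self) -> int:
        # Branches run in parallel, so the longest branch decides the step length.
        return max((branch.minutes() for branch in self.branches), default=0)


def total_minutes(steps: List[Step]) -> int:
    # Steps run one after the other.
    return sum(step.minutes() for step in steps)


# The experiment from this walkthrough: memory pressure, a delay, then a shutdown.
experiment = [
    Step(branches=[
        Branch(actions=[
            Action("Physical memory pressure", 10),
            Action("Delay", 5),
            Action("VM shutdown", 10),
        ])
    ])
]

print(total_minutes(experiment))  # 25
```

If I moved the shutdown into a second branch of the same step, the step would take as long as its longest branch (15 minutes) rather than 25.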

Note: I had an API error; after some investigation, I found it was having problems with the '?' in my experiment name, so I removed it and continued to create the Experiment.
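If you script your experiment names, a small guard like the one below can catch this before you hit the API. This is only a sketch: the allow-list (letters, digits, hyphens and underscores) is my own conservative assumption, not a documented Azure naming rule, and `safe_experiment_name` is a hypothetical helper.

```python
import re

# Assumption: a conservative allow-list, NOT the documented Azure naming rule.
UNSAFE_CHARACTERS = re.compile(r"[^A-Za-z0-9_-]")


def safe_experiment_name(name: str) -> str:
    """Strip any character outside the assumed allow-list."""
    return UNSAFE_CHARACTERS.sub("", name)


print(safe_experiment_name("What?Memory"))  # WhatMemory
```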

Assign permissions for the Experiments

Now that the Experiment has been created, we need to give rights to the user-assigned managed identity created earlier (and/or the system-assigned managed identity that was created with the Experiment, which is used for service-direct faults).

I will assign permissions to the Resource Group that the VM Scale set exists in, but you might be better off applying the rights to the individual resource for more granular control. You can find the suggested roles for each resource type on the Microsoft page: Supported resource types and role assignments for Chaos Studio.

  1. In the Azure Portal, click on the Resource Group containing the resources you want to run the Experiment against
  2. Select Access control (IAM)
  3. Click + Add
  4. Click Add Role Assignment
  5. Click Reader
  6. Click Next
  7. Select Assign access to Managed identity
  8. Click on + Select Members
  9. Select the user-assigned managed identity
  10. Click Review and assign.
  11. Because the shutdown is a service-direct fault, go back and give the experiment's system-assigned managed identity Virtual Machine Contributor rights, so it has access to shut down the VM.
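If you would rather script these two role assignments than click through the portal, the steps above map onto `az role assignment create`. The sketch below only builds the command lines in Python so you can review them before running anything; the object IDs, subscription ID and resource group name are placeholders I made up, so swap in your own values.

```python
import shlex

# Placeholder values for illustration only; replace them with your own.
SUBSCRIPTION_ID = "00000000-0000-0000-0000-000000000000"
RESOURCE_GROUP = "rg-chaos-demo"
USER_ASSIGNED_IDENTITY_ID = "11111111-1111-1111-1111-111111111111"
EXPERIMENT_IDENTITY_ID = "22222222-2222-2222-2222-222222222222"

SCOPE = f"/subscriptions/{SUBSCRIPTION_ID}/resourceGroups/{RESOURCE_GROUP}"

# (identity, role) pairs matching the portal steps above.
ASSIGNMENTS = [
    (USER_ASSIGNED_IDENTITY_ID, "Reader"),
    (EXPERIMENT_IDENTITY_ID, "Virtual Machine Contributor"),
]


def role_assignment_command(assignee: str, role: str, scope: str) -> list:
    """Return the argument list for one 'az role assignment create' call."""
    return [
        "az", "role", "assignment", "create",
        "--assignee", assignee,
        "--role", role,
        "--scope", scope,
    ]


commands = [role_assignment_command(a, r, SCOPE) for a, r in ASSIGNMENTS]

for command in commands:
    # shlex.join quotes arguments that contain spaces, such as the role name.
    print(shlex.join(command))
```

Scoping to the resource group mirrors what I did in the portal; pass the ID of an individual resource as the scope if you want the more granular option.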

Run Experiments

Now that the Experiment has been created, it should appear as a resource in the resource group you selected earlier; if you open it, you can see the Experiment's History, Start, and Edit buttons.

  1. Click Start
  2. Azure Chaos studio - Run experiment
  3. Click Ok to start the Experiment (and place it into the queue)
  4. Click on Details to see the experiment progress (and any errors); if one part fails, it may move on to the next step, depending on the fault.
  5. Azure Chaos studio - Run experiment
  6. Azure Chaos Studio should now run rampant and do what it does best: cause Chaos!

This service is currently still in Preview. If you have any issues, take a look at: Troubleshoot issues with Azure Chaos Studio.

Monitoring and Auditing of Azure Chaos Studio

Now that Azure Chaos Studio is in use by your organization, you may want to know what auditing is available, along with reporting to Application Insights.

Azure Activity Log

When an Azure Chaos Studio experiment has touched a resource, there will be an audit trail in the Azure activity log of that resource; here, you can see that 'WhatMemory', which is the Name of my Chaos Experiment, has successfully powered off and on my VM.

Azure Activity Log - Azure Chaos Studio

Azure Alerts

It is easy to set up alerts for when a Chaos experiment kicks off; to create an Azure alert, do the following.

  1. In the Azure Portal, click on Azure Monitor
  2. Click on Alerts
  3. Click + Create
  4. Select Alert Rule
  5. Click Create resource
  6. Filter your resource type to Chaos Experiments
  7. Filter your alert to Subscription and click Done
  8. Click Add Condition
  9. Select: Starts a Chaos Experiment
  10. Make sure that Event initiated by is set to (All services and users)
  11. Click Done
  12. Click Add Action Group
  13. If you have one, assign an action group (these are who and how the alerts will get to you). If you don't have one, click: + Create an action group.
  14. Specify a resource group to hold your action groups (usually a monitor or management resource group)
  15. Type the Action Group name
  16. Type the Action group Display name
  17. Click Next: Notifications
  18. Select Notification Type
  19. Select email
  20. Select Email
  21. Type in your email address to be notified
  22. Click OK
  23. Type in the Name of the mail to be a reference in the future (i.e. Help Desk)
  24. Click Review + Create
  25. Click Create to create your Action group
  26. Type in your rule name (i.e. Alert – Chaos Experiment – Started)
  27. Type in a description
  28. Specify the resource group to place the alert in (again, usually a monitor or management resource group)
  29. Check Enable alert rule on creation
  30. Click Create alert rule

Note: Activity Log alerts are hidden types; they are not shown in the resource group by default, but if you check the: Show hidden types box, they will appear.

Azure Activity Log - Azure Chaos Studio
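If you export the Activity Log and want to pick out the same event the alert fires on, a filter along these lines is one way to do it. Treat it as a sketch under assumptions: I am assuming the start operation is recorded as `Microsoft.Chaos/experiments/start/action`, and the sample records below are made up and heavily trimmed, so check the field names against your own export.

```python
# Assumption: the operation name recorded when an experiment is started.
START_OPERATION = "Microsoft.Chaos/experiments/start/action"


def is_experiment_start(event: dict) -> bool:
    """Return True when an activity log record describes an experiment start."""
    operation = event.get("operationName", "")
    # Some exports nest the name under "value"; handle both shapes.
    if isinstance(operation, dict):
        operation = operation.get("value", "")
    return operation.lower() == START_OPERATION.lower()


# Made-up, trimmed-down records for illustration only.
sample_events = [
    {"operationName": "Microsoft.Chaos/experiments/start/action", "status": "Succeeded"},
    {"operationName": {"value": "Microsoft.Compute/virtualMachines/powerOff/action"}},
    {"operationName": {"value": "microsoft.chaos/experiments/start/action"}},
]

started = [event for event in sample_events if is_experiment_start(event)]
print(len(started))  # 2
```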