Enhance outbound connectivity with Azure Virtual Network NAT

Making outbound connections to the internet from their virtual networks is a basic requirement of many customers’ Azure solution architectures. When deciding how outbound connectivity will work for a given architecture, it is crucial to take factors like security, resilience, and scalability into account. Fortunately, Virtual Network NAT is an ideal way to provide highly reliable and secure outbound connectivity to the internet. Virtual Network NAT, also known as NAT gateway, is a fully managed, highly resilient service that is simple to scale and designed to handle heavy, fluctuating workloads.

A NAT gateway provides outbound connectivity to the internet once it is attached to a subnet and to a public IP address. When a NAT gateway is attached to a subnet, all of the subnet’s private IP addresses (such as those of its virtual machines) are translated to the NAT gateway’s public IP address. This process is called network address translation, or NAT. The NAT gateway’s public IP address then serves as the source address for the subnet’s resources when they connect outbound. A NAT gateway can be attached to a total of 16 public IP addresses, drawn from any combination of individual public IP addresses and public IP prefixes.
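The 16-address limit can be checked with a little arithmetic, since a prefix contributes as many addresses as its size allows. The following is only a minimal sketch: the helper names are hypothetical, the addresses are placeholder documentation ranges, and it does not call any Azure API or check for overlap between individual IPs and prefixes.

```python
import ipaddress

MAX_NAT_GATEWAY_PUBLIC_IPS = 16  # limit stated in the text above


def count_public_ips(single_ips, prefixes):
    """Count addresses contributed by individual IPs and by CIDR prefixes.

    Hypothetical helper: it does not detect an individual IP that also
    falls inside one of the prefixes.
    """
    from_singles = len({ipaddress.ip_address(ip) for ip in single_ips})
    from_prefixes = sum(ipaddress.ip_network(p).num_addresses for p in prefixes)
    return from_singles + from_prefixes


def fits_on_nat_gateway(single_ips, prefixes):
    """Return True if the combination stays within the 16-address limit."""
    return count_public_ips(single_ips, prefixes) <= MAX_NAT_GATEWAY_PUBLIC_IPS


# Placeholder addresses from the documentation ranges (not real resources).
ips = ["203.0.113.10", "203.0.113.11"]
prefixes = ["198.51.100.0/29"]  # a /29 prefix holds 8 addresses

total = count_public_ips(ips, prefixes)
print(total, fits_on_nat_gateway(ips, prefixes))  # 10 True
```

Two individual addresses plus one /29 prefix come to 10 addresses, which fits. A /28 prefix alone already uses all 16.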

Figure 1: NAT gateway configuration with a subnet and a public IP address and prefix.

Customers in sectors like finance or retail, or in any other scenario that requires pulling large data sets from a single source, need a dependable and scalable way to connect to that data source.

In this blog post, we’ll walk through one such scenario that was made possible by NAT gateway.

Customer background

For one of their main workloads, a customer collects large amounts of data to track, analyze, and ultimately make business decisions. This data is collected over the internet from a service provider’s REST APIs, which are hosted in the provider’s own data center. A recurring report cannot be relied upon, because the data sets the customer is interested in can change daily, so they must request the data sets each day. Because of the volume of data, results are paginated and shared in parts. For this one workload, the customer needs to submit tens of thousands of API requests every day, which generally takes one to two hours. As in their previous on-premises system, each request corresponds to its own separate HTTP connection.

Initial architectural design

In this case, the customer uses their Azure virtual network to access REST APIs in the service provider’s on-premises network. The service provider’s on-premises network is protected by a firewall. The customer began to notice that, sporadically, one or more virtual machines would wait for a very long time for the REST API endpoint to respond. These connections would eventually time out while waiting for a response, resulting in connection failures.

Figure 2: The customer sends traffic from their virtual machine scale set (VMSS) in their Azure virtual network over the internet to an on-premises service provider’s data center server (REST API) that is fronted by a firewall.

The investigation

Further investigation using packet captures showed that the service provider’s firewall was silently dropping incoming connections from the customer’s Azure network. This seemed strange, because the customer’s architecture in Azure had been designed and scaled specifically to handle the number of connections going to the service provider’s REST APIs to obtain the data they needed. So what exactly was the problem?

Together, the customer, the service provider, and Microsoft support engineers looked into why connections from the Azure network were sporadically being dropped, and they made an important finding. The service provider’s firewall only dropped connections coming from source IP addresses and ports that had been used recently (within roughly the last 20 seconds). This is because the firewall imposes a 20-second cooldown period on new connections originating from the same source IP and port. Connections on the same public IP that used a new source port were not affected by the firewall’s cooldown timer. The conclusion drawn from these observations was that the source network address translation (SNAT) ports from the customer’s Azure virtual network were being reused too quickly when establishing new connections to the service provider’s REST API. If a port was reused before the cooldown period had expired, the connection would time out and eventually fail. The question for the customer then became: “How do we prevent ports from being reused too quickly when connecting to the service provider’s REST API?” Because the firewall’s cooldown timer could not be changed, the customer had to work within its limits.
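The cooldown behavior can be illustrated with a toy model. This is only a sketch of the rule as described here, not the service provider’s actual firewall: the class name is hypothetical, and the choice to record every attempt (including dropped ones) as a use of the port is an assumption.

```python
COOLDOWN_SECONDS = 20.0  # cooldown period described in the text


class CooldownFirewall:
    """Toy model: drop a new connection if its (source IP, source port)
    pair was last seen less than `cooldown` seconds ago."""

    def __init__(self, cooldown=COOLDOWN_SECONDS):
        self.cooldown = cooldown
        self.last_seen = {}  # (ip, port) -> time of the last connection attempt

    def accepts(self, src_ip, src_port, now):
        key = (src_ip, src_port)
        previous = self.last_seen.get(key)
        # Assumption: every attempt, accepted or dropped, restarts the timer.
        self.last_seen[key] = now
        return previous is None or now - previous >= self.cooldown


fw = CooldownFirewall()
print(fw.accepts("203.0.113.10", 1024, now=0.0))   # True: first use of this port
print(fw.accepts("203.0.113.10", 1024, now=5.0))   # False: same port reused after 5 s
print(fw.accepts("203.0.113.10", 1025, now=5.0))   # True: new port on the same IP
print(fw.accepts("203.0.113.10", 1024, now=30.0))  # True: 25 s since the last attempt
```

The second call shows the failure mode from the investigation: the same source IP and port arrive again inside the cooldown window and are dropped, while a fresh port on the same IP passes.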

NAT gateway to the rescue

Based on these findings, NAT gateway was added to the customer’s Azure setup as a proof of concept. With this single change, the connection timeouts were eliminated.

NAT gateway was able to resolve this customer’s outbound connectivity problem with the service provider’s REST APIs for two reasons. First, NAT gateway selects ports at random from a large inventory of available ports. The source port chosen for a new connection is therefore very likely to be one that has not been used recently, so it will probably pass through the firewall without problems. The inventory of ports available to a NAT gateway comes from the public IPs attached to it. A NAT gateway can have up to 16 public IP addresses attached, and each one provides 64,512 SNAT ports to the resources of a subnet. This means a customer can make over 1 million SNAT ports available to a subnet for outbound connections. Second, source ports that NAT gateway reuses to connect to the service provider’s REST APIs are not affected by the firewall’s 20-second cooldown timer. This is because NAT gateway places source ports on its own cooldown timer, which lasts at least as long as the firewall’s, before they can be reused. For additional information, see our public article on NAT gateway SNAT port reuse timers.
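The port arithmetic and the two behaviors just described (random selection and a reuse cooldown) can be sketched as follows. This is an illustrative toy allocator, not NAT gateway’s actual implementation: the class name, the tiny port range, and the 20-second reuse value are assumptions chosen for the example.

```python
import random

PORTS_PER_PUBLIC_IP = 64_512  # figure from the text
MAX_PUBLIC_IPS = 16           # figure from the text
print(PORTS_PER_PUBLIC_IP * MAX_PUBLIC_IPS)  # 1032192 SNAT ports in total


class RandomPortPool:
    """Toy SNAT allocator: pick a random free port, skipping any port that
    was released less than `reuse_cooldown` seconds ago."""

    def __init__(self, ports, reuse_cooldown, rng=None):
        self.ports = list(ports)
        self.reuse_cooldown = reuse_cooldown
        self.released_at = {}  # port -> time it was last released
        self.in_use = set()
        self.rng = rng if rng is not None else random.Random()

    def acquire(self, now):
        never = float("-inf")
        candidates = [
            p for p in self.ports
            if p not in self.in_use
            and now - self.released_at.get(p, never) >= self.reuse_cooldown
        ]
        if not candidates:
            raise RuntimeError("no SNAT port available")
        port = self.rng.choice(candidates)
        self.in_use.add(port)
        return port

    def release(self, port, now):
        self.in_use.discard(port)
        self.released_at[port] = now


# Hypothetical values: a 10-port pool and a 20-second reuse cooldown.
pool = RandomPortPool(range(1024, 1034), reuse_cooldown=20.0)
first = pool.acquire(now=0.0)
pool.release(first, now=1.0)
second = pool.acquire(now=2.0)
print(first != second)  # True: the just-released port is still cooling down
```

With a reuse cooldown at least as long as the firewall’s, a port is never presented to the firewall again before the firewall’s own timer has expired, and a larger pool makes running out of eligible ports less likely.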

Learn more

The customer scenario above shows how NAT gateway’s selection and reuse of SNAT ports explains why Azure recommends it as the method for connecting outbound to the internet. Because its randomized port selection reduces the risk of both SNAT port exhaustion and connection timeouts, NAT gateway is ultimately the best choice for connecting outbound to the internet from your Azure network.

For more information, contact Professional Labs (prolabsit.com), the Best Cloud Managed Services Provider in Oman.