Lead Engineer Patrick Robinson discusses networking and how to build a system that suits the elements and constraints of a business.
Addressing the network needs of any business can be a troublesome task at the best of times, and one that does not always get the time and attention it needs. In this edited summary of a recent presentation, Lead Engineer (Developer Effectiveness) Patrick Robinson discusses Networking for Engineers In A Hurry and how to build a system that suits the elements and constraints of a business.
I’ve often been asked to provide advice on how to best set up a business network, especially for startups and scaleups like Envato, which have particular needs for network access that an off-the-shelf solution will not necessarily address.
To do that, we need to start by looking at some of the different network architectures that I have had experience with building and maintaining over my career, discussing the benefits and drawbacks of each. This is not an exhaustive list, and I urge all engineers to consider the needs of your specific system, environment, and organizational constraints.
A Local Area Network (LAN) is a high-speed, low-latency network connecting geographically concentrated devices. By comparison, a Wide Area Network (WAN) is a lower-speed, higher-latency network connecting devices dispersed over a wider geographical area.
The most obvious examples are your home network and the internet, but many other types of LANs and WANs exist.
Traditionally, large organizations have had private Wide Area Networks to connect their many branch offices and data centers. Many bricks-and-mortar retail companies still need these to operate daily, but for many e-commerce companies, like Envato, there is no need for a traditional WAN.
Offices and university campuses are examples of more extensive local networks, often spanning multiple floors and even multiple buildings. During the early days of networking, when the internet was in its infancy, these networks were straightforward and flat; all devices were part of one giant network. But as we connected more and more devices to these networks, they became tough to manage and scale.
Engineers opted to create many virtual networks to manage these large networks. The simplest example would be a guest network you may have at your office for non-staff to join; while this is part of the same local area network, it’s segregated from the staff network.
Within the staff or privileged network, access is generally very permissive. Access between the guest or unprivileged network and the privileged network is tightly controlled, usually by a firewall. Similarly, traffic from the Internet would be tightly filtered, but traffic from the private wide area network might be less tightly controlled.
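The zone-based filtering described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how any particular firewall is configured: the address ranges, the zone names, and the `ALLOWED` rule set are all hypothetical, and anything not explicitly listed is denied.

```python
import ipaddress

# Hypothetical address ranges, chosen only for illustration.
STAFF = ipaddress.ip_network("10.0.10.0/24")
GUEST = ipaddress.ip_network("10.0.20.0/24")


def zone_of(address: str) -> str:
    """Map an IP address to the zone it belongs to."""
    ip = ipaddress.ip_address(address)
    if ip in STAFF:
        return "staff"
    if ip in GUEST:
        return "guest"
    return "internet"


# (source zone, destination zone) pairs the firewall lets through.
# Everything else is denied by default.
ALLOWED = {
    ("staff", "staff"),     # permissive inside the privileged network
    ("staff", "guest"),
    ("staff", "internet"),
    ("guest", "internet"),  # guests may only reach the internet
}


def is_allowed(src: str, dst: str) -> bool:
    """Return True if traffic from src to dst matches an allow rule."""
    return (zone_of(src), zone_of(dst)) in ALLOWED


if __name__ == "__main__":
    print(is_allowed("10.0.10.5", "10.0.10.9"))  # staff to staff
    print(is_allowed("10.0.20.7", "10.0.10.5"))  # guest to staff
```

The default-deny rule set is a design choice for the sketch: it keeps the guest network segregated unless someone deliberately adds a rule.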
While virtual networks have their benefits, they traditionally provide strong protection only at the perimeter, against external traffic. For a long time, companies have depended upon just virtual networks, with firewalls strategically placed, to provide firm security boundaries. In the industry, this is sometimes called the “M&M Security model” or “Armadillo network,” which implies that security is very hard on the outside perimeter. Once attackers breach that outer perimeter, however, they find the inside incredibly soft and vulnerable.
You may have had to authenticate to log in to your work email server, but the security of devices within that privileged network was (and still is for many organizations) very poor. Security patching was lax, and escalating privileges was trivial.
Inter-application communication, such as to a database, was often unauthenticated or protected only by weak passwords.
A Virtual Private Cloud (VPC) is a Local Area Network within a Public Cloud provider (e.g., Amazon AWS, Google GCP, Microsoft Azure). The Virtual Networks within these VPCs are often called Subnets.
Typically, Virtual Networks are classified as either public or private. Private networks function similarly to my home network, where I can make connections outbound, but inbound connections are denied.
Typically, systems in private networks don’t have public IP addresses.
Public networks are directly internet-facing, and such systems need public IP addresses.
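As a rough illustration of how a VPC's address space is divided into subnets, the sketch below uses Python's standard `ipaddress` module. The `10.20.0.0/16` range, the `/20` subnet size, and the public/private split are assumptions made for the example. The "public" and "private" labels here are just tags; in a real cloud provider it is the routing configuration, not the address range, that decides whether a subnet is internet-facing.

```python
import ipaddress

# Hypothetical VPC address range (a private RFC 1918 block).
vpc = ipaddress.ip_network("10.20.0.0/16")

# Carve the VPC into equally sized /20 subnets.
subnets = list(vpc.subnets(new_prefix=20))

# Illustrative split: the first two subnets are tagged public,
# the next two private. The tags carry no networking meaning here.
public_subnets = subnets[:2]
private_subnets = subnets[2:4]

if __name__ == "__main__":
    print(f"VPC {vpc} holds {len(subnets)} subnets of size /20")
    for net in public_subnets:
        print("public ", net, f"({net.num_addresses} addresses)")
    for net in private_subnets:
        print("private", net, f"({net.num_addresses} addresses)")
```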
In networking, there are many different types of topologies we can utilize. The most common is the Hub-and-Spoke network. A router is in the middle, and all devices connect to this single device. The outer devices may also be routers that are themselves hubs, with many other spokes (devices) connected. This approach is relatively cheap, with good reliability and great scalability, with the added benefit of recovering from failure quickly if required.
A mesh is another type of topology, commonly used in wireless networks. In a mesh, nodes within a network form many connections with one another. They can connect to some nodes (partial mesh) or all other nodes (fully connected mesh). Implementing a mesh network is expensive, and meshes do not scale well as your business grows. Still, they offer excellent reliability as they instantly recover from failure.
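One way to see the cost difference between the two topologies is to count the links each needs. The sketch below assumes the simplest case, a single hub with every other node as a spoke, and a fully connected mesh; the function names are made up for this example.

```python
def hub_and_spoke_links(nodes: int) -> int:
    """A single hub with every other node attached by one link."""
    return max(nodes - 1, 0)


def full_mesh_links(nodes: int) -> int:
    """Every node linked directly to every other node: n * (n - 1) / 2."""
    return nodes * (nodes - 1) // 2


if __name__ == "__main__":
    for n in (5, 10, 50):
        print(n, hub_and_spoke_links(n), full_mesh_links(n))
```

Each new node in a hub-and-spoke network costs one extra link, whereas each new node in a full mesh needs a link to every node already present.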
Now that we’ve discussed some basic concepts, simple designs, and a couple of network topologies, let’s explore what a typical corporate network design might look like.
This example hub-and-spoke network is relatively flat. Once inside the corporate network, access to all virtual networks is possible; however, the network suffers from the drawbacks we discussed under the M&M Security model.
While this public hybrid network is similar to the previous design, it enables us to ‘bolt’ our public cloud onto our existing network as if it were another data center. This allowed companies to quickly adopt the cloud without significantly rearchitecting their current systems at a high cost.
As our cloud usage grows, though, we add more and more VPCs. As each VPC is its own local network, it quickly outgrows the capability of a traditional corporate WAN, as seen above. It’s costly, slow, and difficult to scale if we send traffic from one VPC to another over the corporate, private WAN.
For a long time, the only alternative was to connect our VPCs on Amazon using VPC Peering, where two VPCs are connected, allowing you to route traffic between those two VPCs. The drawback of this approach is each VPC must connect to every other VPC, as you cannot traverse one VPC to get to another.
This, unfortunately, requires a fully connected mesh. As discussed previously, this, too, is not very scalable. The complexity and configuration required grow quadratically with the number of VPCs we have, because n VPCs need n(n-1)/2 peering connections.
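The non-transitive nature of peering can be modelled with a small sketch. It does not call any cloud API; the VPC names and helper functions are hypothetical, and a peering is represented simply as an unordered pair of names.

```python
from itertools import combinations


def can_route(peerings: set, a: str, b: str) -> bool:
    """Peering is not transitive: traffic needs a direct peering."""
    return frozenset((a, b)) in peerings


def peerings_for_full_mesh(vpcs: list) -> set:
    """Every unordered pair of VPCs needs its own peering connection."""
    return {frozenset(pair) for pair in combinations(vpcs, 2)}


# Hypothetical VPC names, for illustration only.
vpcs = ["app", "data", "tools", "shared"]

# A chain of peerings: app <-> data <-> tools.
chain = {frozenset(("app", "data")), frozenset(("data", "tools"))}

if __name__ == "__main__":
    print(can_route(chain, "app", "data"))   # directly peered
    print(can_route(chain, "app", "tools"))  # would need to traverse "data"
    missing = peerings_for_full_mesh(vpcs) - chain
    print(f"{len(missing)} more peerings needed for a full mesh")
```

With four VPCs the full mesh already needs six peerings, and each additional VPC must be peered with every existing one.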
I’ve described traditional corporate network designs, their downsides, and how they apply to modern cloud environments. This design is still a real problem today and continues to pose challenges, not just for security but also for our increasingly remote workforce. Forcing employees to join a corporate network through a VPN, usually in a single geographical location, has many drawbacks regarding usability, reliability, and performance.
In a Zero Trust Network, we do away with firewalls and virtual networks as a security boundary. Every application verifies the requests it receives, usually via a centralized certificate authority that issues certificates signed by a root certificate.
Unlike the M&M Security Model, we treat all networks as unprivileged. This provides much more robust security guarantees than typical network designs.
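One common way for a service to verify every request it receives is mutual TLS, where the server refuses any client that cannot present a certificate signed by a trusted authority. The sketch below uses Python's standard `ssl` module and is a minimal illustration only: the function name and its file-path parameters are hypothetical, and a real deployment would also need certificate issuance, rotation, and authorization on top of this.

```python
import ssl
from typing import Optional


def build_mtls_server_context(
    ca_file: Optional[str] = None,
    cert_file: Optional[str] = None,
    key_file: Optional[str] = None,
) -> ssl.SSLContext:
    """Build a server-side TLS context that demands a client certificate.

    The paths are hypothetical placeholders: ca_file is the certificate
    authority bundle used to verify clients; cert_file and key_file are
    this service's own certificate and private key.
    """
    context = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    # Reject any client that does not present a valid certificate.
    context.verify_mode = ssl.CERT_REQUIRED
    if ca_file:
        context.load_verify_locations(cafile=ca_file)
    if cert_file:
        context.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return context


if __name__ == "__main__":
    ctx = build_mtls_server_context()
    print(ctx.verify_mode)
```

The context would then be passed to whatever server wraps its sockets with TLS; the network the request arrived on plays no part in the decision.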
The concepts of Zero Trust are very involved and go beyond user-to-service and inter-service authentication and authorization.
Because Envato is a cloud-native, remote-first organization, Zero Trust Networking appealed to us significantly. It allowed us to treat the internet as our corporate WAN and do away with any congested Virtual Private Networks (VPNs) that our employees found challenging to use. It also reduced the overhead of building and maintaining a separate network for inter-service communication.
We treat all networks as untrusted and enforce the authentication of entities within every application.
Envato didn’t implement all these zero-trust concepts; we picked what suited us.
For instance, we never had a centralized certificate authority to issue service certificates for use in inter-service communications. Instead, developers manually configure the certificates on each service.
This approach has many drawbacks, making it tedious for us to rotate certificates and very hard to create ephemeral testing environments. Our applications are unnecessarily coupled with each other. Inter-service communication over the internet can also be hard to set up and debug and is more prone to failures.
We depend on Cloudflare to enable Zero Trust access for our remote team members by providing a centrally managed Identity Aware Proxy that integrates with our Identity Provider.
Since we embarked on our cloud journey more than six years ago, AWS networking has evolved significantly. Two services specifically, Transit Gateway and Cloud WAN, allow us to simplify our network architecture, improve reliability, and reduce complexity. These services look more like the traditional network designs I explained earlier, where we build hub-and-spoke type networks to connect our VPCs and office. Reducing the number of VPCs we have and co-locating services that belong to the same Bounded Context will also be a big part of reducing that complexity.
Zero Trust still has a place in our architecture to provide strong security controls and enable our remote workforce. Still, we hope to provide a richer set of solutions that fit each of our use cases instead of the one-size-fits-all approach we’ve applied.
As you can see, there are many factors to consider when building a network for a business and a large number of varied designs to work with or iterate from. Even for time-poor network managers, I hope this has provided a good starting point!