These are challenging times, while many businesses are struggling to stay afloat others, as essential services, are seeing a huge surge in traffic. Many necessary websites are down and in need of upgrades to their technical architecture and others are requiring to amend services to be able to continue to offer them while practicing social distancing. What are the top 3 things I can do to manage this surge?
Tracking your Data
Internet traffic has increased these days. That brings not only opportunities for your online business but also potential performance issues. That's why usage performance and statistics collection is essential for web and mobile applications. Our team has extensive experience with NewRelic and Datadog. The client and server metrics are equally important as they may shed light on what your end customers are experiencing and correspondingly, how many concurrent users your system can handle.
Larger applications often consist of a few smaller components. There must be a development strategy for each: from design and implementation to tests and launch. In real life, the process may be different as new or updated components have to integrate with the rest of your system. We build, test and deploy systems with the help of GitLab CI.
Making it Scalable + Assessing your Cloud Options
With Internet traffic volume increasing, a single-server setup can only take your system so far, as there are always hardware limits. Generally speaking, a cluster of servers is more performant and reliable. However, it's not only a matter of deploying your application with Kubernetes on Google Cloud but software design challenges to circumvent. With the cloud-native approach in place, horizontal scaling becomes a rather pleasant journey.
Kate Matsudaira, author of Scalable Web Architecture and Distributed Systems, part of The Architecture of Open Source Applications, lists the following parameters for resilient and scalable systems:
Like most things in life, taking the time to plan ahead when building a web service can help in the long run; understanding some of the considerations and tradeoffs behind big websites can result in smarter decisions at the creation of smaller web sites. Below are some of the key principles that influence the design of large-scale web systems:
Availability: The uptime of a website is absolutely critical to the reputation and functionality of many companies. For some of the larger online retail sites, being unavailable for even minutes can result in thousands or millions of dollars in lost revenue, so designing their systems to be constantly available and resilient to failure is both a fundamental business and a technology requirement. High availability in distributed systems requires the careful consideration of redundancy for key components, rapid recovery in the event of partial system failures, and graceful degradation when problems occur.
Performance: Website performance has become an important consideration for most sites. The speed of a website affects usage and user satisfaction, as well as search engine rankings, a factor that directly correlates to revenue and retention. As a result, creating a system that is optimized for fast responses and low latency is key.
Reliability: A system needs to be reliable, such that a request for data will consistently return the same data. In the event the data changes or is updated, then that same request should return the new data. Users need to know that if something is written to the system, or stored, it will persist and can be relied on to be in place for future retrieval.
Scalability: When it comes to any large distributed system, size is just one aspect of scale that needs to be considered. Just as important is the effort required to increase capacity to handle greater amounts of load, commonly referred to as the scalability of the system. Scalability can refer to many different parameters of the system: how much additional traffic can it handle, how easy is it to add more storage capacity, or even how many more transactions can be processed.
Manageability: Designing a system that is easy to operate is another important consideration. The manageability of the system equates to the scalability of operations: maintenance and updates. Things to consider for manageability are the ease of diagnosing and understanding problems when they occur, ease of making updates or modifications, and how simple the system is to operate. (I.e., does it routinely operate without failure or exceptions?)
Cost: Cost is an important factor. This obviously can include hardware and software costs, but it is also important to consider other facets needed to deploy and maintain the system. The amount of developer time the system takes to build, the amount of operational effort required to run the system, and even the amount of training required should all be considered. Cost is the total cost of ownership."
Knowing when various components of complex applications and infrastructures are at risk of failing is super-critical, and for customer-facing applications/databases, it can be extremely damaging to a company’s image if they rely on users to report service degradation/outages.
Helpdesk software, linked to an organization’s customer relationship management (CRM) system helps track issues and assign the appropriate resolutions, but better than handling issues is preventing them. Root cause analysis and prospective fault anticipation is a far better way of maintaining system up-time AND customer/user satisfaction.
Products like Moogsoft’s AIOps reduce the noise of potential service disruptions and make sure that potential points of failure are recognized before the end users notice, allowing service providers to address problems before they become catastrophes.
Mike Silvey, EVP & founder of Moogsoft, says this about their service:
...our engagement is purely 'economic' - we solve a problem that is not addressed by the Monitoring vendors, the IT Operations vendors and the IT Service Management vendors...ultimately, the closer you get to agile, the end-users become the incident detection system, increasing the costs of business interruption, reducing developer productivity, the quality of customer experience and reducing agility. We sustainably detect incidents as they are occurring, reduce the cost of service interruption, reduce the number and cost of operations resources and enable ubiquitous monitoring - continuous assurance which enables continuous innovation and deployment.
Note that these architectural categories can be mutually exclusive; costs go up when more capital and operational expenditure happens to maintain availability and performance, as well as the management overheads for more complex infrastructures. Such challenges have occupied the efforts of technology architects for decades, and thankfully, offerings are available that cover these desirable functional characteristics.
Open source concepts like Kubernetes, an architectural approach to (relatively) easily scalable infrastructures, first developed at Google almost a decade ago, and increasingly more proprietary offerings like CloudFlare, designed specifically to be resistant to intentional attack, or Amazon Web Services (AWS), which provides the white-label infrastructure for a surprising amount of the cloud platforms many of us use on a daily basis. For deeper understanding of the management and scaling of any system, tools such as Moogsoft’s AIOps deliver mission-critical, automated information that help keep digital supply chains running.
In conclusion, being able to better serve your customers and fulfill the needs of your staff and vendors during this time may require technical upgrades. We know you need expert support, fast. During the COVID-19 Crisis, Pieoneers is offering 5 free 1-hour consultations per week to help companies prioritize their resources and make strategic decisions fast. Please reach out via our contact form and book a call or video conference with one of our experts today. Are you new at transitioning to online? Get in touch, we’d be happy to help with that as well.