Stressed: How performance remediation can help your systems and your business
Exploring barriers to performance and how to overcome them
Nov. 9, 2020 | By: James Pulley and Patrick Guindon-Slater
Amid a global pandemic, civil unrest and economic uncertainty, it’s safe to say that in 2020 everyone and everything is stressed. Here, we’ll cover only stressed technology systems.
Where and how people work has shifted. Most companies’ collaboration tools weren’t built for the load of a nearly or fully remote workforce. Anything once handled via impromptu face-to-face interactions has moved online. VoIP, VPN and security policies are adding to network complexity, which comes with an additional performance cost.
More people are shopping, banking, applying for services and seeing the doctor virtually. Government agencies are facing significantly larger-than-normal application loads for small businesses, unemployment and other benefits. Their systems at every level are stressed, and these systemic failures have raised the visibility of performance and demonstrated the risk that poor performance poses to organizations’ reputations and their ability to meet their missions.
The thing about performance is that when it works, no one notices. But when it doesn’t, it gets really, really noticed and talked about. Every day there are examples of site failures under load: distance education, election systems, spot sales, unemployment services; the list goes on and on. It’s always easier to prevent a newsworthy event than to respond to one. Yet, unfortunately, it’s hard to get leaders to take performance risk seriously, since a failure is only a “possible” event and is never assumed to be likely. Let’s discuss a few ways to change the conversation.
The case for change
Inside your organization, there are applications that users don’t like to use because of poor performance—it’s the most common complaint across IT. Externally, economic uncertainty and a competitive market leave no room for error—failure to scale will cause a company to shut its doors.
Let’s take a look at an e-commerce example: If you hand a cart to every customer on arrival at your site, each cart locks up a set amount of resources, slowing the site. Picture every shopper in a store lugging a large shopping cart: even if they’re just browsing, the carts tie up resources and slow things down. A few years ago, a major retailer noticed significant performance issues and, by looking through logs and testing, identified a cart issue as the cause of the slowdown. Through performance remediation, they changed the architecture so that shopping carts were no longer handed out on arrival, and site revenue ultimately went up by more than $10 million a month because of increased speed and customer conversion. In a competitive market, customers will leave to make purchases elsewhere if your site performs poorly.
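The cart change described above can be sketched in a few lines. This is only an illustration of the general idea (allocate on first use rather than on arrival), not the retailer’s actual architecture; the Cart and Session classes, the SKU and the visitor counts are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Cart:
    """Stand-in for the server-side resources a cart ties up (hypothetical)."""
    items: list = field(default_factory=list)


@dataclass
class Session:
    """A visitor session that allocates a cart only when one is needed."""
    cart: Optional[Cart] = None  # nothing is reserved on arrival

    def add_item(self, sku: str) -> None:
        # Allocate the cart lazily, on the first item added.
        if self.cart is None:
            self.cart = Cart()
        self.cart.items.append(sku)


def carts_allocated(sessions: list) -> int:
    """Count the sessions that are actually holding a cart."""
    return sum(1 for s in sessions if s.cart is not None)


# Hypothetical traffic: 1,000 visitors arrive, and only 30 add an item.
sessions = [Session() for _ in range(1000)]
for buyer in sessions[:30]:
    buyer.add_item("sku-123")

print(carts_allocated(sessions))  # 30 carts held instead of 1,000
```

Under these assumed numbers, browsers who never buy hold no cart resources at all, which is the effect the shopping-cart analogy describes.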
You have to make the value messaging appropriate for your leadership. That means turning conversations about load, resources and CPU/memory into things that your executive team cares deeply about: poor service, potential loss of revenue, reputational damage, employee frustration.
Barriers to performance
Performance is often wrongly treated as an extra, a nonfunctional requirement. But if users can’t access the system’s necessary functionality, the system isn’t going to meet business needs. Here are a few common barriers to performance we see and hear among customers.
Barrier #1: They don’t know how to evaluate performance.
Performance isn’t a tool. Performance engineering finds ways to improve the efficiency of a system so that it scales better or responds faster to business demands. Organizations don’t always understand how to determine the value of performance until it is missing. The technology industry has failed to educate developers on the root cause of performance problems: how resources are used, how often and in how large a block they are grabbed, and how long they are held. When developers need more resources, they default to requesting bigger pools rather than using the pools they have more efficiently.
Barrier #2: Performance lives under the covers.
Unlike functionality you can see (like a button you can push), performance and security are under the covers; they’re hard to perceive. You have to design for them. Think of a Ferrari with the wrong engine. At first you don’t know. Then you turn the key, hear the wrong noises and realize it doesn’t go fast. Performance means designing for what’s under the covers.
Barrier #3: Business, marketing, sales and technology teams aren’t aligned on goals.
Another retail example: once upon a Black Friday, a retailer’s marketing team chose a gorgeous and large (over 45 megabytes) image for its home page. Downloading that image locked up so many resources that the site ran out of them and failed to respond. Ultimately, this did not benefit the sales process. Make sure that the goal for your site or system is known and that all involved parties are aligned on that goal and on what they need to do to achieve it.
Barrier #4: Perfection isn’t attainable.
In general, there is an acceptable level of failure. If my site is up 99% of the time, that’s not bad. But if you’re an online retailer and you fail on Cyber Monday, that’s not acceptable. The problem is that when you accept that there will be downtime or security breaches and aim only to contain them, you’re not really measuring their full impact on your business.
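To make the 99% figure concrete, here is a small arithmetic sketch. It assumes a 365-day year and treats availability as a simple fraction of total hours; the function name is illustrative. At 99%, the allowance works out to roughly 87.6 hours a year, which is more than enough to cover all of a single peak day such as Cyber Monday.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years


def downtime_hours_per_year(availability: float) -> float:
    """Hours of downtime per year implied by an availability fraction."""
    return (1.0 - availability) * HOURS_PER_YEAR


for availability in (0.99, 0.999, 0.9999):
    hours = downtime_hours_per_year(availability)
    print(f"{availability:.2%} uptime allows about {hours:.1f} hours down per year")
```

An availability percentage on its own says nothing about when the downtime falls, which is why the same number can be fine for one business and unacceptable for another.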
How to begin performance remediation
To evaluate performance, you need a series of measurements or diagnostics for how long actions take and a record of what resources were used. Forensics looks back on an issue, identifying the root cause and then working out how to remediate it so it doesn’t happen again. Capacity planning looks ahead. Where you cannot pull data from a live environment, performance testing can generate the measurements of end-user response times and resource usage. Think of it as a dress rehearsal, or perfect-storm testing: you design for those situations in advance so you can actually handle them when they occur.
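A minimal sketch of the timing half of those measurements, using only the Python standard library, might look like this. The action name “checkout” and the stand-in workload are hypothetical, resource usage is not captured here, and the nearest-rank percentile is one common convention among several.

```python
import math
import time
from contextlib import contextmanager

timings = {}  # action name -> list of elapsed seconds


@contextmanager
def timed(action: str):
    """Record how long the wrapped block takes under the given action name."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings.setdefault(action, []).append(time.perf_counter() - start)


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("no samples recorded")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical action: time 20 runs of a stand-in workload.
for _ in range(20):
    with timed("checkout"):
        sum(i * i for i in range(10_000))

print(f"checkout p95: {percentile(timings['checkout'], 95) * 1000:.2f} ms")
```

Reporting a high percentile rather than an average keeps the slowest user experiences visible, and those are usually the ones that get noticed.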
Performance is about patterns. From a remediation perspective, you’re going to look for known patterns of behavior, evidence of where time is being spent and measurements of resource usage to explain why something is running long. Where and for how long are you holding resources for a user? Dig into the components that reveal the largest lock on resources.
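As a rough illustration of digging into the largest lock on resources, the sketch below totals hold time per component and ranks the results. The component names and timings are invented for the example; in practice they would come from logs, traces or test measurements.

```python
from collections import defaultdict

# Hypothetical trace records: (component, seconds a resource was held for one request).
records = [
    ("session_store", 0.40),
    ("db_connection", 0.12),
    ("image_render", 1.50),
    ("db_connection", 0.08),
    ("session_store", 0.35),
    ("image_render", 1.20),
]


def rank_by_hold_time(records):
    """Total hold time per component, largest first."""
    totals = defaultdict(float)
    for component, held_seconds in records:
        totals[component] += held_seconds
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


for component, total in rank_by_hold_time(records):
    print(f"{component:15s} {total:.2f} s held")
```

The component at the top of the list is the first candidate for remediation, since shortening its hold time frees the most capacity for other users.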
Some issues can be resolved with configuration, or by having portions of user requests served by a specialized caching provider such as a content delivery network. This allows you to set policies that remove the load associated with common elements (images, page components, style sheets) that are used most frequently as users make their way through the system and that don’t change from one user to the next. This ultimately reduces the resources used on your core system to the minimum required, gets users through the system faster, and thus allows those resources to be reused.
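One way to picture such a policy is as a rule that maps each request path to a Cache-Control header value. The sketch below is an assumption-laden example, not any particular CDN’s configuration: the extension list, the one-day lifetime and the function name are all illustrative choices.

```python
from pathlib import PurePosixPath

# Assumption: these extensions mark assets that are identical for every user.
SHARED_EXTENSIONS = {".css", ".js", ".png", ".jpg", ".svg", ".woff2"}


def cache_control_for(path: str) -> str:
    """Pick a Cache-Control header value for a request path (no query string)."""
    if PurePosixPath(path).suffix.lower() in SHARED_EXTENSIONS:
        # Shared asset: let the CDN and browsers keep it for a day.
        return "public, max-age=86400"
    # Per-user content: always served by the origin, never cached.
    return "no-store"


for path in ("/static/site.css", "/img/hero.PNG", "/cart"):
    print(path, "->", cache_control_for(path))
```

With a rule like this in place, requests for shared assets can be answered at the edge, and only per-user pages reach the core system.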
Without a content delivery network, every single asset has to be served from the data center, which increases stress on the network. A resource-heavy development model hurts scalability and requires a lot of memory on the servers. So, when you reach a high-load situation, you run out of resources very fast and can’t accept more users.
Prioritizing performance is a new way of working. But with a majority of your employees and customers connecting with your organization virtually, it’s an absolute priority. By leveraging performance forensics to identify the root causes of issues and remediating them in a timely manner, you’ll reap the ultimate in bottom-line outcomes: increased revenue, customer retention and employee satisfaction.
James Pulley is the practice manager of performance engineering and testing for TEKsystems. He has spent the last 20 years helping customers with software application performance and scalability as a performance tester and engineer.
Patrick Guindon-Slater is the practice manager for continuous testing for TEKsystems.