When failure is not an option
The arrival of HealthCare.gov, designed to assist millions of uninsured Americans, was highly anticipated. But when the site went live on Oct. 1, 2013, it was over budget, unreliable and fraught with technical difficulties. Citizens had only from Nov. 1 to Jan. 31 to use the site to compare insurance options and sign up for 2014.
Help was needed, and fast. The Centers for Medicare and Medicaid Services scrambled for answers from a variety of sources, including internal government experts, external programmers and industry partner Software AG.
The solution: Software AG determined a lack of in-memory data caching was slowing website performance and recommended BigMemory, the core component of Software AG’s Terracotta platform.
The result: With Terracotta in place, HealthCare.gov performs like a workhorse. At the height of each enrollment season, the site expands into one of the world’s largest enterprise transaction-based sites, enabling millions to get the insurance they need.
“Now the site simply works. The government can assist 100s of 1,000s of citizens at the same time without a hitch.”
— Chris Steel | Chief Solutions Architect | Software AG Government Solutions
Just what the doctor ordered
The transformation of the HealthCare.gov website from its highly publicized launch in 2013 to a smoothly running system, able to handle a massive number of transactions every second just a few months later, probably seemed like a magic trick to most people looking in from the outside.
But no magic was involved—just a massive behind-the-scenes effort by government and private industry to update and expand the site’s hardware and storage capabilities. A key part of that effort was in-memory data management from Software AG, which helped to enhance all other improvements.
In 2016, the HealthCare.gov website ran smoothly without a glimmer of past problems. In fact, during peak usage when Americans were comparing and buying healthcare plans, HealthCare.gov processed 320,000 requests per minute and hosted up to 187,000 concurrent users. Only systems like credit card processing sites experience more volume.
High volume—no problem
Software AG Chief Architect Fabien Sanglier has worked with HealthCare.gov from the beginning. Sanglier said it was amazing to be part of a project that meant so much to so many people, as evidenced by the millions who came to the site in the first few days it was online. While it was a great surprise to see so many visitors so quickly, the traffic began to overload the site’s infrastructure.
Software AG’s Terracotta in-memory fabric is designed to help manage such high numbers. But it ran only in a test environment when HealthCare.gov was originally launched. HealthCare.gov did have some simple in-memory caching in place. Like nearly 90 percent of all business applications, it used Ehcache, a popular open-source Java® caching library.
For less-demanding applications, using Ehcache for low-latency access to local in-memory data usually does the trick and is often enough to keep transactions running smoothly. However, it couldn’t handle the demand HealthCare.gov was experiencing!
“If you are only using standalone local caching then you are duplicating efforts,” Sanglier said. “Server A and server B don’t know what each other are doing. The same request coming in to both servers is then both sent to the database for processing. You need something to tie it all together.”
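The duplication Sanglier describes can be illustrated with a toy model. The sketch below is not Terracotta or Ehcache code; the class names, the key "plan-123" and the plain dictionary standing in for a cluster-wide cache are all assumptions made for illustration. It sends the same request to two application servers and counts how many times the database is read.

```python
class Database:
    """Stand-in for the backing database; counts every read it serves."""

    def __init__(self):
        self.reads = 0

    def load(self, key):
        self.reads += 1
        return f"plan-data-for-{key}"


class AppServer:
    """Hypothetical application server that checks a cache before the database."""

    def __init__(self, db, cache):
        self.db = db
        self.cache = cache

    def handle(self, key):
        if key not in self.cache:
            # Cache miss: only now do we pay for a database read.
            self.cache[key] = self.db.load(key)
        return self.cache[key]


def count_db_reads(shared):
    """Send the same request to two servers and report total database reads."""
    db = Database()
    shared_cache = {}  # assumption: one dict models a cache all servers can see
    servers = [AppServer(db, shared_cache if shared else {}) for _ in range(2)]
    for server in servers:
        server.handle("plan-123")
    return db.reads


local_reads = count_db_reads(shared=False)   # each server has its own cache
shared_reads = count_db_reads(shared=True)   # both servers share one cache
print(local_reads, shared_reads)
```

With standalone local caches, each server misses and goes to the database on its own. With a shared cache, the second server finds the entry the first one stored, which is the "something to tie it all together" in the quote above.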
For HealthCare.gov, the answer was Software AG’s Terracotta BigMemory software. Bringing Terracotta from the test environment onto the main system instantly made every disconnected application cache cluster-aware. And it wasn’t a large effort. Implementing Terracotta, or moving it from a test environment into a live setting as was the case with HealthCare.gov, is a quick process that requires adding just a few lines of code.
Expanding memory fabric
Terracotta enhanced what government teams were doing behind the scenes to expand the site’s hardware and storage platforms to accommodate more users. HealthCare.gov was able to rapidly expand its memory fabric platform horizontally and vertically by adding both more memory and more machines to the infrastructure.
That’s because Terracotta removes all restrictions on how much memory can be used on each machine. Adding a server with 300 GB of memory means that teams were truly adding a full 300 GB of memory storage to the site. Adding more machines and maxing out their memory capabilities expanded this even further. In fact, right after the launch, HealthCare.gov expanded from just two to 18 Terracotta nodes, with each seamlessly supporting up to 165 JBoss® servers.
During open enrollment in 2016, HealthCare.gov expanded again to 44 Terracotta nodes, supporting 315 JBoss servers. And HealthCare.gov is more robust now too.
If any single Terracotta server goes offline because of a hardware or network failure, or is taken offline for maintenance, it doesn’t affect the overall system and no data is lost. The other Terracotta nodes simply compensate for the loss until the downed server comes back online.
This runtime data resiliency feature was demonstrated by conducting a full OS patch on all the machines supporting HealthCare.gov. Using a rolling restart procedure, every machine was successfully updated without any disruption to the site.
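The idea behind a rolling restart can be sketched in a few lines. This is a simplified model, assuming every node holds a full copy of the data and resynchronizes from a peer when it returns; that is an assumption for illustration, not a description of Terracotta’s internals. The names Node, Cluster and rolling_restart are hypothetical.

```python
class Node:
    """Hypothetical cache node holding an in-memory copy of the data."""

    def __init__(self, name):
        self.name = name
        self.online = True
        self.data = {}


class Cluster:
    """Toy cluster that writes to every online node and reads from any of them."""

    def __init__(self, names):
        self.nodes = [Node(name) for name in names]

    def put(self, key, value):
        for node in self.nodes:
            if node.online:
                node.data[key] = value

    def get(self, key):
        for node in self.nodes:
            if node.online and key in node.data:
                return node.data[key]
        raise KeyError(key)


def rolling_restart(cluster, probe_key):
    """Restart nodes one at a time, probing the cluster while each is down."""
    served = []
    for node in cluster.nodes:
        node.online = False
        node.data = {}  # in-memory contents are gone after a restart
        served.append(cluster.get(probe_key))  # remaining nodes still answer
        peer = next(n for n in cluster.nodes if n.online)
        node.data = dict(peer.data)  # assumption: resync from any online peer
        node.online = True
    return served


cluster = Cluster(["node-1", "node-2", "node-3"])
cluster.put("session-42", "comparing plans")
served_during_restart = rolling_restart(cluster, "session-42")
print(served_during_restart)
```

Because only one node is down at any moment, every probe made during the procedure is answered by a node that is still online, and each restarted node rejoins with a full copy of the data.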
Fast Restart, more reliable
Although performance was the main reason HealthCare.gov needed Terracotta, a feature in the software called BigMemory Fast Restart also makes HealthCare.gov more reliable. Unlike a traditional in-memory platform where data stored in memory could be lost in the case of an outage, everything that is cached inside Terracotta’s clustered memory is continuously backed up to disk to create a restorable storage area.
If a Terracotta node needs to be shut down for any reason, or even if it’s accidentally shut down, everything can be loaded back up from the disk-based storage in a matter of seconds.
“It’s really quick when you have to load the cache back into memory from disk,” Sanglier said. “And it makes a huge difference when it comes to reliability. It makes the Terracotta cluster operate more like a NoSQL database but with increased in-memory speeds.”
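The pattern described here, keeping cached data in memory while also writing it to disk so it can be reloaded after a shutdown, can be sketched as follows. This is a minimal illustration and not the BigMemory Fast Restart implementation; the class DiskBackedCache, the append-only JSON log and the file name cache.log are assumptions chosen for the example.

```python
import json
import os
import tempfile


class DiskBackedCache:
    """Hypothetical cache that logs every write to disk and replays it on startup."""

    def __init__(self, path):
        self.path = path
        self.memory = {}
        if os.path.exists(path):
            # Restart path: rebuild the in-memory state from the on-disk log.
            with open(path, encoding="utf-8") as log:
                for line in log:
                    entry = json.loads(line)
                    self.memory[entry["key"]] = entry["value"]

    def put(self, key, value):
        # Record the write on disk first, then update memory.
        with open(self.path, "a", encoding="utf-8") as log:
            log.write(json.dumps({"key": key, "value": value}) + "\n")
        self.memory[key] = value

    def get(self, key):
        return self.memory.get(key)


with tempfile.TemporaryDirectory() as workdir:
    log_path = os.path.join(workdir, "cache.log")

    cache = DiskBackedCache(log_path)
    cache.put("plan-123", "silver")
    cache.put("plan-123", "gold")  # a later write supersedes the earlier one
    del cache  # simulate the node shutting down and losing its memory

    restored = DiskBackedCache(log_path)  # simulate the node starting again
    restored_value = restored.get("plan-123")

print(restored_value)
```

Reads are always served from memory; the disk copy is only consulted at startup, which is why a restarted node can return to service with its cache already populated.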
From the citizen’s point of view, the most important thing is that HealthCare.gov simply works. Most have no idea that potentially hundreds of thousands of other people are also using the site at the same time, making requests and planning their own healthcare needs. To them, it seems like a five-lane highway with no other vehicles in sight.
But behind the scenes, many people and systems such as Terracotta are constantly kept busy keeping the roads clear and the traffic flowing.