
Oh noes! The website has been a disaster! Plagued by cronyism, bad management and too many cooks!

My background is IT, working with ERP systems. Based on my experience, the implementation of the website has been a disaster, as long as you ignore all of the successes. But if you look at the successes and put them in perspective, the implementation looks like a success. I discuss those successes below.

Let's start with the basics:
* The website is a huge, huge project that requires disparate systems from the federal government and every single state to talk to each other in real time.
* There was no possibility of the gradual rollout that most projects this big would have. Nor was any slip in schedule possible. All fifty states had to go live on the same day.
* There was a massive but understandable underestimation of the number of users when the site went live. I will discuss this further later.

+ + + +
What are the most important things for a system like this?
* The data in the system does not get corrupted
Some examples of corrupt data: (1) multiple insurance policies assigned to a single person, (2) a person buying family coverage but not all family members being covered, (3) insurance sales recorded to non-existent customers, (4) multiple records for the same person.

I haven't heard of any example of corrupt data. There could be some that just hasn't been noticed. However, it is usually pretty obvious when you are working with data that has been corrupted.

* The information presented to customers is correct
The number one problem with the website in its last month of development was that it was getting the subsidized insurance rates wrong. If they hadn't fixed that problem, they wouldn't have been able to sell insurance through it. From the linked article, "Still, the long-term consequences of any malfunctions in registering and pricing may be limited. People may still be able to sign up offline, even if the online exchanges aren't fully functional at first, several insurers said."

* Security
Wouldn't it be fun to hack into your neighbor/co-worker/brother/sister's account and see what their income is? Or sign them up for an insurance policy even though they get insurance through their work? It is still early on this front, but I haven't seen any reports of anyone easily hacking into someone else's account.

* Performance
I am going to discuss this in the next section.
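
The four corrupt-data examples listed under the first bullet above can each be written as a simple check. Here is a minimal sketch in Python. The record layout (`person_id`, `identity_key`, `holder_id`, `covered_ids`, `household_ids` and so on) is made up for illustration and is not how the real system stores anything.

```python
from collections import Counter


def find_integrity_problems(people, policies):
    """Return a list of (problem, identifier) tuples.

    people:   dicts with hypothetical keys "person_id" and "identity_key"
    policies: dicts with hypothetical keys "policy_id", "holder_id",
              "family", "covered_ids" and "household_ids"
    """
    problems = []
    known_ids = {p["person_id"] for p in people}

    # (1) more than one policy assigned to a single person
    per_holder = Counter(pol["holder_id"] for pol in policies)
    for holder, count in per_holder.items():
        if count > 1:
            problems.append(("multiple_policies", holder))

    # (2) family coverage that leaves out a household member
    for pol in policies:
        if pol.get("family"):
            missing = set(pol["household_ids"]) - set(pol["covered_ids"])
            if missing:
                problems.append(("family_member_uncovered", pol["policy_id"]))

    # (3) a sale recorded against a customer who does not exist
    for pol in policies:
        if pol["holder_id"] not in known_ids:
            problems.append(("unknown_customer", pol["policy_id"]))

    # (4) more than one record for the same real-world person
    per_identity = Counter(p["identity_key"] for p in people)
    for key, count in per_identity.items():
        if count > 1:
            problems.append(("duplicate_person", key))

    return problems
```

An empty list means none of the four problems showed up in the data that was checked, which is not the same as proving the data is clean.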

+ + + +
Why did they go live when they had such performance issues?
Whenever you are talking about a system this complex, you can never solve every single problem. For one thing, you can never make a system foolproof because fools are so darn clever - they will do things that you never, ever would have thought of. So there is a point where further testing doesn't provide much bang for the buck because all the most important known problems are solved and you know that the unknown problems are going to be worse than the remaining known problems.

They did do some performance testing. The government expected its healthcare reform website to draw 50,000-60,000 users at once, based on the all-time high of 30,000 simultaneous users on an existing site. They also expected the volume to be low initially. I find that expectation reasonable because (1) the website went live three months before the insurance you bought from it could take effect and (2) you have to pay a month's worth of insurance when you sign up, so it is foolish to pay for insurance in October.

My guess is that they were expecting to have 30-45 days to work out the performance problems before they started getting high volumes of users. My guess would be that with a system this complex, there really isn't a way to know for sure what the performance bottlenecks are until you go live. You can do load testing, but it is based upon lots and lots of assumptions that won't be true.

You know what happened - the initial volume was far, far more than what they were expecting. Traffic hit over 250,000 users. I saw an estimate that 10 million users visited the website on the first day. The volume crushed the website. From what I have read, no insurance was sold on the first two days. However, by Saturday I was able to set up an account and get insurance quotes.

+ + + +
How do problems with systems like this get fixed?
What happens is the managers decide what the top problem is and they throw all of their resources at the problem. When it gets fixed, they throw all their resources at the new top problem. Repeat lots of times until you have a bunch of small problems that are insignificant enough that you can work on them in parallel.

As I said, the top problem before going live was inaccurate price quotes. The top problem when they went live was site performance. Both of those problems appear to be dead and the support team has moved on to other problems.

+ + + +
But Ezra Klein said it was a disaster!
Ezra Klein is a really smart dude, but he has never worked in IT. He ignores the problems that have been fixed and mentions one (count 'em - one) problem:

In the weeks leading up to the launch I heard some very ugly things about how the system was performing when transferring data to insurers -- a necessary step if people are actually going to get insurance...Here is one example from a carrier–and I have received numerous reports from many other carriers with exactly the same problem. One carrier exec told me that yesterday they got 7 transactions for 1 person – 4 enrollments and 3 cancellations.
First off - as long as there aren't data corruption problems with the database, the transmission of data to insurance companies isn't a huge problem because they have months to get it right.

Most important, the example he gives shows a lack of IT experience. It is really hard to figure out incremental changes to a record, so whenever a change happens, the easiest thing is to re-send all the information. Otherwise, the system has to keep track of the prior record value, what exactly changed and how to send the change information. That is far more error-prone than just re-sending the current account information. So if I sign up for insurance, then change my mailing address, the system probably sends a cancellation of my prior policy and then re-sends all the information for my account. The insurance company receiving the information should have ETL (extract, transform and load) code that compares each new record for an account to determine what has to be changed. So Ezra is getting upset about something that shouldn't be a problem.
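
To make that concrete, here is a minimal sketch in Python of what the receiving side's comparison step might look like. It assumes the guess above is right, that every change arrives as a cancellation followed by a full re-send. The transaction layout (`account_id`, `type`, `record`) and the function names are hypothetical, not anything an actual insurer uses.

```python
def diff_fields(old, new):
    """Return {field: (old_value, new_value)} for every field that differs."""
    changed = {}
    for key in old.keys() | new.keys():
        if old.get(key) != new.get(key):
            changed[key] = (old.get(key), new.get(key))
    return changed


def reconcile(transactions):
    """Collapse an ordered stream of full-record transactions.

    Each transaction is a dict with hypothetical keys "account_id",
    "type" ("enroll" or "cancel") and, for enrollments, "record"
    (a full snapshot of the account).

    Returns (active, changes):
      active  maps account_id -> latest snapshot for accounts still enrolled
      changes lists (account_id, field_diff) for re-sends that altered data
    """
    active = {}      # accounts currently enrolled
    last_seen = {}   # most recent snapshot ever received per account
    changes = []

    for tx in transactions:
        account = tx["account_id"]
        if tx["type"] == "cancel":
            # A cancel ahead of a re-send just clears the active entry.
            active.pop(account, None)
        elif tx["type"] == "enroll":
            record = tx["record"]
            previous = last_seen.get(account)
            if previous is not None:
                delta = diff_fields(previous, record)
                if delta:
                    changes.append((account, delta))
            last_seen[account] = record
            active[account] = record
        else:
            raise ValueError(f"unknown transaction type: {tx['type']!r}")

    return active, changes
```

Fed four enrollments and three cancellations for one person, where the only real change is a new mailing address, this ends with one active record and one recorded change.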

+ + + +
What about cronyism, bad management and too many cooks?
David Auerbach does yeoman work digging into what contracts were let for the development of the website. However, he seems to have strong opinions about how things should have been done that color his judgement about the success of the project.

For example, he appears to hate how the government picks vendors for IT projects. Now, I am sure there are better ways of picking vendors for IT projects, but to make sweeping judgements about the project because the government picked vendors like it always has is stupid. Where does the "cronyism" from the title come from? As far as I can tell, it comes from the fact that Booz-Allen got a $6 million contract for the project. I too hate high-level consulting companies and think they charge ridiculous amounts for so-so advice. And there were probably lots of companies that would have given better advice for less money. We are talking about a project that was way, way beyond the experience of Health and Human Services. So it was really impossible for them to pick which companies really knew their stuff on this issue and which didn't. So Booz-Allen was not the best pick, but probably a safe pick.

Where does the "bad management" in David Auerbach's title come from? My impression is that he expected the development process to be done a certain way and the government didn't do it that way. Now, he could argue that was a poor decision, but he should make that argument. But to declare the project was badly managed just because they didn't use his desired development process is stupid.


  •  I enrolled successfully (8+ / 0-)

    yea, the website has glitches, but with a little patience I was able to complete my enrollment and choose a plan.
    Just one of the many thousands of people that will now have access to healthcare at an affordable price.

    I hope they will have added capacity and debugged before too long.
    it is too important not to

  •  There Certainly Was the Option of Gradual Rollout. (2+ / 0-)
    Recommended by:
    skrekk, Habitat Vic

    Not gradual portions of the system of course.

    But 80% of the traffic level could've been kept out by limiting access by birthday or alphabet. Drivers' license bureaus and preschools have been doing this forever.

    The first week's traffic for the known period of worst glitches could've been as small as 10% of the alphabet. Then another 10% the next week, and an additional 20% added on each successive week as the debugging proceeded.

    By mid-November they'd have been through the alphabet or the birth month calendar. They could leave it open now to the whole population and would never in the future expect to face the load they did by going live with the entire nation on day 1.

    And it'd still be 6 weeks before anybody could make a purchase.

    We are called to speak for the weak, for the voiceless, for victims of our nation and for those it calls enemy.... --ML King "Beyond Vietnam"

    by Gooserock on Mon Oct 14, 2013 at 04:50:08 PM PDT
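
The staggered rollout suggested in the comment above could be sketched roughly like this in Python. The schedule follows the comment's numbers (10%, another 10%, then 20% more each week). Gating on the first letter of the last name and the function names are assumptions for illustration. Real surnames are not spread evenly over the alphabet, so the actual traffic share per week would differ from the letter share.

```python
import string

# Cumulative share of the alphabet admitted by the end of each week.
CUMULATIVE_SHARE = [0.10, 0.20, 0.40, 0.60, 0.80, 1.00]


def letters_open(week):
    """Return the set of initial letters admitted in the given 1-based week."""
    if week < 1:
        return set()
    share = CUMULATIVE_SHARE[min(week, len(CUMULATIVE_SHARE)) - 1]
    count = max(1, round(share * len(string.ascii_uppercase)))
    return set(string.ascii_uppercase[:count])


def may_enter(last_name, week):
    """Decide whether a visitor is let in this week, based on last name."""
    if week >= len(CUMULATIVE_SHARE):
        # Rollout complete: everyone is admitted, whatever the initial.
        return True
    initial = last_name.strip()[:1].upper()
    return initial in letters_open(week)
```

The same gate would work with birth month in place of the initial letter.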

  •  As an IT professional for 20+ years (7+ / 0-)

    with similar background working on large scale complex systems with a variety of technologies....

    I agree with this article for the most part.  An arbitrary deadline and baseline assumptions that seemed reasonable but turned out to be completely wrong explain most of what we're seeing.

    I agree that data corruption is not being seen, which is good.  Getting THAT wrong would be a disaster, not a "glitch"; it could literally torpedo the entire effort.

    If I were to pick one point to hammer the software vendor on, it would be the piss-poor error handling on the portal itself.

    Users got a wide variety of useless failure messages that left them in doubt as to the state of their application.  A simple universal trap on anything that timed out saying "The site is experiencing unexpected load.  Please try again later, or try our call center to submit your application by telephone." really isn't that hard if you bake it into the design at the beginning.

    A bit more sophistication would have saved the state of their session at various points, reducing the rework and eliminating issues where some users seem to have applications that can't be completed without using a new email address.  With that in place, you can also tune your failure message to say things like "your account confirmation email is on its way" or "You will need to start the process from the beginning". That gives the user a sense that the app isn't just blowing up: it is having a temporary issue, the work they did isn't wasted, and they know what they have to do next.

    Unfortunately, due to the prioritization process mentioned in the article, the error handling won't get fixed for a long time if ever, because once they get a handle on the load, the failure that is causing the current spate of errors will cease and the demand to fix this aspect of the site will be much reduced.

    My take: it's a success if they get the site reliable for most users about when the shutdown and debt ceiling stuff exits the news cycle.  If it is still sucking after our government is back to business as usual, it'll quickly become an actual political problem.  If it's still having problems by December, it could damage adoption quite a bit, because people are unlikely to panic until the calendar starts to approach Jan 1.
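
The "universal trap" plus saved session state described in the comment above might look something like this minimal Python sketch. Everything in it is hypothetical: the `checkpoint` key, the handler names and the message wording are stand-ins, and a real portal would hook this into whatever web framework it uses.

```python
import functools

FRIENDLY_MESSAGE = (
    "The site is experiencing unexpected load. Please try again later, "
    "or contact the call center to submit your application by telephone."
)

# What to tell the user next, keyed by the last checkpoint saved in the session.
NEXT_STEP = {
    "account_created": "Your account confirmation email is on its way.",
    None: "You will need to start the process from the beginning.",
}


def trap_timeouts(handler):
    """Wrap a request handler so any timeout yields one consistent message."""
    @functools.wraps(handler)
    def wrapper(session, *args, **kwargs):
        try:
            return {"ok": True, "body": handler(session, *args, **kwargs)}
        except TimeoutError:
            checkpoint = session.get("checkpoint")
            hint = NEXT_STEP.get(checkpoint, NEXT_STEP[None])
            return {"ok": False, "body": f"{FRIENDLY_MESSAGE} {hint}"}
    return wrapper


@trap_timeouts
def create_account(session, backend):
    result = backend(session)                   # may time out before anything is saved
    session["checkpoint"] = "account_created"   # record progress once it succeeds
    return result


@trap_timeouts
def choose_plan(session, backend):
    return backend(session)
```

Because the checkpoint lives in the session, a timeout in a later step can tell the user that the earlier work was kept.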

    •  Phase II man... (1+ / 0-)

      after all the glitches are handled...better error-handling and finessing the system to deliver more sophisticated messaging....they probably WANTED to see the error messaging as it gives clues to what was going on from the client standpoint. To match up with what they were experiencing on the back-end.

      •  Something we did on that front in my project (0+ / 0-)

        We presented an error page that had the user-friendly generic message.

        Then put the core-dump in the same response web page, with raw error output.  In white text on white background.

        It didn't confuse the user, but if the user called a support person (or IT simulated it), you had the full original error right there just by scrolling your mouse over the white page under the user-friendly error message.

        If the site is presenting the raw error for a good reason, this is how you get to have your cake and eat it too.  It's also really easy to code.  You just change the color of your text to match your background on your existing error output and insert boilerplate text at the top, which can be the same for all outside error traps, or at least large groups of related traps.
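
A minimal sketch of the white-on-white trick described in the comment above, in Python. The message text, function name and inline styling are illustrative assumptions, not the commenter's actual code.

```python
import html

GENERIC_MESSAGE = "Something went wrong on our end. Please try again later."


def render_error_page(raw_error, background="#ffffff"):
    """Build an error page: friendly text on top, raw error hidden below.

    The raw error is printed in the same color as the page background,
    so it only shows up when someone selects the text with the mouse.
    """
    safe = html.escape(raw_error)   # keep the dump from breaking the markup
    return (
        '<html><body style="background:{bg}">\n'
        "<p>{msg}</p>\n"
        '<pre style="color:{bg}">{raw}</pre>\n'
        "</body></html>"
    ).format(bg=background, msg=GENERIC_MESSAGE, raw=safe)
```

The raw text is still readable by anyone who selects the page or views its source, so this only suits error output that contains nothing sensitive.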

  •  Thanks for this diary. n/t (3+ / 0-)

    "Southern nights have you ever felt a southern night?" Allen Toussaint ~~Remember the Gulf of Mexico~~

    by rubyr on Mon Oct 14, 2013 at 06:21:42 PM PDT

  •  7 transactions for one person... (2+ / 0-)
    Recommended by:
    madronagal, JamieG from Md

    is a good thing. This means that the transactions got through. The insurers have something to work with.

    However, there are still people who can't get into the site. This has to be fixed soon. I am hopeful it will be.

  •  Thanks for putting it in perspective. /nt (1+ / 0-)
    Recommended by:
    JamieG from Md
