A brief look at what happens when we offer Dumpster Dive boxes and why this slowed SparkFun.com to a crawl.
Don't forget – today is the last day of the New Year Clearance Sale! All discounts end at midnight tonight!
You, our customers, are fantastic. And of course, everyone loves a deal. When a passionate customer base encounters a deal, well, stuff breaks.
Let me introduce myself again: I’m Double M, Director of Software Development and IT (SWIT internally). I’m responsible for ensuring our website stays up. When you, as a customer, can’t get to our website, I hear about it (normally from a few different places). It can feel like I’m in Office Space and I filled out the wrong TPS report.
Everyone was very kind, but I was informed that our website processed so many concurrent orders that a bunch of our customers were put on backorder or were given an ‘out of stock’ message as they tried to check out.
Our actual stock sold out in two minutes. That’s you guys. Fantastic. We sold product for two more minutes. That’s on us. We sold more in the second two minutes than in the first two minutes. Again, passionate people are fantastic. Everyone with a backorder will get their stuff; it’s just going to take us a bit of time.
For everyone who didn’t get an order, who got bounced at checkout or whose page loaded too slowly: our apologies, but we sold a lot more than we had to begin with. People were always going to get left out of this sale; there is no way to get these into the hands of everyone who wants them.
While we are an ecommerce website, most of our traffic is predictable. We rarely have to throttle; we rarely need to add resources to our production stack; we almost never block IPs or sets of IPs. In terms of raw sales volume and customer orders, Cyber Monday is a big day for us – we have a good sale and we have a lot of traffic. That is the benchmark in my mind: 100% Cyber Monday uptime. We achieved that (not really, but we hit over three 9’s of uptime on that day: 99.9%, with just a few seconds of hiccup/sluggishness, not even downtime) and had very few, if any, connectivity issues across our customers.
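For anyone curious what "three 9’s" means in practice, here is a rough sketch of the arithmetic. It is only an illustration: the function names are made up for this post and it is not a description of how we actually measure uptime. Over a single 24-hour day, 99.9% uptime leaves a budget of about 86 seconds, so a few seconds of sluggishness lands comfortably above that mark.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds in one day


def downtime_budget_seconds(uptime_percent, period_seconds=SECONDS_PER_DAY):
    """Seconds of downtime allowed in the period at a given uptime percentage.

    Hypothetical helper for illustration only.
    """
    return period_seconds * (100.0 - uptime_percent) / 100.0


def uptime_percent(downtime_seconds, period_seconds=SECONDS_PER_DAY):
    """Uptime percentage for the period, given the seconds of downtime observed.

    Hypothetical helper for illustration only.
    """
    return 100.0 * (1.0 - downtime_seconds / period_seconds)


if __name__ == "__main__":
    # Compare the one-day downtime budget for two, three and four 9's.
    for target in (99.0, 99.9, 99.99):
        budget = downtime_budget_seconds(target)
        print(f"{target}% uptime over one day allows {budget:.1f} s of downtime")
```

The same helpers work for longer windows: pass a different `period_seconds` (for example, the number of seconds in a month) to see how the budget grows with the measurement period.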
The back-of-the-envelope traffic graph shows that we had a mild spike over an hour, but most of that hourly traffic spike actually happened in a span of a few minutes.
Now my benchmark has to be what occurred on Dumpster Dive day. I don’t want to take the blame for all of the latency, but some of it is ours. We saw an orders-of-magnitude increase in traffic as noon local time arrived and the sale started. This stressed the systems, but they responded very quickly – just not quickly enough. As we pointed out, the whole event was over in minutes.
We had a lot of processes and steps in place that we learned from last year, but the stress on the system far outpaced what we expected using last year as a benchmark. So even though we were ready to handle some increase over last year’s load, we had different problems altogether.
Now, we have tools in the vaporware stage of development that may help us the next time we do this. Of course, now I probably need to expect another uptick in traffic as one of our test cases. The next sale will include tools we’ll talk about before we launch, and checks we won’t talk about, which should help level the playing field (if I told you, you’d know how to work around them).
I’m glad so many of you love the Dumpster Dive. We’ll try to make it run smoothly, but at its very core it is still a limited-release flash sale. We’ll try our best to make it fair, but it won’t be a situation where everyone gets one. It can never be that.