Wednesday 4 March 2020

Post Mortem: Unavailability of Good Work-From-Home Coffee

Unavailability of Good Work-From-Home Coffee


Date: 2020-03-04


Authors: @simonlbn


Status: In progress, but published as draft; IMPACT ONGOING.


Summary: During mandatory Work From Home for User from Internet Conglomerate, redundancy of Coffee Makers went from N to N-1, where N was 1. This lack of redundancy had not been considered in advance.
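The redundancy arithmetic in the summary can be sketched in a few lines of Python. This is only an illustration: the function names `remaining_after_failure` and `survives_single_failure`, and the assumption that User needs exactly one working coffee maker, are invented for the example and are not part of the incident record.

```python
def remaining_after_failure(installed: int, failed: int = 1) -> int:
    """Return how many coffee makers still work after `failed` of them break."""
    return max(installed - failed, 0)


def survives_single_failure(installed: int, required: int = 1) -> bool:
    """True if losing one coffee maker still leaves `required` working units.

    `required=1` is an assumption: one working coffee maker is enough for User.
    """
    return remaining_after_failure(installed) >= required


# Before the incident: N = 1, so a single failure leaves N - 1 = 0 coffee makers.
print(remaining_after_failure(1))
print(survives_single_failure(1))

# Planned end state: primary plus backup, i.e. two units for a requirement of one.
print(survives_single_failure(2))
```

With a single unit installed the check fails, which is the situation the summary describes; with primary plus backup it passes.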


Background: User, who resides in Ireland, grew up in Denmark and as such strongly prefers filter coffee over espresso, instant coffee and tea as caffeine delivery mechanism / hot beverage of choice. While instant coffee and tea can be used as emergency backup, they cannot be considered full mitigation. Switching and diversifying the delivery mechanism / hot beverage of choice to include more tea is in progress, but the rollout will not be complete for the foreseeable future.

User had just returned home to Ireland from the United States (trip included both west and east coast), and was at the time of the incident suffering from severe jet lag.


Impact: Reduced productivity for User on Work From Home day, and reduced availability while external temporary mitigation was applied. A new machine has to be acquired and delivered, extending the downtime. User may not have good coffee for the weekend.


Root Causes: During resource provisioning, only a single coffee maker had been acquired. Glass is a bad material for a coffee pot for people suffering from jet lag.


Trigger: Severe jet lag meant extra poor hand-eye coordination while returning the glass pot to the coffee maker, causing impact between the glass pot and a metal kitchen roll stand, with immediate failure of the glass pot as a result.


Resolution: NOTE IMPACT ONGOING - Acquired new primary coffee maker and backup drip coffee maker, bringing User to N+1 for good Coffee production. To reduce the risk of recurrence, the new primary coffee maker will not have a glass pot.


Detection: User immediately noticed an unfortunate shattering sound while operating the coffee maker's glass pot. No monitoring was in place to detect this failure mode.


Lessons Learned

What went well

  • Shattered glass was immediately removed, preventing further and worse incidents.
  • Instant coffee and tea were both available as emergency backup if the issue escalated.

What went wrong

  • In the years preceding the incident, User had not properly considered redundancy of at-home coffee making equipment.
  • Coffee pots should not be made of glass due to the risk of catastrophic failure.
  • [Ongoing] Backup coffee maker was possibly ordered too late to be delivered on Friday, extending the outage over the weekend.

Where we got lucky

  • First coffee pot of the day had been completed and had just been poured into a thermal (metal) coffee pot, providing coffee for the period before noon.
  • Ongoing jet lag meant expected productivity was already very low, so the effective productivity loss was probably smaller than at other times.
  • Mandatory work from home got lifted for the day after the incident (2020-03-05), so at-work coffee could be acquired for primary consumption.
  • The coffee maker was reasonably old, so it can be replaced with reasonably good conscience, especially since User had been considering a new coffee maker with a thermal coffee pot regardless of the incident.
  • Only the User is an active user of the Coffee Making facilities at the location, so the outage only impacts primary User.

Action Items

Action Item                                           | Type     | Owner     | Status
Remove shattered glass.                               | prevent  | @simonlbn | Done.
Acquire temporary external pre-produced coffee.       | mitigate | @simonlbn | Done.
Acquire new primary coffee maker with non-glass pot.  | prevent  | @simonlbn | In progress.
Acquire backup coffee maker.                          | prevent  | @simonlbn | In progress.
Recycle / dispose of the old coffee maker.            | cleanup  | @simonlbn | Not started.



Timeline (all times UTC)

2020-02-28

20:00 User (@simonlbn) leaves San Francisco International Airport (SFO/KSFO) for New York John F Kennedy Airport (JFK/KJFK).

2020-03-02

23:50 User leaves New York John F Kennedy Airport (JFK/KJFK) for London Heathrow Airport (LHR/EGLL) on British Airways flight 174 on a Boeing 777-200ER (registration G-RAES).

2020-03-03

06:46 User arrives at London Heathrow Airport.

08:20 User leaves London Heathrow Airport for Dublin Airport (DUB/EIDW) on British Airways flight 830 on an Airbus A319 (registration G-EUPW).

09:45 User arrives at Dublin Airport.

2020-03-04

06:55 User woken up by biological alarm clocks (also known as children) after 4 + 2 hours of sleep.

08:20 First coffee of the day started.

08:30 Coffee is poured from glass pot to metal thermal coffee pot.

08:31 INCIDENT BEGINS An attempt is made to return the glass pot to the coffee maker; the pot hits the metal kitchen roll holder in the process and partly shatters.

11:30 No more decent in-house coffee is available.

12:00 External coffee acquired, causing unavailability of User while acquisition occurs.

21:02 New backup coffee maker ordered (pour-over set which does not take up too much space when not in use).

2020-03-06 (expected)

??:?? INCIDENT MITIGATED New backup coffee maker delivered.

2020-03-09 (expected)

??:?? INCIDENT ENDS New primary coffee maker delivered.

??:?? New primary coffee maker installed and smoke tested.
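The durations implied by the 2020-03-04 timeline entries can be worked out with a short Python sketch. The timestamps are copied from the timeline above; the variable names and the helper `minutes_between` are made up for this illustration. The entries still marked ??:?? are deliberately left out, since their times are not yet known.

```python
from datetime import datetime

# Timestamps copied from the 2020-03-04 timeline entries (all UTC).
incident_begins = datetime(2020, 3, 4, 8, 31)   # glass pot partly shatters
coffee_runs_out = datetime(2020, 3, 4, 11, 30)  # no more decent in-house coffee
external_coffee = datetime(2020, 3, 4, 12, 0)   # external coffee acquired
backup_ordered = datetime(2020, 3, 4, 21, 2)    # backup coffee maker ordered


def minutes_between(start: datetime, end: datetime) -> int:
    """Whole minutes elapsed from `start` to `end`."""
    return int((end - start).total_seconds() // 60)


# How long the thermal pot kept User supplied after the incident began.
thermal_pot_buffer = minutes_between(incident_begins, coffee_runs_out)
# Window without any decent coffee before external mitigation was applied.
no_coffee_gap = minutes_between(coffee_runs_out, external_coffee)
# Time from incident start until the backup coffee maker was ordered.
time_to_order = minutes_between(incident_begins, backup_ordered)

print(f"thermal pot buffer: {thermal_pot_buffer} min")  # 179 min
print(f"gap without coffee: {no_coffee_gap} min")       # 30 min
print(f"time to order backup: {time_to_order} min")     # 751 min
```

The thermal pot bought just under three hours of coffee, and ordering the backup took a little over twelve and a half hours from the start of the incident.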


Author note

This was created and published without independent review while Author was still suffering from jet lag, so typos and other errors are likely, even though it has been grammar-checked by the Google Docs AI.


Author was unable to find a nice document template to use, so was inspired by the structure template from https://landing.google.com/sre/sre-book/chapters/postmortem/ (part of the Google SRE book).


This document was created without using Internet Conglomerate resources, and as such is © 2020 Simon L. B. Nielsen. It also should not be considered endorsed by any Internet Conglomerates, real or fictitious.