Justin Garrison
October 17, 2021

Skills, stories, and software every dev should know - 123dev #41

Posted on October 17, 2021  •  3 minutes  • 476 words
Multiple abandoned buildings are destroyed

Comments

BGP

I heard the GIF came from what happened when Facebook ran the following command.

ip link set fb0 down

Thankfully they were able to fix it with

sudo reboot

Thinking

If you’re overthinking you should write. If you’re underthinking you should read.

Written word is much slower than spoken word. It helps to slow down and be deliberate when you have lots of ideas in your head.

When I’m “underthinking” I like to ask people questions, lots of questions. Reading lets me dwell on words longer. I can really slow down and take my time to have more thoughts about fewer words which often will cause me to overthink.

I try not to make this newsletter about current events, but it’s hard to avoid some of the lessons learned from this week. The first sentence in the last paragraph says so much (emphasis mine)

We’ve done extensive work hardening our systems to prevent unauthorized access, and it was **interesting** to see how that hardening slowed us down as we tried to recover from an outage caused not by malicious activity, but an error of our own making.

I think this may be a lesson in the fact that once your application and organization gets to a sufficient size the only way to properly test failures is through practices like chaos engineering. Unknown unknowns are the things you can’t plan for.

More details about the October 4 outage - Facebook Engineeringengineering.fb.com Now that our platforms are up and running after yesterday’s outage, we are sharing more detail on what happened and what we’ve learned.

If you were one of the many people who learned about BGP this week I liked Julia’s practical list of tools and links anyone can use to explore BGP. It’s a good set of commands to know because there tends to be a large BGP inspired outage once every 5 years or so.

Tools to explore BGP Tools to explore BGP

One of the fallacies of distributed systems is that the network is reliable. This article has lots of examples of networks being unreliable even without human intervention.

The Network is Reliable - ACM Queue “The network is reliable” tops Peter Deutsch’s classic list, “Eight fallacies of distributed computing” (https://blogs.oracle.com/jag/resource/Fallacies.html) , “all [of which] prove to be false in the long run and all [of which] cause big trouble and painful learning experiences.” Accounting for and understanding the implications of network behavior is key to designing robust distributed programs—in fact, six of Deutsch’s “fallacies” directly pertain to limitations on networked communications. This should be unsurprising: the ability (and often requirement) to communicate over a shared channel is a defining characteristic of distributed programs, and many of the key results in the field pertain to the possibility and impossibility of performing distributed computations under particular sets of network conditions.

Follow me

Here's where I hang out in social media