Skills, stories, and software every dev should know - 123dev #44

Posted on November 2, 2021 • 5 minutes • 878 words

Multiple pumpkins are carved and animated using stop motion

Comments

Hard work, unnoticed

We’ve all worked hard on something only to have it go completely unnoticed or the work completely cancelled. It really sucks. It’s out of your control, but that still doesn’t make you feel better.

When this happens to me (it has happened a few times in the past two months), I focus on what I learned doing the work. Was there a hard problem I solved? Did I work too hard, and should I set better boundaries in the future? Knowing what I know now, would I have solved the problem differently if asked to do it again?

Just because someone else didn’t notice doesn’t mean the time was wasted. It’s a good opportunity to watch out for people doing good work and make sure you let them know you noticed and appreciate it.

Onboarding

I’m helping onboard someone at work and it reminds me of what it was like for me at Amazon. Sometimes, I think everything should be automated so we don’t have to go through this process each time, but then I remember that companies and processes change all the time. Automation is a lot more work to maintain than documentation.

I also remember that part of onboarding is building trust. The person has already been hired, which shows everyone trusts they can do the work. But as humans we need to build trust over time, and onboarding check-ins are a great way to start building it.

Being new, you also need to learn what’s normal and acceptable. The best way to do that is by talking to people. You’re constantly watching for how and when people communicate. If everything were automated, you would have every account and application you needed and be given a task list to start working. You would miss out on building trust, on watching how, when, and where people ask questions, and on learning who they go to for answers.

Would it be more efficient? Yes. Would it be better? I doubt it. It would be impersonal and could degrade team and company culture into minimal, transactional communication for the sole purpose of business delivery optimization. That sounds like a terrible place for people to work.

Links

You can’t find one thing that made you fail any more than you can find a single thing that made you successful. I love how simple that idea is and how clearly it stops people from looking for root causes.

Root cause of failure, root cause of success – Surfing Complexity — surfingcomplexity.blog Here are a couple of tweets from John Allspaw. https://twitter.com/allspaw/status/1027933534655270912 https://twitter.com/allspaw/status/1036942676376150016 Succeeding at a project in an organization is like pushing a boulder up a hill that is too heavy for any single person to lift. A team working together to successfully move a boulder to the top of the hill It doesn’t make sense…

Roblox had a 3-day outage and I feel really sorry for the team responsible for fixing it. I don’t know any details, but from a case study I read, they use HashiCorp tools (Consul, Nomad, Vault, Terraform) with a 4-person SRE team serving 100,000,000 monthly players (might be more now).

One thing I found interesting was that people online seemed surprised at the small team size relative to the number of players. It’s not the scale that is the problem; it’s the scope of work and the company’s dependence on reliability that cause more burnout.

I was on a team of 4 at Disney Animation with a smaller scale and a larger scope, and it was hard to manage all our responsibilities. However, at Disney Animation we had more freedom with reliability because we had no public-facing services that would cause direct revenue loss.

My time as an SRE at Disney+ was on a team of 4 with direct customer impact if we had an outage, and with similar scale and scope to Roblox. It was very stressful, but not entirely uncommon from what I’ve seen in the industry.

An Update on Our Outage - Roblox Blog — blog.roblox.com We recently experienced an extended outage across our platform. Learn more about the outage and steps taken to address it.

I started a casual club on Twitter, which I call #PaperClub, for people who like to read and discuss white papers. I’ve learned a lot from reading papers and would love to share what I’m learning and learn from others too. We’ll be reading the papers and then discussing them in Twitter Spaces once every few weeks. The first paper we’re going to read is the Google Borg paper, which is a great read for distributed systems design.

I’m semi-cheating with this newsletter because I’m sharing my tweet, which has two links: one to the Borg paper and one to the Twitter Space. If you want to add a calendar reminder for the Space, open the link from a mobile client and click “add to calendar”.

Justin Garrison on Twitter: “We will be discussing the original Google Borg white paper on our first ever #PaperClub You can read it here https://t.co/sc1OzpQprD We’ll discuss it on Nov 12 here https://t.co/ZOs1WiLAUB”