PyCon Canada 2018

I'm very happy to have had the opportunity to attend and speak at PyCon Canada here in Toronto last week.

PyCon has always been a very well organized conference. There is a wide range of talks available, even on topics not directly related to Python. I've attended PyCon events before, but never the Canadian one!

My talk was titled How Mozilla uses Python to Build and Ship Firefox. The slides are available here if you're interested. I believe the sessions were recorded, but they're not yet available online. I was happy with the attendance at the session, and the questions during and after the talk.

As part of the talk, I mentioned how Release Engineering is a very distributed team. Afterwards, many people had followup questions about how to work effectively with remote teams, which gave me a great opportunity to recommend John O'Duinn's new book, Distributed Teams.

Some other highlights from the conference:

  • CircuitPython: Python on hardware
    I really enjoyed learning about CircuitPython, and the work that Adafruit is doing to make programming and electronics more accessible.

  • Using Python to find Russian Twitter troll tweets aimed at Canada
    A really interesting dive into 3 million tweets that FiveThirtyEight made available for analysis.

  • PEP 572: The Walrus Operator
    My favourite quote from the talk: "Dictators are people too!" If you haven't followed Python governance, Guido stepped down as BDFL (Benevolent Dictator for Life) after the PEP was resolved. Dustin focused much of his talk on how we in the Python community, and more generally in tech, need to treat each other better.

  • Who's There? Building a home security system with Pi & Slack
    A great example of how you can get started hacking on home automation with really simple tools.

  • Froilán Irzarry's Keynote talk on the second day was really impressive.

  • You Don't Need That! Design patterns in Python
    My main takeaway from this was that you shouldn't try and write Python code as if it were Java or C++ :) Python has plenty of language features built-in that make many classic design patterns unnecessary or trivial to implement.

  • Numpy to PyTorch
    Really neat to learn about PyTorch, and leveraging the GPU to accelerate computation.

  • Flying Python - A reverse engineering dive into Python performance
    Made me want to investigate Balrog performance, and also look at ways we can improve Python startup time. Some neat tips about examining disassembled Python bytecode.

  • Working with Useless Machines
    Hilarious talk about (ab)using IoT devices.

  • Gathering Related Functionality: Patterns for Clean API Design
    I really liked his approach for creating clean APIs for things like class constructors. He introduced a module called variants which lets you write variants of a function / class initializer to support varying types of parameters. For example, a common pattern is to have a function that takes either a string path to a file, or a file object. Instead of having one function that supports both types of arguments, variants allows you to make distinct functions for each type, but in a way that makes it easy to share underlying functionality and also not clutter your namespace.
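
To make the path-versus-file-object idea from the last talk concrete, here is a hand-rolled sketch using only the standard library. It does not use the variants module's actual API; the names read_settings, from_path and from_string are made up for illustration, and the KEY=VALUE file format is an assumption.

```python
import io


def read_settings(fobj):
    """Primary form: parse KEY=VALUE lines from an open text file object."""
    settings = {}
    for line in fobj:
        line = line.strip()
        # Skip blank lines and comments.
        if line and not line.startswith("#"):
            key, _, value = line.partition("=")
            settings[key.strip()] = value.strip()
    return settings


def _from_path(path):
    """Variant: accept a filesystem path and delegate to the primary form."""
    with open(path, encoding="utf-8") as fobj:
        return read_settings(fobj)


def _from_string(text):
    """Variant: accept the file contents as a string."""
    return read_settings(io.StringIO(text))


# Hang the variants off the primary function, so the module namespace
# exposes a single public name and all forms share one parser.
read_settings.from_path = _from_path
read_settings.from_string = _from_string

print(read_settings.from_string("channel = beta\n"))  # {'channel': 'beta'}
```

Callers pick the form that matches what they have, e.g. read_settings.from_path("settings.cfg"), and none of the functions needs to inspect argument types.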

So long Buildbot, and thanks for all the fish

Last week, without a lot of fanfare, we shut off the last of the Buildbot infrastructure here at Mozilla.

Our primary release branches have been switched over to taskcluster for some time now. We needed to keep buildbot running to support the old ESR52 branch. With the release of Firefox 60.2.0esr earlier this month, ESR52 is now officially end-of-life, and therefore so is buildbot here at Mozilla.

Looking back in time, the first commit to our buildbot-configs repository was over 10 years ago on April 27, 2008 by Ben Hearsum: "Basic Mozilla2 configs". Buildbot usage at Mozilla actually predates that by at least two years; Ben was working on some patches in 2006.

Earlier in my career here at Mozilla, I was doing a lot of work with Buildbot, and blogged quite a bit about our experiences with it.

Buildbot served us well, especially in the early days. There really were no other CI systems at the time that could operate at Mozilla's scale.

Unfortunately, as we kept increasing the scale of our CI and release infrastructure, even buildbot started showing some problems. The main architectural limitations of buildbot we encountered were:

  1. Long lived TCP sessions had to stay connected to specific server processes. If the network blipped, or you needed to restart a server, then any jobs running on workers were interrupted.

  2. Its monolithic design meant that small components of the project were hard to develop independently from each other.

  3. The database schema used to implement the job queue became a bottleneck once we started doing hundreds of thousands of jobs a day.

On top of that, our configuration for all the various branches and platforms had grown over the years to a complex set of inheritance rules, defaults, and overrides. Only a few brave souls outside of RelEng managed to effectively make changes to these configs.

Today, much much more of the CI and release configuration lives in tree. This has many benefits including:

  1. Changes are local to the branches they land on. They ride the trains naturally. No need for ugly looooooops.

  2. Developers can self-service most of their own requests. Adding new types of tests, or even changing the compiler are possible without any involvement from RelEng!

Buildbot is dead! Long live taskcluster!

Firefox release speed wins

Sylvestre wrote about how we were able to ship new releases for Nightly, Beta, Release and ESR versions of Firefox for Desktop and Android in less than a day in response to the pwn2own contest.

People commented on how much faster the Beta and Release releases were compared to the ESR release, so I wanted to dive into the releases on the different branches to understand if this really was the case, and if so, why?

Chemspill timings

                    | Firefox ESR 52.7.2 | Firefox 59.0.1  | Firefox 60.0b4
 ------------------ | ------------------ | --------------- | --------------
 Fix landed in HG   | 23:33:06           | 23:31:28        | 23:29:54
 en-US builds ready | 03:19:03 +3h45m    | 01:16:41 +1h45m | 01:16:47 +1h46m
 Updates ready      | 08:43:03 +5h24m    | 04:21:17 +3h04m | 04:41:02 +3h25m
 Total              | 9h09m              | 4h49m           | 5h11m

(All times UTC from 2018-03-15 -> 2018-03-16)

Summary

We can see that Firefox 59 and 60.0b4 were significantly faster to run than ESR 52 was! What's behind this speedup?

Release Engineering have been busy migrating release automation from buildbot to taskcluster. Much of ESR52 still runs on buildbot, while Firefox 59 is mostly done in Taskcluster, and Firefox 60 is entirely done in Taskcluster.

In ESR52 the initial builds are still done in buildbot, which has been missing out on many performance gains from the build system and AWS side. Update testing is done via buildbot on slower mac minis or windows hardware.

The Firefox 59 release had much faster builds, and update verification is done in Taskcluster on fast linux machines instead of the old mac minis or windows hardware.

The Firefox 60.0b4 release also had much faster builds, and ended up running in about the same time as Firefox 59. It turns out that we hit several intermittent infrastructure failures in 60.0b4 that caused this release to be slower than it could have been. Also, because we had multiple releases running simultaneously, we did see some resource contention for tasks like signing.

For comparison, here's what 60.0b11 looks like:

                    | Firefox 60.0b11
 ------------------ | --------------- 
 Fix landed in HG   | 18:45:45
 en-US builds ready | 20:41:53 +1h56m
 Updates ready      | 22:19:30 +1h37m
 Total              | 3h33m

Wow, down to 3.5 hours!
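
For anyone who wants to re-derive the deltas in the 60.0b11 table, here is a small sketch of the arithmetic. The helper names elapsed and fmt_hm are mine, and the sketch assumes a release crosses midnight at most once and that leftover seconds are simply dropped.

```python
from datetime import datetime, timedelta


def elapsed(start, end):
    """Return end - start for two HH:MM:SS times, allowing one midnight rollover."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    if delta < timedelta(0):
        delta += timedelta(days=1)
    return delta


def fmt_hm(delta):
    """Format a timedelta as e.g. '3h33m', dropping leftover seconds."""
    minutes = int(delta.total_seconds()) // 60
    return f"{minutes // 60}h{minutes % 60:02d}m"


landed, builds_ready, updates_ready = "18:45:45", "20:41:53", "22:19:30"

print(fmt_hm(elapsed(landed, builds_ready)))         # 1h56m
print(fmt_hm(elapsed(builds_ready, updates_ready)))  # 1h37m
print(fmt_hm(elapsed(landed, updates_ready)))        # 3h33m
```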

In addition to the faster builds and faster update tests, we're seeing a lot of wins from increased parallelization that we can do now using taskcluster's much more flexible scheduling engine. There's still more we can do to speed up certain types of tasks, fix up intermittent failures, and increase parallelization. I'm curious just how fast this pipeline can be :)

Taskcluster migration update: we're finished!

We're done!

Over the past few weeks we've hit a few major milestones in our project to migrate all of Firefox's CI and release automation to taskcluster.

Firefox 60 and higher are now 100% on taskcluster!

Tests

At the end of March, our Release Operations and Product Integrity teams finished migrating Windows tests onto new hardware machines, all running taskcluster. That work was later uplifted to beta so that CI automation on beta would also be completely done using taskcluster.

This marked the last usage of buildbot for Firefox CI.

Periodic updates of blocklist and pinning data

Last week we switched off the buildbot versions of the periodic update jobs. These jobs keep the in-tree versions of blocklist, HSTS and HPKP lists up to date.

These were the last buildbot jobs running on trunk branches.

Partner repacks

And to wrap things up, yesterday the final patches landed to migrate partner repacks to taskcluster. Firefox 60.0b14 was built yesterday and shipped today 100% using taskcluster.

A massive amount of work went into migrating partner repacks from buildbot to taskcluster, and I'm really proud of the whole team for pulling this off.

So, starting today, Firefox 60 and higher will run completely on taskcluster and not rely on buildbot.

It feels really good to write that :)

We've been working on migrating Firefox to taskcluster for over three years! Code archaeology is hard, but I think the first Firefox jobs to start running in Taskcluster were the Linux64 builds, done by Morgan in bug 1155749.

Into the glorious future

It's great to have migrated everything off of buildbot and onto taskcluster, and we have endless ideas for how to improve things now that we're there. First we need to spend some time cleaning up after ourselves and paying down some technical debt we've accumulated. It's a good time to start ripping out buildbot code from the tree as well.

We've got other plans to make release automation easier for other people to work with, including doing staging releases on try(!!), making the nightly release process more similar to the beta/release process, and exposing different parts of the release process to release management so that releng doesn't have to be directly involved with the day-to-day release mechanics.

Taskcluster migration update, the sequel

Firefox, now 100% buildbot-free!

First, the good news - Developer Edition 60.0b1 will be the first release in nearly 10 years done without using buildbot. This is an amazing milestone, and I'm incredibly proud of everybody who has contributed to make this possible!

Long time, no update

How did we get here? It's been, uh, almost 6 months since I last posted an update about our migration to Taskcluster.

In my last update, I described our plans for the end of 2017...

We're on track to ship builds produced in Taskcluster as part of the
56.0 release scheduled for late September. After that the only Firefox
builds being produced by buildbot will be for ESR52.

Meanwhile, we've started tackling the remaining parts of release
automation. We prioritized getting nightly and CI builds migrated to
Taskcluster, however, there are still parts of the release process
still implemented in Buildbot.

We're aiming to have release automation completely migrated
off of buildbot by the end of the year. We've already seen many
benefits from migrating CI to Taskcluster, and migrating the release
process will realize many of those same benefits.

How'd we do?

We're past the end of 2017, so how are we doing?

Well, we successfully shipped 56.0 with builds produced in Taskcluster. Our big Firefox Quantum release (57.0) was also shipped with builds produced by Taskcluster.

(side note: 57 had the most complex update scenarios we've ever had to support for Firefox...a subject for another post!)

Release scheduling

Post-56.0, our release process was using Taskcluster exclusively for producing the initial builds, and all the release process scheduling. We were still using Buildbot for many of the post-build tasks, like l10n repacks, publishing updates, pushing files to S3, etc. Once again we relied on the buildbot bridge to allow us to integrate existing buildbot components with the newer taskcluster pipeline. I learned from Kim Moir that this is a great example of the strangler pattern.

In the fall of 2017, we decided to begin migrating all of the scheduling logic for release automation into taskcluster using the in-tree taskgraph scheduling system. We did this for a few reasons...

  1. Having the release scheduling logic ride the trains is much more maintainable. Previous to this we had an externally defined release pipeline in our releasetasks repo. It was hard to keep this repository in sync with changes required for beta/release and ESR branches.

  2. More importantly, having the release scheduling logic in-tree meant that we could then rely on chain-of-trust to verify artifacts produced by the release pipeline.

  3. We felt that having the complete release pipeline defined in taskcluster would make it easier for us to tackle the remaining buildbot bridge tasks in parallel.

We hit this milestone in the 58 cycle. Starting with 58.0b3, Firefox and Fennec releases were completely scheduled using the in-tree taskgraph generation. We also migrated over the l10n repacks at the same time, removing a longstanding source of problems where repacks would fail when we first got to beta due to environmental differences between taskcluster and buildbot.

No-BBB Releases

As of 58, though, much of release automation still ran on buildbot, even though Taskcluster was doing all the scheduling.

Since December, we've been working on removing these last few pieces of buildbot from the release process. Progress was initially a bit slow, given Austin and Christmas, but we've been hard at work in the new year.

That brings us to today.

We've moved uptake monitoring, update verify (and made it 2x faster too!), update submission, final verify, bouncer submission, version bumping and tagging, and balrog submission to run in Taskcluster via various kinds of scriptworkers.

As I mentioned above, DevEdition 60.0b1 will be the first release in nearly 10 years done without using buildbot. The rest of the 60 release cycle will follow suit, and once 60 hits the release channel, only ESR52 will remain on buildbot!

Taskcluster migration update

All your nightlies are belong to Taskcluster

In January I announced that we had just migrated Linux nightly builds to Taskcluster.

We completed a huge milestone in July: starting in Firefox 56, we've been doing all our nightly Firefox builds in Taskcluster.

https://media.giphy.com/media/MOWPkhRAUbR7i/giphy.gif

This includes all Windows, macOS, Linux, and Android builds. You can see all the builds and repacks on Treeherder.

Since August, when 56 merged to Beta, we've also been doing our Firefox Beta builds using Taskcluster. We're on track to ship Firefox 56, built in Taskcluster, to release users at the end of September.

Windows and macOS each had their own challenges to get them ready to build and ship to our nightly users.

Windows signing

We've had Windows builds running in Taskcluster for quite a while now. The biggest missing piece stopping us from shipping these builds was signing. Windows builds end up being a bit complicated to sign.

First, each compiled .exe and .dll binary needs to be signed. Signing binaries on Windows changes their contents, and so we need to regenerate some files that depend on the exact contents of binaries. Next, we need to create packages in various formats: a "setup.exe" for installing Firefox, and also MAR files for updates. Each of these package formats in turn needs to be signed.

In buildbot, this process was monolithic. All of the binary generation and signing happened as part of the same build process. The same process would also publish symbols to the symbol server and publish updates to Balrog. The downside of this monolithic process is that it adds additional dependencies to the build, which is already a really long process. If something goes wrong with signing, or publishing updates, you don't want to have to restart a 2 hour build!

As part of our migration to Taskcluster, we decided that builds should minimize their external dependencies. This means that the build task produces only unsigned binaries, and it is the responsibility of downstream tasks to sign them. We also wanted discrete tasks for symbol and update submission.

One wrinkle in this approach is that the logic that defines how to create a setup.exe package or a MAR file lives in tree. We didn't want to run that code in the same context as the code that generates signatures.

Our solution to this was to create a sequence of build -> signing -> repackage -> signing tasks. The signing tasks run in a restricted environment while the build and repackage tasks have access to the build system in order to produce the required artifacts. Using the chain of trust, we can demonstrate that the artifacts weren't tampered with between intermediate tasks.

Finally, we need to consider l10n repacks. We ship Firefox in over 90 locales. The repacking process downloads the en-US build and replaces the English strings with localized strings. Each of these repacks needs to be based on the signed en-US build. Each will also generate its own setup.exe and complete MAR for updates.
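
The shape of that task sequence can be sketched as a small dependency graph. This is only an illustration, not how the in-tree taskgraph code is written: the task labels are made up, the locale list is a tiny sample, and the sketch leaves out whatever further repackaging and signing the localized builds go through. It needs Python 3.9+ for graphlib.

```python
from graphlib import TopologicalSorter


def release_graph(locales):
    """Map each (hypothetical) task label to the labels it depends on."""
    graph = {
        "build": set(),                       # produces unsigned binaries
        "build-signing": {"build"},           # signs .exe/.dll files
        "repackage": {"build-signing"},       # creates setup.exe and MAR files
        "repackage-signing": {"repackage"},   # signs the packages themselves
    }
    for locale in locales:
        # Every repack starts from the signed en-US build.
        graph[f"l10n-repack-{locale}"] = {"build-signing"}
    return graph


def run_order(locales):
    """Return one valid execution order that respects all dependencies."""
    return list(TopologicalSorter(release_graph(locales)).static_order())


order = run_order(["de", "fr"])
print(order)
```

Tasks that do not depend on each other, such as the per-locale repacks and the en-US repackage step, can appear in any relative order, which is what lets a scheduler run them in parallel.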

macOS performance (and why your build directory matters)

Like Windows, we've had macOS builds running on Taskcluster for a long time. Also like Windows, we had to solve signing for macOS.

However, the biggest blocker for the macOS build migration was a performance bug. Builds produced on Taskcluster showed some serious performance regressions as compared to the builds produced on buildbot.

Many very smart people looked at this bug since it was first discovered in February. They compared library versions being used. They compared compiler versions and compiler flags. They even inspected the generated assembly code from both systems.

Mike Shal stumbled across the first clue to what was going on in June: if he stripped the Taskcluster binaries, then the performance problems disappeared! At this point we decided that we could go ahead and ship these builds to nightly users, knowing that the performance regression would disappear on beta and release.

Later on, Mike realized that it's not the presence or absence of symbols in the binary that causes the performance hit, it's what directory the builds are done in. On buildbot we build under /builds/..., and on Taskcluster we build under /home/...

https://media.giphy.com/media/zjQrmdlR9ZCM/giphy.gif

Read the bug for more gory details. This is definitely one of the strangest bugs I've seen.

Lessons learned

We learned quite a bit in the process of migrating Windows and macOS nightly builds to Taskcluster.

First, we gained a huge amount of experience with the in-tree scheduling system. There's a bit of a learning curve to climb, but it's an extremely powerful and flexible system. Many kudos to Dustin for his work creating the foundation of this system here. His blog post, "What's So Special About "In-Tree"?", is a great explanation of why having this code as part of Firefox's repository is so important.

One of the killer features of having all the scheduling logic live in-tree is that you can do quite a bit of work locally, without requiring any build infrastructure. This is extremely useful when working on the complex build / signing / repackage sequence of tasks described above. You can make your changes, generate a new task graph, and inspect the results.

Once you're happy with your local changes, you can push them to try to validate your local testing, get your patch reviewed, and then finally land it in gecko. Your scheduling changes will take effect as soon as they land in the repo. This made it possible for us to do a lot of testing on another project branch, and then merge the code to central once we were ready.

What's next?

We're on track to ship builds produced in Taskcluster as part of the 56.0 release scheduled for late September. After that the only Firefox builds being produced by buildbot will be for ESR52.

Meanwhile, we've started tackling the remaining parts of release automation. We prioritized getting nightly and CI builds migrated to Taskcluster, however, there are still parts of the release process still implemented in Buildbot.

We're aiming to have release automation completely migrated off of buildbot by the end of the year. We've already seen many benefits from migrating CI to Taskcluster, and migrating the release process will realize many of those same benefits.

Thanks!

Thank you for reading this far!

Members of the Release Engineering, Release Operations, Taskcluster, Build, and Product Integrity teams were all involved in finishing up this migration. Thanks to everyone involved (there are a lot of you!) for getting us across the finish line here.

In particular, if you come across one of these fine individuals at the office, or maybe on IRC, I'm sure they would appreciate a quick "thank you":

  • Aki Sasaki
  • Dustin Mitchell
  • Greg Arndt
  • Joel Maher
  • Johan Lorenzo
  • Justin Wood
  • Kim Moir
  • Mihai Tabara
  • Mike Shal
  • Nick Thomas
  • Rail Aliiev
  • Rob Thijssen
  • Simon Fraser
  • Wander Costa

Nightly builds from Taskcluster

Yesterday, for the very first time, we started shipping Linux Desktop and Android Firefox nightly builds from Taskcluster.

We now have a much more secure, resilient, and hackable nightly build and release process.

It's more secure, because we have developed a chain of trust that allows us to verify all generated artifacts back to the original decision task and docker image. Signing is no longer done as part of the build process, but is now split out into a discrete task after the build completes.

The new process is more resilient because we've split up the monolithic build process into smaller bits: build, signing, symbol upload, upload to CDN, and publishing updates are all done as separate tasks. If any one of these fails, it can be retried independently. We don't have to re-compile the entire build again just because an external service was temporarily unavailable.
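
As a toy illustration of why separate tasks help, here is a sketch in which a flaky publish step is retried on its own while the build runs exactly once. The function names, the fake failure, and the retry count are all invented for the example; this is not Taskcluster code.

```python
def run_with_retries(task, max_attempts=3):
    """Call task() until it succeeds or max_attempts is exhausted."""
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except RuntimeError:
            if attempt == max_attempts:
                raise


calls = {"build": 0, "publish": 0}


def build():
    """Stand-in for the expensive compile step."""
    calls["build"] += 1
    return "firefox.tar.bz2"


def publish():
    """Stand-in for an update submission that fails twice, then works."""
    calls["publish"] += 1
    if calls["publish"] < 3:
        raise RuntimeError("update server temporarily unavailable")
    return "published"


artifact = run_with_retries(build)
status = run_with_retries(publish)
print(calls)  # {'build': 1, 'publish': 3}
```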

Finally, it's more hackable - in a good way! All the configuration files for the nightly build and release process are contained in-tree. That means it's easier to inspect and change how nightly builds are done. Changes will automatically ride the trains to aurora, beta, etc.

Ideally you didn't even notice this change! We try and get these changes done quietly, smoothly, in the background.

This is a giant milestone for Mozilla's Release Engineering and Taskcluster teams, and is the result of many months of hard work, planning, coding, reviewing and debugging.

Big big thanks to jlund, Callek, mtabara, kmoir, aki, dustin, sfraser, jlorenzo, coop, jmaher, bstack, gbrown, and everybody else who made this possible!

2016 RelEng Retrospective

As 2016 winds down, I wanted to take some time to highlight all the work our Release Engineering team has done this year. Personally, I really enjoy writing these retrospective posts. I think it's good to spend some time remembering how far we've come in a year. It's really easy to forget what you did last month, and 6 months ago seems like ancient history!

People!

We added four people to our team this year!

Aki (:aki) re-joined us in January and has been working hard on developing a security model for Taskcluster for sensitive tasks like signing and publishing binaries.

Rok (:garbas) started in February and has been working on modernizing our web application framework development and deployment processes.

Johan (:jlorenzo) started in August and has been improving our release automation, Balrog, and automatically publishing Android builds to the Google Play Store.

Simon (:sfraser) started in October and has been improving monitoring of our production systems, as well as getting his feet wet with our partial update generation system.

Releases

This year we released 104 desktop versions of Firefox, and 58 Android versions (including Beta, Release and ESR branches).

5 of those releases were just in the week prior to our all hands meeting in Hawaii!

Several other releases this year were special for particular reasons, and required special efforts on our part. We continued to provide SHA-1 signed installers for Windows XP users. We also produced a special 47.0.2 release in order to try and rescue users stuck on 47. We've never shipped a point release for a previous release branch before! We've also generated partial updates to try and help users on 43.0.1 and 47.0.2 get faster updates to the latest version of Firefox.

Release promotion

We couldn't have shipped so many releases so quickly last week if it weren't for release promotion. Previous to Firefox 46, our release process would generate completely new builds after CI was finished. This wasted a lot of time, and also meant we weren't shipping the exact binaries we had tested. Today, we ship the same builds that CI has generated and tested. This saves a ton of time (up to 8 hours!), and gives us a lot more confidence in the quality of the release.

This is one of those major kinds of changes that really transforms how we approach doing releases. I can't really remember what it was like doing releases prior to release promotion!

We also added support in Shipit to allow starting a release before all the en-US builds are done. This lets our Release Management team kick off a release early, assuming all the builds pass. It saves a person having to wait around watching Treeherder for the coveted green builds.

Windows in AWS

This year we completed our migration to AWS for Windows builds. 100% of our Windows builds are now done in AWS. This means that we now have a much faster and more scalable Windows build platform.

In addition, we also migrated most of the Windows 7 unittests to run in AWS. Previously these were running on dedicated hardware in our datacentre. By moving these tests to AWS, we again get a much more scalable test platform, but we also freed up hardware capacity for other test platforms (e.g. Windows XP).

Taskcluster

One of our major focus areas this year was migrating our infrastructure from Buildbot to Taskcluster. As of today, we have:

  • Fully migrated Linux64 and Android debug builds and tests
  • Builds for all other platforms operating as Tier2
  • Linux64 and Android nightly builds, l10n repacks and updates operating as Tier2
  • Tons of security design & implementation work

Balrog

Scheduled Changes in Balrog mean that we can now have machines set the background update rate to 0% 24 hours after release, instead of having a human do it.

Balrog itself was migrated from our datacentre in SCL3 into AWS. We now have a much more flexible deployment pipeline.

Balrog has also been one of our best projects for getting volunteer contributions! Much of the work done this year was done by contributors!

RIP

Being able to shut off old, crufty and deprecated stuff is an important part of staying agile. This year we were finally able to develop an end of life plan for Windows XP. In addition, we discontinued support for OSX 10.6-10.8, systems without SSE2, and 32-bit OSX systems. Not having to support these old platforms simplifies managing our infrastructure, and also makes product development easier.

We also shut down all the panda mobile testing infrastructure and legacy vcs-sync.

What's next?

2017 is looking like it's going to be another interesting (and busy!) year for RelEng.

Our top priority is to finish the migration to Taskcluster. Hopefully by the end of 2017, the only thing left on buildbot will be the ESR52 branch. This will require some big changes to our release automation, especially for Fennec.

We're also planning to provide some automated processes to assist with the rest of the release process. Releases still involve a lot of human to human handoffs, and places where humans are responsible for triggering automation. We'd like to provide a platform to be able to manage these handoffs more reliably, and allow different pieces of automation to coordinate more effectively.

PyCon 2016 report

I had the opportunity to spend last week in Portland for PyCon 2016. I'd like to share some of my thoughts and some pointers to good talks I was able to attend. The full schedule can be found here and all the videos are here.

Monday

Brandon Rhodes' Welcome to PyCon was one of the best introductions to a conference I've ever seen. Unfortunately I can't find a link to a recording... What I liked about it was that he made everyone feel very welcome to PyCon and to Portland. He explained some of the simple (but important!) practical details like where to find the conference rooms, how to take transit, etc. He noted that for the first time, they have live transcriptions of the talks being done and put up on screens beside the speaker slides for the hearing impaired.

He also emphasized the importance of keeping questions short during Q&A after the regular sessions. "Please form your question in the form of a question." I've been to way too many Q&A sessions where the person asking the question took the opportunity to go off on a long, unrelated tangent. For the most part, this advice was followed at PyCon: I didn't see very many long winded questions or statements during Q&A sessions.

Machete-mode Debugging

(abstract; video)

Ned Batchelder gave this great talk about using Python's language features to debug problematic code. He ran through several examples of tricky problems that could come up, and how to use things like monkey patching and the debug trace hook to find out where the problem is. One piece of advice I liked was when he said that it doesn't matter how ugly the code is, since it's only going to last 10 minutes. The point is to get the information you need out of the system the easiest way possible, and then you can undo your changes.
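
Here is a small sketch in that spirit. It is my own example, not one from the talk: the Config class and buggy_code function are hypothetical. The idea is to temporarily monkey patch __setattr__ to find out which function is writing a bad value, then remove the patch.

```python
import traceback


class Config:
    def __init__(self):
        self.timeout = 30


def buggy_code(cfg):
    cfg.timeout = -1  # the mystery write we are hunting for


culprits = []


def spy_setattr(self, name, value):
    """Ugly on purpose: record who sets a negative timeout, then carry on."""
    if name == "timeout" and value < 0:
        # [-1] is this function's own frame, [-2] is whoever did the assignment.
        culprits.append(traceback.extract_stack()[-2].name)
    object.__setattr__(self, name, value)


Config.__setattr__ = spy_setattr  # install the hack
try:
    buggy_code(Config())
finally:
    del Config.__setattr__  # undo it once we have our answer

print(culprits)  # ['buggy_code']
```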

Refactoring Python

(abstract; video)

I found this session pretty interesting. We certainly have lots of code that needs refactoring!

Security with object-capabilities

(abstract; video; slides)

I found this interesting, but a little too theoretical. Object capabilities are a way to model security and permissions that is completely orthogonal to access control lists. It was hard for me to see how we could apply this to the systems we're building.

Awaken your home

(abstract; video)

A really cool intro to the Home Assistant project, which integrates all kinds of IoT type things in your home. E.g. Nest, Sonos, IFTTT, OpenWrt, light bulbs, switches, automatic sprinkler systems. I'm definitely going to give this a try once I free up my raspberry pi.

Finding closure with closures

(abstract; video)

A very entertaining session about closures in Python. Does Python even have closures? (yes!)
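
A quick example of my own (not from the talk) showing that the answer really is yes: the inner function keeps its own private reference to count after make_counter has returned.

```python
def make_counter():
    count = 0

    def increment():
        nonlocal count  # rebind the variable captured from the enclosing scope
        count += 1
        return count

    return increment


counter = make_counter()
counter()
counter()
print(counter())  # 3

other = make_counter()  # a fresh closure with its own count
print(other())    # 1

# The captured variable lives in a cell attached to the function object.
print(counter.__closure__[0].cell_contents)  # 3
```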

Life cycle of a Python class

(abstract; video)

Lots of good information about how classes work in Python, including some details about meta-classes. I think I understand meta-classes better after having attended this session. I still don't get descriptors though!

(I hope Mike learns soon that __new__ is pronounced "dunder new" and not "under under new"!)

Deep learning

(abstract; video)

Very good presentation about getting started with deep learning. There are lots of great libraries and pre-trained neural networks out there to get started with!

Building protocol libraries the right way

(abstract; video)

I really enjoyed this talk. Cory Benfield described the importance of keeping a clean separation between your protocol parsing code and your IO. It not only makes things more testable, but also makes code more reusable. Nearly every HTTP library in the Python ecosystem needs to re-implement its own HTTP parsing code, since all the existing code is tightly coupled to the network IO calls.
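Here is a minimal sketch of what that separation can look like, using a toy line-based protocol rather than HTTP. The `LineProtocol` class and its method names are my own invention for illustration, not an API from the talk or from any real library. The parser only ever sees bytes in and hands bytes or messages out; it never touches a socket, so the same class could sit behind blocking sockets, asyncio, or a plain unit test.

```python
class LineProtocol:
    """Toy CRLF-delimited protocol parser with no IO of its own."""

    def __init__(self):
        self._buffer = b""

    def receive_data(self, data):
        """Feed in raw bytes from any transport; return complete messages."""
        self._buffer += data
        # Everything before the last CRLF is complete; the rest stays buffered.
        *lines, self._buffer = self._buffer.split(b"\r\n")
        return [line.decode("ascii") for line in lines]

    def send(self, message):
        """Return the bytes the caller should write to its own transport."""
        return message.encode("ascii") + b"\r\n"


proto = LineProtocol()

# Data can arrive split at arbitrary points; no network is needed to test that.
first = proto.receive_data(b"HELLO\r\nWOR")
second = proto.receive_data(b"LD\r\n")
outgoing = proto.send("PING")

print(first, second, outgoing)  # ['HELLO'] ['WORLD'] b'PING\r\n'
```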

Tuesday

Guido's Keynote

(video)

Some interesting notes in here about the history of Python, and a look at what's coming in 3.6.

Click

(abstract; video)

An intro to the click module for creating beautiful command line interfaces.

I like that click helps you to build testable CLIs.

HTTP/2 and asynchronous APIs

(abstract; video)

A good introduction to what HTTP/2 can do, and why it's such an improvement over HTTP/1.x.

Remote calls != local calls

(abstract; video)

Really good talk about failing gracefully. He covered some familiar topics like adding timeouts and retries to things that can fail, but also introduced me to the concept of circuit breakers. The idea with a circuit breaker is to prevent talking to services you know are down. For example, if you have failed to get a response from service X the past 5 times due to timeouts or errors, then open the circuit breaker for a set amount of time. Future calls to service X from your application will be intercepted, and will fail early. This can avoid hammering a service while it's in an error state, and works well in combination with timeouts and retries of course.
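The circuit breaker described above can be sketched in a few lines. This is a simplified illustration under my own assumptions, not the speaker's implementation: the class name, the `max_failures` and `reset_after` parameters, and the injectable `clock` are all hypothetical choices. After the cool-down expires, the next call is let through as a trial; a single further failure re-opens the breaker.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the breaker is open."""


class CircuitBreaker:
    def __init__(self, max_failures=5, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock  # injectable so tests don't have to sleep
        self.failures = 0
        self.opened_at = None

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                # Fail early without touching the struggling service.
                raise CircuitOpenError("service assumed down")
            # Cool-down elapsed: allow a trial call. The failure count is
            # left as-is, so one more failure re-opens the breaker.
            self.opened_at = None
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        self.failures = 0
        return result


# Demo with a fake clock and a service that always times out.
now = [0.0]
attempts = []


def flaky_service():
    attempts.append(now[0])
    raise TimeoutError("no response")


breaker = CircuitBreaker(max_failures=2, reset_after=10.0, clock=lambda: now[0])
outcomes = []
for _ in range(4):
    try:
        breaker.call(flaky_service)
    except CircuitOpenError:
        outcomes.append("rejected")
    except TimeoutError:
        outcomes.append("timeout")

print(outcomes)       # ['timeout', 'timeout', 'rejected', 'rejected']
print(len(attempts))  # 2: the last two calls never reached the service
```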

I was thinking quite a bit about Ben's redo module during this talk. It's a great module for handling retries!

Diving into the wreck

(abstract; video)

A look into diagnosing performance problems in applications. Some neat tools and techniques introduced here, but I felt he blamed the DB a little too much :)

Wednesday

Magic Wormhole

(abstract; video; slides)

I didn't end up going to this talk, but I did have a chance to chat with Brian before. magic-wormhole is a tool to safely transfer files from one computer to another. Think scp, but without needing ssh keys set up already, or direct network flows. Very neat tool!

Computational Physics

(abstract; video)

How to do planetary orbit simulations in Python. Pretty interesting talk; he introduced me to Feynman, and to some of the important characteristics of the simulation methods he covered.

Small batch artisanal bots

(abstract; video)

Hilarious talk about building bots with Python. Definitely worth watching, although unfortunately it's only a partial recording.

Gilectomy

(abstract; video)

The infamous GIL is gone! And your Python programs only run 25x slower!

Larry describes why the GIL was introduced, what it does, and what's involved with removing it. He's actually got a fork of Python with the GIL removed, but performance suffers quite a bit when run without the GIL.

Lars' Keynote

(video)

If you watch only one video from PyCon, watch this. It's just incredible.

MozLando Survival Guide

MozLando is coming!

I thought I would share a few tips I've learned over the years of how to make the most of these company gatherings. These summits or workweeks are always full of awesomeness, but they can also be confusing and overwhelming.

#1 Seek out people

It's great to have a (short!) list of people you'd like to see in person. Maybe somebody you've only met on IRC, Vidyo, or Bugzilla?

Having a list of people you want to thank in person is a great way to approach this. Who doesn't like to hear a sincere "thank you" from someone they work with?

#2 Take advantage of increased bandwidth

I don't know about you, but I can find it pretty challenging at times to get my ideas across in IRC or on an etherpad. It's so much easier in person, with a pad of paper or whiteboard in front of you. You can share ideas with people, and have a latency/lag-free conversation! No more fighting AV issues!

#3 Don't burn yourself out

A week of full days of meetings, code sprints, and blue sky dreaming can be really draining. Don't feel bad if you need to take a breather. Go for a walk or a jog. Take a nap. Read a book. You'll come back refreshed, and ready to engage again.

That's it!

I look forward to seeing you all next week!