Karmona's Pragmatic Blog

Don't get overconfident… Tiny minds also think alike

Karmona's Pragmatic Blog

Chubby Hubby

August 10th, 2009 by Moti Karmona | מוטי קרמונה · No Comments

Chubby Hubby

Recently, I have encountered an interesting paper (2006) about Chubby – Google’s (Paxos based) distributed lock service.
I was especially amazed by the observations made on the Google engineering capabilities and mindset inside a “formal” research publication.

Although one can easily get into a cynical state of mind reading this paper… I feel that this “pragmatic view” which combines a deep architectural and algorithmic know-how with keen understanding of the social factor in software development is exactly the key to create legendary software.

Anyway, very well written – highly recommended reading…

*** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***

“Our developers sometimes do not plan for high availability in the way one would wish. Often their systems start as prototypes with little load and loose availability guarantees; invariably the code has not been specially structured for use with a consensus protocol. As the service matures and gains clients, availability becomes more important; replication and primary election are then added to an existing design.”

“Developers are often unable to predict how their services will be used in the future, and how use will grow.  A module written by one team may be reused a year later by another team with disastrous results … Other developers may be less aware of the cost of an RPC.”

Despite attempts at education, our developers regularly write loops that retry indefinitely when a file is not present, or poll a file by opening it and closing it repeatedly when one might expect they would open the file just once.”

Developers rarely consider availability. We find that our developers rarely think about failure probabilities.

Developers also fail to appreciate the difference between a service being up, and that service being available to their applications.

“Unfortunately, many developers chose to crash their applications on receiving [a failover] event, thus decreasing the availability of their systems substantially”

Tags: Development · Google · People · Software

0 responses so far ↓

  • There are no comments yet...Kick things off by filling out the form below.

Leave a Comment

Allowed tags <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>