Google: ‘At scale, everything breaks’

June 23, 2011 | Source: ZDNet

Behind the scenes, Google is fighting a constant battle against the twin demons of cascading failures and the increasingly challenging levels of complexity that massively scaled services bring.

“At scale, everything breaks,” says Urs Hölzle, who was Google’s first vice president of engineering. Google must walk a tightrope between scaling up its systems and avoiding cascading failures, such as the outage that affected Gmail in March this year.

“Keeping things simple and yet scalable is actually the biggest challenge. It’s really, really hard. Most things don’t work that well at scale, so you need to introduce some complexity, but you have to keep it down.

“I think the big challenges haven’t changed that much. I’d say that it’s dealing with failure, because at scale everything breaks no matter what you do and you have to deal reasonably cleanly with that and try to hide it from the people actually using your system.”