Sunday, April 17, 2016

Guy accidentally deletes his entire company with just one line of bad code



FIONA MACDONALD
15 APR 2016

If you're feeling a little down today, spare a thought for poor Marco Marsala, a web hosting provider who accidentally deleted his entire company with a single line of bad code this week. 

By entering code telling his computer to delete everything, Marsala also inadvertently wiped everything on his servers - including all the offsite backups. He posted the dilemma on tech forum ServerFault where it quickly became apparent that there was no way to reverse what he'd done. In other words... damn.

"I run a small hosting provider with more or less 1,535 customers and I use Ansible to automate some operations to be run on all servers. Last night I accidentally ran, on all servers, a Bash script with a rm -rf {foo}/{bar} with those variables undefined due to a bug in the code above this line," Marsala posted on ServerFault on April 11.

Within 20 minutes, he got his reply - although it probably wasn't the answer he was looking for.
"If you really don't have any backups I am sorry to say but you just nuked your entire company," a user named André Borie replied. 

A few weeks ago we heard about a programmer who almost broke the internet by deleting 11 lines of code. But that problem - although more far-reaching in scale - was fixed up within hours. So what did Marsala do that was so irreversibly terrible?

The command he used was "rm -rf", which pretty simply deletes everything it's told to.
"The 'rm' tells the computer to remove; the r deletes everything within a given directory; and the f stands for 'force', telling the computer to ignore the usual warnings that come when deleting files," explains Andrew Griffin for The Independent.
 
That command itself isn't the problem, but the code that Marsala had above it was buggy and left the variables undefined. Those variables should have named the directory to delete; with both of them empty, the path handed to the command collapsed to "/", the top of the filesystem.
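
To make the failure mode concrete, here is a minimal shell sketch. It is not Marsala's actual script, which was never published in full. The variable names foo and bar come from his post; the helper name safe_target, the example path values, and the use of echo in place of rm are assumptions made for illustration. It shows how two unset variables reduce a path to "/", and how the `${var:?}` expansion makes the shell stop instead of building that path.

```shell
#!/usr/bin/env bash
# Illustrative only: nothing here deletes anything.

# With foo and bar unset, "$foo/$bar" expands to just "/".
unset foo bar
naive_target="$foo/$bar"
echo "naive target: $naive_target"

# Hypothetical helper: ${var:?message} aborts the (sub)shell with an error
# when the variable is unset or empty, so the path is never built.
safe_target() {
  printf '%s/%s\n' "${foo:?foo is not set}" "${bar:?bar is not set}"
}

# Run in a subshell so the expected failure does not end this script.
if ( safe_target ) 2>/dev/null; then
  echo "guard passed"
else
  echo "guard refused to build the path"
fi

# With both variables set (example values), the helper prints the full path.
foo=/srv/customers
bar=old-site
safe_target
```

Putting `set -u` at the top of a script gives a similar safety net for every unset variable, although unlike `${var:?}` it does not catch variables that are set but empty.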

Because the backup drives were also mounted to his computer before he ran the script, they got wiped too. 
Although there's some very small hope Marsala might be able to get some of his users' data back, things don't look good for the poor guy.

"I feel sorry to say that your company is now essentially dead," wrote a user called Sven. "You might have an extremely slim chance to recover from this if you turn off everything right now and hand your disks over to a reputable data recovery company. This will be extremely expensive and still extremely unlikely to really rescue you, and it will take a lot of time."

"You're going out of business. You don't need technical advice, you need to call your lawyer," another user responded.

Of course, Marsala isn't entirely the victim in this situation. As many of ServerFault's users pointed out, he had left himself open to this by not properly backing up his clients' data.

"Well, you should have been thinking about how to protect your customers' data before nuking them," wrote one person calling himself Massimo. "I won't even begin enumerating how many errors are simultaneously required in order to be able to completely erase all your servers and all your backups in a single strike. This is not bad luck: it's astonishingly bad design reinforced by complete carelessness."

The lesson to be learnt from all of this is that backups are important, and they need to be offsite, offline, and incremental.

Also, maybe test your commands in a safe space before you run them everywhere.
"Never run a command everywhere at once. Separate out test and production machines, and preferably do production machines in stages. It's better to fix 1 or 10 machines rather than 100 or 1,000," suggested a user called Journeyman Geek.
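
The staged approach in that advice might look something like the following shell sketch. Everything in it is hypothetical: the function names run_on_host and staged_rollout, the host names, and the stage size are invented for illustration, and run_on_host only prints rather than contacting any machine. The idea is simply to work through hosts in small groups and stop at the first failure, so a bad command reaches a handful of machines rather than all of them.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: no remote commands are run.

# Stand-in for the real remote action; swap in your own tooling.
run_on_host() {
  echo "would run on $1"
}

# staged_rollout STAGE_SIZE HOST...
# Handles hosts in order, reports after every STAGE_SIZE hosts, and
# returns non-zero at the first failure without touching later hosts.
staged_rollout() {
  local stage_size=$1
  shift
  local count=0
  local host
  for host in "$@"; do
    if ! run_on_host "$host"; then
      echo "stopping: $host failed" >&2
      return 1
    fi
    count=$((count + 1))
    if [ $((count % stage_size)) -eq 0 ] && [ "$count" -lt "$#" ]; then
      echo "stage complete after $count hosts; verify before continuing"
    fi
  done
}

# Example: one test machine first, then production, two hosts per stage.
staged_rollout 2 test1 web1 web2 web3
```

A real rollout would pause for a human or an automated health check at each stage boundary; this sketch only marks where that check would go.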

Let's all just be glad this bad code didn't affect our favourite websites. And if nothing else, rest assured it's not the type of mistake you'll make twice.
