Best Software Practices for Science / Scientific Computing

automation

code

documentation

reuse

software

version control

Published

06 October 2012

This was originally posted on Blogger. Comments were not migrated.

I highly recommend this:

http://software-carpentry.org/2012/10/best-practices-for-scientific-computing/

The tips I would most like to see my own group use more often are (abridged from the link):

1. Version control. With modern DVCS, there’s no reason not to have even your little scripts under version control.
2. Automate repetitive tasks and use the computer to record (command) history. I think these two really go hand in hand with each other and with #1.
3. Don’t repeat yourself (or others).

Every piece of data must have a single authoritative representation in the system.
Code should be modularized rather than copied and pasted.
Re-use code instead of rewriting it.
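As a minimal sketch of what these points can look like in practice (my own illustration, not taken from the linked article): a value that several scripts need is defined exactly once, and a shared step lives in one function instead of being pasted into each analysis. The names `SAMPLING_RATE_HZ`, `samples_to_seconds`, and `normalize` are hypothetical.

```python
import statistics

# Single authoritative representation: defined once, imported everywhere.
# (Hypothetical constant, chosen only for illustration.)
SAMPLING_RATE_HZ = 1000.0


def samples_to_seconds(n_samples):
    """Convert a sample count to seconds using the one shared rate."""
    return n_samples / SAMPLING_RATE_HZ


def normalize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    spread = statistics.pstdev(values)
    if spread == 0:
        raise ValueError("cannot normalize constant data")
    return [(v - mean) / spread for v in values]


def analysis_a(values):
    """One analysis that needs normalized input: largest normalized value."""
    return max(normalize(values))


def analysis_b(values):
    """A second analysis reusing the same helper: smallest normalized value."""
    return min(normalize(values))
```

If the normalization ever has to change, it changes in one place, and both analyses stay consistent with each other.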

Copying and pasting leads to code blocks getting out of sync, i.e., inconsistent analyses. And I can’t tell you the number of times I’ve found a mess of inconsistent scripts and literally hundreds of gigabytes of duplicated data, with no single copy being “authoritative”. (Luckily, in the last case, SHA1 revealed that the individual data files were identical; however, each set had a slightly different collection of files…) And if you do it right from the beginning, it doesn’t even take that much time!
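For what it’s worth, a check like the SHA1 comparison above can be sketched in a few lines of Python using the standard `hashlib` module. This is an illustration under assumptions, not the exact procedure I used; the function names and the idea of passing in a list of directories are mine.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha1_of_file(path, chunk_size=1 << 20):
    """Return the SHA-1 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def group_by_content(directories):
    """Map each SHA-1 digest to the list of files that have that content."""
    groups = defaultdict(list)
    for directory in directories:
        for path in sorted(Path(directory).rglob("*")):
            if path.is_file():
                groups[sha1_of_file(path)].append(path)
    return dict(groups)


def duplicates(groups):
    """Keep only the digests shared by more than one file."""
    return {d: paths for d, paths in groups.items() if len(paths) > 1}


def only_in_first(first, second):
    """List files under `first` whose content appears nowhere under `second`."""
    seen = set(group_by_content([second]))
    return [
        path
        for digest, paths in group_by_content([first]).items()
        if digest not in seen
        for path in paths
    ]
```

Calling `duplicates(group_by_content([...]))` on the copies shows which files are byte-identical, and `only_in_first` shows what one set has that another lacks, which is exactly the "slightly different collection of files" problem.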

I think all of this can be summarized into two points:

1. Use version control.
2. Use good coding/documentation practices, including modularity.