Phillip M. Alday


Best Software Practices for Science / Scientific Computing

automation
code
documentation
reuse
software
version control
Published

06 October 2012

This was originally posted on Blogger. Comments were not migrated.

I highly recommend this:

http://software-carpentry.org/2012/10/best-practices-for-scientific-computing/

The most important tips that I would like to see my own group use more of are (abridged from link):

  1. Use version control (with a modern distributed version control system, or DVCS, there’s no reason not to have even your little scripts under version control).
  2. Automate repetitive tasks and use the computer to record (command) history (I think these two really go hand in hand with each other and with #1).
  3. Don’t repeat yourself (or others).
    1. Every piece of data must have a single authoritative representation in the system.
    2. Code should be modularized rather than copied and pasted.
    3. Re-use code instead of rewriting it.
Copying and pasting leads to code blocks getting out of sync, i.e., inconsistent analyses. And I can’t tell you the number of times I’ve found a mess of inconsistent scripts and literally hundreds of gigabytes of duplicated data, with no single copy being “authoritative”. (Luckily, in the last case, SHA-1 checksums revealed that the individual data files were identical; however, each set had a slightly different collection of files…) And if you do it right from the beginning, it doesn’t even take that much time!
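The checksum comparison mentioned above can be sketched roughly as follows. This is only a minimal illustration in Python, not the exact procedure I used back then: the function names `sha1_of_file` and `find_duplicates` are made up for this example, and it assumes the duplicated data sets all live somewhere under one directory tree.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha1_of_file(path, chunk_size=1 << 20):
    """Return the SHA-1 hex digest of a file, reading it in chunks to limit memory use."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps calling read() until it returns b"" (end of file)
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Group files under `root` by digest and keep only digests shared by 2+ files.

    Hypothetical helper: returns a dict mapping digest -> list of Path objects.
    """
    by_digest = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            by_digest[sha1_of_file(path)].append(path)
    return {d: paths for d, paths in by_digest.items() if len(paths) > 1}
```

Calling `find_duplicates("data")` on a (hypothetical) `data` directory returns one entry per group of byte-identical files. Files that appear in one copy of a data set but not in another simply don't show up in the result, which is exactly the "slightly different collection of files" problem, so you would still want to compare the file listings themselves.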

I think all of this can be summarized in two points:
  1. Use version control.
  2. Use good coding/documentation practices, including modularity.