Following Kurt’s response to Monica’s question, here are my thoughts on choosing a programming language as a scientist.
During the last 10 or so years I have used and tested a good number of programming languages and software packages. I like learning new programming languages, and I like testing new software, perhaps more than I should. Anyway, first let me list what I consider important in a programming environment, in decreasing order of importance:
- is the software free? can I install it anywhere, whenever I need to?
- how easy is it to use the software? how much time do I spend fighting the language, compared with time spent on the problem I’m trying to solve?
- how well supported is the language? does it have a nice community? does it have useful libraries?
So, regarding #1, I never use commercial software. Ever. I think science should be open to inspection and universally reproducible, so everything I do is with free, open-source software. Also, my ability to produce science is my greatest asset, so I would never tie it to commercial software. I’ve heard a couple of stories of people who stayed unproductive for months until they could get a Matlab license, for example.
Ease of use (#2) is also important to me. I love Fortran because it’s oh-so-fast, but I also hate Fortran because so often I’m fighting against the language and spending time on something other than the problem I want to solve. I like software and programming languages that don’t stand in my way when I’m looking at a problem.
For #3 I was less rigorous in the past, when I had more time. During my PhD I used to rewrite Matlab functions for Octave, or translate code from Numerical Recipes into Python. These days I want a programming environment that already has, e.g., a good assortment of statistical functions, instead of having to write my own.
So, what do I recommend? I think every scientist should know at least one general-purpose programming language. Matlab is ok, but sometimes you want to do some file processing, or download data from the web, and then it gets really clumsy. I’ve never used R, but I think it’s more of a specialized language focused on statistics, i.e., not generic enough.
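To give an idea of what I mean by file processing and downloading data, here is a minimal sketch that uses only Python's standard library. The function names (`fetch_text`, `column_mean`), the column name and the sample data are all made up for illustration; a real data file would likely need more careful handling.

```python
import csv
import io
import statistics
import urllib.request


def fetch_text(url, timeout=30):
    """Download a text resource and return it as a string (needs network access)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def column_mean(csv_text, column):
    """Return the mean of a numeric column, skipping blank or non-numeric cells."""
    values = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        try:
            values.append(float(row[column]))
        except (KeyError, TypeError, ValueError):
            # Missing column, short row, or a cell that is not a number.
            continue
    if not values:
        raise ValueError(f"no numeric values in column {column!r}")
    return statistics.mean(values)


# Made-up sample data, standing in for a downloaded file.
SAMPLE = "station,temp_c\nA,12.5\nB,\nC,14.5\n"
print(column_mean(SAMPLE, "temp_c"))
```

With a real data source, the same function would be fed by something like `column_mean(fetch_text(url), "temp_c")`, where `url` points to a CSV file of your own.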
Fortran is a good language: it’s fast, has modern implementations and, depending on the field, is widely used. But for exploratory research I think that an interpreted language is better, where you can introspect data, test code snippets and small functions, and quickly see the result. C/C++, in my opinion, fall in the same group as Fortran, except for being slower and more hindering.
Between Perl and Python, as Monica puts it, I think Python is a much better choice. The code is clearer, and the “there’s only one way to do it” philosophy helps a lot. Python is used in a good number of scientific projects, and has a wonderful collection of modules (Matplotlib, SciPy, NumPy, PyClimate…) that fulfill all my needs.
Python has come a long way in the last few years, and today I can do most of my coding with it, from data retrieval to analysis to plot generation. It’s fast enough for most needs, and when I need more speed there’s NumPy/SciPy, or I can write my critical functions in Fortran and call them from my Python script. Python also made me a better programmer, teaching me to write good and concise code.
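As a rough illustration of handing the heavy lifting to NumPy, here is a small sketch that computes a root-mean-square value twice: once with a plain Python loop and once with NumPy array operations. The function names and the sample values are hypothetical, and how much speed you gain depends on the size of your data.

```python
import math

import numpy as np


def rms_loop(values):
    """Root-mean-square computed with an explicit Python loop."""
    total = 0.0
    for v in values:
        total += v * v
    return math.sqrt(total / len(values))


def rms_numpy(values):
    """Same quantity, with the loop pushed down into NumPy's compiled code."""
    arr = np.asarray(values, dtype=float)
    return float(np.sqrt(np.mean(arr ** 2)))


# Made-up sample values, just to show that both versions agree.
data = [3.0, 4.0, 12.0]
print(rms_loop(data), rms_numpy(data))
```

Both functions give the same answer up to floating-point rounding; the NumPy version tends to pay off once the arrays get large.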