If you are a neuroscientist, whether your day-to-day involves running rats through a radial arm maze or changing the medium in your stem cells, you might not have had much experience programming. If you happen to work in genetics, you might have brought some data to a statistically or computationally inclined person in your lab, or you might be able to use the browser-based tools so useful to benchside geneticists today: BLAST, UCSC Genome Browser, etc.
However, many scientists want to learn to code for themselves but don’t know where to start. Only very rarely will a young neuroscientist’s formal training include programming courses (I know mine did not). But the immense advantages available to those who can code are clear, for both big scary reasons and small mundane reasons:
Small mundane reasons:
- Being able to program at a basic level makes data analysis and statistics immensely easier to perform and opens up the possibility of automating your analysis, which both decreases the time spent and reduces the possibility of human error (so long as the program is written correctly!).
- It’s a huge advantage in professional life generally. Programming skills look good on a CV, and if you learn web-based languages, you can become an extremely effective communicator by building your own websites. Also, being able to fluently generate graphics (for example in R) will make your presentations stand out from those created with the default settings of Microsoft Office!
- Fluency in the language of programming will help you collaborate with computational scientists even if you decide not to become a bioinformatician.
Big scary reasons:
- Data collection is becoming more and more automated. Kits of pre-made reagents and tubes eliminate much of the bench work that once kept scientists and techs busy. Robots are replacing people as the scale of research continues to escalate. This means fewer job opportunities in hands-on wet-lab research.
- As a result of this massively increased output, datasets are becoming far too large to analyze by hand. This gives scientists who can competently analyze data in a computational pipeline a huge edge.
“Okay Ryan”, you say, “I understand that learning to program can be extremely useful. I already got this far in the article without clicking away. How do I DO it though?”
Programming is a skill like any other, and as such, the only way to improve is to practice. Luckily, there are several places to start learning a language. Advice before you start: this is about learning languages. Just like trying to speak Greek, it is unrealistic to think that you can become a pro in a few hours. These are skills that have a very high ceiling, and you can work daily for years and still not learn everything. However, like knowing a foreign language, learning to program opens up many horizons that were impossible before!
Choosing which language to learn depends on your goals. For the computer-savvy scientist, I see three major routes. The first is to embrace computational biology, which I won’t cover here. The second is to adopt a language that gives you the ability to analyze and transform data and run basic statistics. The third is to develop a web presence or browser-based tools to better communicate your results.
If you want to handle basic statistics and data management, I highly recommend either R (my preference) or Python. You can get started with R by using the interactive R package swirl; all the details on how to use it are at swirlstats.com. R is an open-source language with enormous amounts of user-created content for countless applications, including high-level statistics, bioinformatics, and graphing. You can learn Python (along with several other languages) at codecademy.com. Python features easy-to-use and easy-to-read syntax, making it a good bet for the aspiring programmer.
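To give a taste of what "basic statistics and data management" can look like in practice, here is a minimal Python sketch. It is only an illustration under stated assumptions: the group names, the `latency_s` column, and the numbers themselves are made up for the example, and the `summarize` function is a hypothetical helper, not part of any library. It uses only Python's built-in `csv` and `statistics` modules.

```python
import csv
import io
from statistics import mean, stdev

# Made-up example data standing in for a CSV file exported from an experiment.
# The column names ("group", "latency_s") are assumptions for this sketch.
RAW = """group,latency_s
control,10.0
control,12.0
control,14.0
treated,7.0
treated,8.0
treated,9.0
"""


def summarize(rows):
    """Group measurements by condition and compute n, mean, and sample SD."""
    groups = {}
    for row in rows:
        groups.setdefault(row["group"], []).append(float(row["latency_s"]))
    return {
        name: {"n": len(values), "mean": mean(values), "sd": stdev(values)}
        for name, values in groups.items()
    }


# Read the text as if it were a file; with real data you would use open("file.csv").
summary = summarize(csv.DictReader(io.StringIO(RAW)))

for name, stats in summary.items():
    print(f"{name}: n={stats['n']} mean={stats['mean']:.2f} sd={stats['sd']:.2f}")
```

Once something like this is written, rerunning the same analysis on next week's data should mostly be a matter of pointing it at a new file, which is where the time savings and the reduction in copy-paste errors come from.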
These are all just starting points on the journey, but forums and blogs on these sites should point to where to go next. Give it a try and write me (email@example.com) if you run into trouble. I only started a few years ago, so I am also a novice in this field.
Ryan Price is a fellow at the Laboratory of Neurogenetics at the National Institute on Aging, National Institutes of Health. His research involves genome-wide study of neurodegenerative diseases.