2007-06-02

Bio::Blogs 11 - Tips for bioinformatics practice

It is the beginning of a month again, so it is time for a new issue of Bio::Blogs. The 11th edition is hosted on nodalpoint (pdf-version) and has the usual best-of-bioinformatics-blogs of the previous month (thanks Pedro!). Additionally, a collection of tips focused on best practice in bioinformatics can be found on Bioinformatics Zen (pdf). Michael Barton did a great job as editor of this nicely designed collection. I contributed some recommendations too, and I post my raw version below because it contains the links. Michael modified and cleaned it up to fit the style of the collection, so there might be some small differences.

Here are some of my lifehack tips for computational biologists, based on my personal experience. Please keep in mind that my main task is data analysis itself, not general tool development, so some of the points are less relevant for people who mainly develop software.

Maybe you are already inspired by classic general literature such as "Getting Things Done", "The 7 Habits of Highly Effective People" and the new kid on the block, "The 4-Hour Workweek", and follow blogs such as lifehacker.

Set up a proper tool box containing ...
  • ... a programming language (or several) that suits your needs and lets you write functional and readable code quickly, e.g. Python
  • ... knowledge of the most important and most frequently used Unix commands
  • ... a powerful editor (hot tips here: syntax-highlighting and completion)
  • ... a reference management system like Connotea/CiteULike or JabRef
  • ... a browser equipped with handy extensions
  • ... a proper version control system for your code and documents - CVS or Subversion might be the options that come to mind first, but if you asked Linus Torvalds, he would recommend git.

The habit of proper documentation
  • Think of your future self, who will thank you for the chance to travel back in time.
  • Write a lab journal/book
    • I highly recommend a blog- or wiki-based system, but of course flat files or even a handwritten notebook *cough*no*cough*way!!!*cough* might do the job.
    • Write down what you did, how you did it, why you did it, and where the input and output are stored. Also some of your key graphics, article summaries, and overviews of the results should be placed in there.
    • By articulating the thoughts in your head you define aims and become aware of them ("Begin with the end in mind" as Stephen R. Covey says).
  • Document your code
    • A short description at the beginning of the program is the minimal documentation.
    • Give meaningful names to functions and variables. If you cannot find a proper name, you don't know what the thing really does.
  • Use clear file names, file headers and folder structures to find things easily
    • I do this by putting the date at the beginning of the names of the folders in which I organise my analyses, e.g. 2007-05-28-foo_analysis_in_bar_data.
    • I put a master shell script into these folders which contains all the calls and sometimes, as a comment, the names of the resulting files.
    • The output files of my programs have a short header that tells by which program they were created, at which time and with which parameters the program was called (I use a little library for that).
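The "little library" for output headers mentioned above is not shown here, so the following is only a minimal sketch of what such a helper could look like in Python. The function names provenance_header and write_with_header, the "#" comment prefix and the tab-separated layout are assumptions made for illustration, not the original code.

```python
"""Prefix analysis output files with a short provenance header."""
import sys
from datetime import datetime


def provenance_header(program, args, timestamp=None, comment="#"):
    """Return header lines naming the program, the call time and the parameters."""
    timestamp = timestamp or datetime.now()
    return (
        f"{comment} Created by: {program}\n"
        f"{comment} Created at: {timestamp.isoformat(timespec='seconds')}\n"
        f"{comment} Parameters: {' '.join(args)}\n"
    )


def write_with_header(path, rows, program=None, args=None):
    """Write tab-separated rows to path, preceded by the provenance header."""
    # Fall back to the command line of the running script if nothing is given.
    program = program if program is not None else sys.argv[0]
    args = args if args is not None else sys.argv[1:]
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(provenance_header(program, args))
        for row in rows:
            handle.write("\t".join(str(field) for field in row) + "\n")
```

Calling write_with_header("result.tsv", rows) at the end of an analysis script would then record how the file was produced. Because every header line starts with "#", the header is easy to skip later, for example with grep -v '^#'.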

Empathy and compassion
  • Think about which information in your head could be needed by other people in your lab or in your scientific community. A wiki, for example, can be an excellent tool for organizing generally needed knowledge and for preventing its loss when somebody leaves the lab.

Other small tips:
  • Be aware that demands will change, so write your programs from the start in a way that makes them easy to extend and adapt.
  • As soon as there is the smallest possibility that a function might be used twice, write a proper library.
  • Often flat files are a handy format for data, and grep and similar tools are good companions. If data sets change constantly and are built from many components, a proper database might be a better solution. For many purposes you don't need a client-server database like MySQL; a simple library-based one such as SQLite does the job without server setup and user management.
  • Code your graphics, as this gives you more control. Especially if you have graphics that consist of several panels, it will save you a lot of time when changes are needed (and they will be). I personally create primarily SVG files and convert them to other formats if needed.
  • If you set up a web server, think not only about functionality but also about security. Make the internet a safer place and don't feed the spam/botnet industry with your server.
  • Regarding sustainability: We did not evolve to work for hours in the same position in front of a computer. Know when to take breaks, stretch, and focus on distant objects as often as possible. Hopefully this will help to maintain proper eyesight until neuronal interfaces are on the market.
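
To illustrate the SQLite tip above: Python ships with the sqlite3 module, so a library-based database needs no server and no user management. The snippet below is a small sketch. The table name gene_hits, its columns and all values are made up for the example.

```python
import sqlite3

# ":memory:" keeps the example self-contained; pass a file name instead
# (for example the hypothetical "analysis.sqlite") to keep the data on disk.
connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE gene_hits (gene TEXT, organism TEXT, score REAL)"
)

# Illustrative rows only, not real data.
connection.executemany(
    "INSERT INTO gene_hits VALUES (?, ?, ?)",
    [
        ("geneA", "organism1", 0.91),
        ("geneB", "organism1", 0.42),
        ("geneC", "organism2", 0.77),
    ],
)
connection.commit()

# Placeholders (?) keep the query parameters separate from the SQL text.
best_hits = connection.execute(
    "SELECT gene, score FROM gene_hits WHERE score > ? ORDER BY score DESC",
    (0.5,),
).fetchall()
connection.close()
```

The whole database lives in a single file (or in memory here), so it can be stored next to the analysis folder it belongs to, just like a flat file.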
