Perl, Bioinformatics and the Ensembl Project

30 minutes



Bioinformatics is the application of computation to biology. Its aims are to help us better understand and inform human health, combat the issues of an ageing population, help with food security and further understand climate change. We do so by analysing complex biological systems through many different techniques, including but not limited to wide-scale population genome analyses, 3D protein structure modelling, systems biology and mining of the scientific literature. Since its early days, bioinformatics has relied heavily on Perl. The most basic representation of a genome is a string of the four nucleotide bases A, C, G and T, so Perl's powerful text manipulation made it an obvious choice.
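As a minimal illustration of why plain string handling suits genomic data, the sketch below computes the reverse complement and GC content of a DNA sequence using nothing but ordinary string operations. It is written in Python purely for illustration. In Perl the complement step is typically a single `tr///` transliteration. The function names `reverse_complement` and `gc_content` are hypothetical and are not part of the Ensembl API.

```python
# Translation table mapping each base to its Watson-Crick complement
# (A <-> T, C <-> G), preserving case.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence string."""
    # Complement every base, then reverse the string so the result
    # reads 5' to 3' on the opposite strand.
    return seq.translate(COMPLEMENT)[::-1]


def gc_content(seq: str) -> float:
    """Return the fraction of bases that are G or C (case-insensitive)."""
    if not seq:
        return 0.0
    upper = seq.upper()
    return (upper.count("G") + upper.count("C")) / len(upper)


if __name__ == "__main__":
    dna = "ATGCGTAC"
    print(reverse_complement(dna))  # prints GTACGCAT
    print(gc_content(dna))          # prints 0.5
```

Treating a sequence as text means that common operations reduce to character mapping, reversal and counting, which is exactly the kind of work Perl's string handling made convenient.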

The Ensembl Project, currently hosted at EMBL-EBI, was founded in 1999 to interpret the genomic data being generated by the international Human Genome Project. Ensembl has been built with Perl as its primary programming language. It has one of the most widely used biologically focused APIs, which powers our analysis pipelines that predict the location of genes and their control mechanisms, find relationships between closely related species and predict the consequences of differences between people. The same API also powers our website and RESTful API. We are a completely open source project and provide all data produced by the project free of charge.

Here we present a brief summary of the bioinformatics domain, EMBL-EBI, the Ensembl Project and the tools developed by Ensembl, and how these will impact our everyday lives. We will also present our pipeline management system, eHive, a tool currently used to schedule over 420 CPU years of compute in a single year.
