In 2010, four men began preparing for what they thought would be the exciting, and imminent, future of genetic analysis in the Middle East.
Egyptians Mohamed Aboueldhoda, Moustafa Ghanem, Mohamed El-Kalioby and Sameh El-Ansary founded EG-Bioinformatics, a company intended to supply the analytical horsepower for the huge volumes of data generated by Next Generation Sequencing (NGS): the technology that identifies and records, in order, the components that make up DNA.
But despite the company’s relevance in the fast-growing US and European markets, especially after Angelina Jolie’s use of the technology to discover an inherited ‘breast cancer gene’ in 2013, MENA wasn't ready for this kind of data-driven medicine.
When EG-Bioinformatics launched in 2011, it was into an embryonic market without customers or a need for bioinformatic analysis. The company was put on hold in 2013.
But CEO Aboueldhoda, a computer scientist and biomedical engineer, says they’re starting to regroup in anticipation of the hoped-for boom in one to three years’ time.
The founders, now reduced to three after the death of Ghanem, are tinkering with the business model and, most importantly, are keeping a close eye on developments in NGS machines and the analytical challenges posed by devices that can now produce up to 30-50 terabytes of data - every week.
Wamda spoke to Aboueldhoda to find out what went wrong, why it’ll work next time, and how they’re improving the software to cope with the ever-increasing volumes of data produced by NGS technology.
Wamda: What was EG-Bioinformatics supposed to do in the beginning?
Mohamed Aboueldhoda: EG-Bioinformatics was to provide bio-analysis services to the life science sector, such as the medical profession or biologists looking at plants to find new types that can be more resistant to droughts.
We thought that we could provide genomic data analysis to this community in the Middle East.
Wamda: Why didn’t it work?
Aboueldhoda: The problem was the timing. EG-Bioinformatics started in 2010 and we started operations in 2011, because we wanted to be ahead when NGS technology arrived.
The Middle East market was not ready because the new version of genome sequencing technology - NGS - had not arrived on a large scale at that time. This means that no significant data was produced in the Middle East, and accordingly there was no need for data analysis services. We were too optimistic that the Middle East would adopt this technology earlier.
Wamda: Why was the technology delayed?
Aboueldhoda: At that time there were two factors. First, awareness took time; second, the cost was high.
But this technology keeps improving, so what you get today in 2015 is maybe ten times better than what was produced in 2012, and at a much lower cost. For example, in 2010 NGS technology was predicted to lower the cost of sequencing to about $1000 per human genome, and that only happened this year.
The good news is that this year (institutions in Egypt, Saudi Arabia, the Emirates and Qatar) have started to acquire genome sequencing machines. So far all of these machines are used for research purposes and none for (commercial) services, but what I expect is they will move into the service market.
In my view, now is the right time to start this business again.
Wamda: How are you altering the business model to keep up with market needs?
Aboueldhoda: The idea now is not to provide big analytic services but to partner with life scientists from beginning to end, especially in healthcare and diagnostics. In the US, many companies already provide end-to-end solutions in which customers submit their samples and receive a report at the end.
Currently there are no such companies (in MENA) offering these services, and we would be the pioneer.
Wamda: The tools EG-Bioinformatics initially offered were based on your research into market needs in 2010. What does the market need now in terms of genomic analytics software?
Aboueldhoda: The current requirement is the ability of the software to cope with even larger data sets.
At the end of last year, Illumina (an NGS device seller) announced a new machine that is capable of sequencing 18,000 genomes a year. This necessitates the use of much more fine-tuned algorithms running on high performance computing architecture. Globally, current software has not yet been tested at that scale in terms of accuracy and speed. This opens a new niche in the market for whoever succeeds in providing the best solution.
Also, a big focus should go towards data management, to be able to organize the large data sets, archive them and retrieve them efficiently.
Wamda: The use of ‘big data’ is the key element of the EG-Bioinformatics business model. What systems do you use to manage it?
Aboueldhoda: At that time, we were the leaders in utilising cloud computing resources. Our research now is not just on using the cloud, but on how to optimize its use in terms of cost.
Next Generation Sequencing technology produces large files of characters representing the genome of the person. The file size for the human genome, for example, ranges from 50 gigabytes to 150 gigabytes, depending on which features we look at.
Our research at Nile University focused on reducing the cost of analysis and we now put the cost at about $50 per sample. We reduced the cost to between $50 and $100 purely by using the cloud.
Because the file size is huge, usually the raw data is deleted after a certain time period and only results are kept. Archiving the data in the cloud can be an alternative, but a cost is involved.
We deal with cloud resources because no in-house IT resources are needed, and the use of cloud computing solves the scalability problem as one can allocate as many computing machines as required.
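For a rough sense of scale, the figures quoted in this interview can be combined in a back-of-envelope calculation. The sketch below is illustrative only: it assumes every genome sequenced on the 18,000-genome-a-year machine yields one file in the quoted 50 to 150 gigabyte range, and that the quoted $50 to $100 analysis cost scales linearly with the number of samples. Neither assumption comes from Aboueldhoda.

```python
# Back-of-envelope estimate built only from figures quoted in the interview.
# Assumptions (not from the interview): one file per genome in the quoted
# size range, and analysis cost that scales linearly with sample count.

GENOMES_PER_YEAR = 18_000        # throughput quoted for the new Illumina machine
FILE_SIZE_GB = (50, 150)         # quoted per-genome file size range
COST_PER_SAMPLE_USD = (50, 100)  # quoted cloud analysis cost range


def yearly_range(per_genome_range, genomes=GENOMES_PER_YEAR):
    """Scale a (low, high) per-genome range to a yearly (low, high) total."""
    low, high = per_genome_range
    return low * genomes, high * genomes


storage_gb = yearly_range(FILE_SIZE_GB)
storage_tb = tuple(gb / 1000 for gb in storage_gb)  # decimal units: 1 TB = 1000 GB
cost_usd = yearly_range(COST_PER_SAMPLE_USD)

print(f"Raw data per year: {storage_tb[0]:,.0f} to {storage_tb[1]:,.0f} TB")
print(f"Analysis cost per year: ${cost_usd[0]:,} to ${cost_usd[1]:,}")
```

Under these assumptions a single such machine would produce roughly 0.9 to 2.7 petabytes of raw data a year, which helps explain why raw files are usually deleted rather than archived.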
Wamda: Can you explain the process you use to analyse data from a DNA sample?
Aboueldhoda: It's a combination of data curation, data analysis and comparing that to a large number of databases. The number of databases we need to investigate is about 100.
In data analysis, the concept of 'one-size-fits-all' does not work and fine-tuning is required for each analysis task. To overcome this, we provided at that time a workflow management system to help researchers compose their analysis tasks without any programming. Life scientists with no computing background can also do this.
The challenge now is how to compose a useful pipeline of content that won’t take up too many computing resources.
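The idea of composing an analysis from ready-made steps can be illustrated with a minimal sketch. This is not EG-Bioinformatics' workflow system, whose design is not described here. The step names, the `STEPS` registry and the `run_workflow` function are all hypothetical, and the toy "reads" are short strings rather than real sequencing data.

```python
# Hypothetical illustration of a composable workflow: the user picks step
# names from a registry and the system chains them. Not the company's system.

def drop_short(reads, min_length=4):
    """Keep only reads with at least min_length characters."""
    return [r for r in reads if len(r) >= min_length]


def drop_ambiguous(reads):
    """Remove reads containing 'N', the code for an undetermined base."""
    return [r for r in reads if "N" not in r]


def deduplicate(reads):
    """Remove repeated reads while preserving first-seen order."""
    return list(dict.fromkeys(reads))


# Registry mapping user-facing step names to functions (hypothetical names).
STEPS = {
    "drop_short": drop_short,
    "drop_ambiguous": drop_ambiguous,
    "deduplicate": deduplicate,
}


def run_workflow(step_names, reads):
    """Apply the named steps in order, feeding each output to the next step."""
    for name in step_names:
        reads = STEPS[name](reads)
    return reads


workflow = ["drop_short", "drop_ambiguous", "deduplicate"]
reads = ["ACGT", "AC", "ACNT", "ACGT", "GGCA"]
result = run_workflow(workflow, reads)
print(result)  # ['ACGT', 'GGCA']
```

Because the workflow is just an ordered list of names, a user could reorder or drop steps without writing code, and placing cheap filters first reduces the data that later, more expensive steps have to process.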
Wamda: What challenges come with dealing with large data volumes in MENA?
Aboueldhoda: The bottleneck in MENA is still in the data production: how people can generate high quality data from this tech. Once this is solved other discussions will follow around privacy and security, but these issues are in the future. The first step is to have the capacity, in machines and human resources, to generate data at a high quality.
Low quality genomic data comes when something goes wrong in the lab when preparing the sample and the related libraries to sequence the genome. This can be detected by comparing the data against known high quality results.
Another challenge is how to transfer the data to the cloud. This is a serious challenge for the Middle East due to the lagging networking infrastructure compared to what exists in the US and Europe.
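The transfer problem can be made concrete with a simple calculation. The sketch below takes the 50 to 150 gigabyte file sizes quoted earlier and estimates upload time at a few uplink speeds. The speeds are hypothetical round numbers chosen for illustration, not measurements of any network in MENA, and the calculation ignores protocol overhead and congestion.

```python
# Rough upload-time estimate for one genome file.
# The uplink speeds below are hypothetical round numbers, not measurements.

def transfer_hours(file_size_gb, bandwidth_mbit_s):
    """Hours to upload a file, ignoring protocol overhead and congestion."""
    bits = file_size_gb * 1e9 * 8              # decimal gigabytes to bits
    seconds = bits / (bandwidth_mbit_s * 1e6)  # megabits per second to bits per second
    return seconds / 3600


FILE_SIZES_GB = (50, 150)                 # per-genome range quoted in the interview
ASSUMED_UPLINKS_MBIT_S = (10, 100, 1000)  # hypothetical sustained upload speeds

for mbit in ASSUMED_UPLINKS_MBIT_S:
    low, high = (transfer_hours(size, mbit) for size in FILE_SIZES_GB)
    print(f"{mbit:>5} Mbit/s: {low:.1f} to {high:.1f} hours per genome")
```

At an assumed 10 megabits per second, a single genome would take roughly 11 to 33 hours to upload, which is why network capacity matters as much as sequencing capacity for a cloud-based service.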
Wamda: Are MENA labs reaching a stage where they can generate large amounts of high quality data?
Aboueldhoda: Yes. This is happening because the new technology is easier to use and this is also why it is attractive to people. The machines are usable and the cost is much less (than it used to be).
In my view it will take one to three years before the market is ready to handle such things.
Wamda: What will it take to get the genetic-testing market off the ground in MENA?
Aboueldhoda: Investors must become interested.
The best way is to first raise awareness of genomic-based diagnostics; the US is very advanced in this type of business and similar ideas can be brought here as well. The ideal partnership would bring together partners from life science and IT, along with investors who see the potential.