Fortunately, thousands of biology-related datasets are publicly available on the Web. Researchers commonly want to filter individual datasets to include only the biological samples that meet specific criteria and to examine data for a few variables (e.g., genes, proteins) at a time. Unfortunately, these tasks are often difficult because many datasets are large and are stored in a wide variety of formats.
To address this problem, the Piccolo lab is developing Geney, a Web-based tool that will enable researchers to query such datasets consistently and easily. In addition to Geney, we are developing WishBuilder, a system that enables datasets to be imported into Geney, irrespective of the format in which the data were originally stored. This system downloads data from public Web servers, reformats the data, and stores them in a consistent, queryable format. To facilitate this process, we are asking BYU students to help write computer scripts that prepare the data. The more datasets we prepare, the more useful Geney will be!
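As a rough illustration of what a reformatting script might do, the sketch below converts a small genes-by-samples matrix into a samples-by-genes, tab-separated table. It is not WishBuilder's actual code or required format: the input layout, the function name `transpose_to_tidy`, and the choice of tab-separated output are all assumptions made for this example.

```python
import csv
import io


def transpose_to_tidy(raw_text, delimiter=","):
    """Convert a genes-by-samples matrix into a samples-by-genes TSV string.

    Assumed (hypothetical) input layout: the first row holds a corner label
    followed by sample IDs, and each later row holds a gene name followed by
    one value per sample.
    """
    rows = list(csv.reader(io.StringIO(raw_text), delimiter=delimiter))
    header, body = rows[0], rows[1:]
    sample_ids = header[1:]
    gene_names = [row[0] for row in body]

    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    # One header row, then one row per sample with one column per gene.
    writer.writerow(["Sample"] + gene_names)
    for col, sample_id in enumerate(sample_ids, start=1):
        writer.writerow([sample_id] + [row[col] for row in body])
    return out.getvalue()


# Tiny made-up input, standing in for a downloaded file.
raw = "gene,S1,S2\nTP53,5.1,4.8\nBRCA1,2.0,2.3\n"
tidy = transpose_to_tidy(raw)
print(tidy)
```

Putting one sample per row makes it straightforward to filter samples by criteria and then select a few variables, which is the kind of query described above.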
You can see a prototype of Geney here.
To learn more about WishBuilder and how to contribute datasets or start a pull request, visit the WishBuilder Wiki.