By Jeffrey M. Perkel
The First DNA Sequence Database
COURTESY OF GREG HAMM
In mid-1981, Greg Hamm was a 30-year-old software programmer newly hired by the European Molecular Biology Laboratory (EMBL) to head its DNA data library, a database that did not yet exist. So he set about building one. "We had journals publishing sequence data in increasingly small point size type, which was useless," he says. "It was clear that one thing that was needed was a transmission format, a way to send the data from one place to another."
Rather than tie the system to the hardware at hand, Hamm chose a deliberately "archaic" file format that both relatively simple and sophisticated systems could read. "The decision I made was to reach backward in the history of computing to come up with a format," he says. "The idea was, if you had a very sophisticated environment, this format should be easy to parse. If you had a relatively old-fashioned technical environment you could use this as well."
Shown here is a "very early sketch" of Hamm's proposed format. Each line of the file carries one kind of information, tagged with a two-character identifier, such as DT for date and FT for feature table. Some things have changed since; this drawing, for one, has no accession number field. Still, EMBL's data format (the first EMBL record, accession #X0001, is shown in the inset) remains remarkably close to what Hamm envisioned 25 years ago.
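To make the line-tagging idea concrete, here is a minimal Python sketch of how a program might read such a record by grouping lines on their two-character code. It is an illustration under stated assumptions, not EMBL's own software: the column layout (code in columns 1 and 2, data starting at column 6) and the "//" end-of-entry marker follow later EMBL releases rather than Hamm's early sketch, and the function name and sample entry are hypothetical.

```python
from collections import defaultdict


def parse_tagged_lines(text):
    """Group the lines of one record by their two-character line code.

    Assumption (modeled on later EMBL releases, not Hamm's sketch): the code
    sits in columns 1-2, data begins at column 6, and '//' ends the record.
    """
    fields = defaultdict(list)
    for line in text.splitlines():
        if line.startswith("//"):
            break  # end-of-record marker
        tag = line[:2]
        if not tag.strip():
            continue  # skip blank or untagged lines (a simplification)
        fields[tag].append(line[5:].rstrip())  # data after code + 3 spaces
    return dict(fields)


# Hypothetical sample entry, invented for illustration only.
record = "\n".join([
    "ID   EXAMPLE1",
    "DT   01-JAN-1982",
    "DE   Hypothetical example entry",
    "FT   CDS             1..30",
    "//",
])

print(parse_tagged_lines(record)["DT"])
```

Because every line announces its own meaning in its first two characters, even a very simple program can pick out the fields it cares about and ignore the rest, which is the portability Hamm was after.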