A while back I put together a fun list of the top 7 data scientists before there was Data Science. I got some great feedback on others who should be on the list (Tukey, Hopper, and even Florence Nightingale).
In hindsight I probably should have also included Edgar Codd. While at IBM, Codd developed the relational model for databases ("data bank" was the older term). You can see his 1970 paper here.
While the current excitement is around Big and unstructured data, most modern databases in commercial use today, along with their query language, SQL, trace back to Codd's work. Codd laid out 12 rules defining when a database system can claim to be a relational database. One of the primary rules is the rule of Guaranteed Access. Essentially, the rule states that each atomic data element must be addressable by a unique address, such as a combination of table name, primary key value, and column name. This allows us, the users, to specify and select each atomic element and grab it if we want. Which, really, is the whole purpose of having a database.
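To make the Guaranteed Access rule concrete, here is a minimal sketch using Python's built-in sqlite3 module and a hypothetical `measurements` table (the table and its values are invented for illustration). The combination of table name, primary key value, and column name pins down exactly one atomic datum:

```python
import sqlite3

# In-memory database with a hypothetical "measurements" table,
# just to illustrate the Guaranteed Access rule.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE measurements (id INTEGER PRIMARY KEY, sensor TEXT, value REAL)"
)
conn.executemany(
    "INSERT INTO measurements (id, sensor, value) VALUES (?, ?, ?)",
    [(1, "temp", 21.5), (2, "temp", 22.1), (3, "humidity", 48.0)],
)

# Guaranteed Access: table name + primary key value + column name
# uniquely addresses one atomic datum -- here, the value in row id=2.
(datum,) = conn.execute(
    "SELECT value FROM measurements WHERE id = 2"
).fetchone()
print(datum)  # 22.1
```

No matter how large the table grows, that three-part address always resolves to exactly one value, which is the property the rule guarantees.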
The rule of Guaranteed Access got me thinking, though. Maybe the current attempts to define Big Data in terms of quantity, scale, or value are the wrong way to go about it. Rather than thinking of Big Data as a big collection of stuff, it may make more sense to think of it as a capability for addressing and accessing data.
BIG DATA as a modified version of the Guaranteed Access Rule
“Big Data is the capacity to address and access any datum or event at any arbitrary scale or level of precision”
By framing it this way, we shed the ambiguous notions of quantity, size, and even value, and instead treat Big Data as a limit on our capacity to communicate with, and use, any present event or historical record of past events. In this alternative view, the metric of progress in the field stops being the very bigness of the data and becomes the very smallness of the scale at which we can address and access it.
Please feel free to comment below with your thoughts!