Friday, November 26, 2010

Big Data Enables Agile Data

The funny thing about NoSQL is that it is being built and driven by the Big Data and scalability communities, which face legitimate and very difficult problems of scale, yet it also enables Agile Data.

Here, I define Agile Data as:
  • the ability to record all available data at the point of a transaction/form/user interaction (including a context)
  • the ability to organize data after it is available
It should be clear from this description that an RDBMS is not Agile in this sense, as it requires me to organize data before I collect it. Sure, there is a way to achieve the above with an RDBMS, and you could develop a methodology or an engine to accomplish it, but that violates the spirit of an RDBMS, since I would just be packing JSON objects into a row.

The perfect example of a Big/Agile Data problem is analytics. I would like to record as much information as is available (the HTTP headers, the client data, maybe some page content, etc.). Sure, I could build a structure/schema that I think will be valid, but then I'm potentially reducing the amount of information I'm gathering. Instead, if I take the mantra of "gather everything", I finish the collection faster and can start studying the data to look for interesting patterns.
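To make the "gather everything, organize later" idea concrete, here is a minimal sketch in Python. It is only an illustration under my own assumptions: the function names (record_event, count_by), the event layout and the in-memory sink are all hypothetical, and a real setup would write to a NoSQL store or log files instead.

```python
import io
import json
from collections import Counter

def record_event(sink, headers, client, context):
    """Append one raw event as a JSON line; no schema is enforced up front."""
    event = {"headers": headers, "client": client, "context": context}
    sink.write(json.dumps(event) + "\n")

def count_by(raw_text, path):
    """Organize after the fact: count events by a nested key path.

    Events that lack the key are counted under None instead of being rejected.
    """
    counts = Counter()
    for line in raw_text.splitlines():
        value = json.loads(line)
        for key in path:
            value = value.get(key) if isinstance(value, dict) else None
        counts[value] += 1
    return counts

# Hypothetical events; each one carries whatever happened to be available.
sink = io.StringIO()
record_event(sink, {"User-Agent": "Firefox"}, {"screen": "1280x800"}, {"page": "/home"})
record_event(sink, {"User-Agent": "Chrome", "Referer": "/home"}, {}, {"page": "/about"})
record_event(sink, {"User-Agent": "Firefox"}, {"screen": "1024x768"}, {"page": "/home"})

# The questions are decided only after the data exists.
by_agent = count_by(sink.getvalue(), ["headers", "User-Agent"])
by_referer = count_by(sink.getvalue(), ["headers", "Referer"])
```

Nothing in the collection step had to anticipate the Referer question; the second event simply carried an extra header and it was kept. The price is that every event stores its full context, which is where the storage growth comes from.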

The really neat thing about having Big Data at my disposal is that it handles exactly the problem Agile Data creates: storage requirements grow a lot faster than with a typical solution.
