The last 10% is usually something like search; it is the novel algorithm that takes all your data and presents it in a meaningful way that makes your product awesome. CouchDB rarely solves this (neither do other packages). The more specialized the algorithm, the more painful it will be to solve with CouchDB's MapReduce framework alone.
Fortunately, CouchDB has replication built in. I use replication to push data from CouchDB to a custom server, where I aggregate it into a meaningful service. My library for this is called Otto (short for ottoman).
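To give a rough idea of the shape of this, here is a minimal sketch of pulling changes out of CouchDB and forwarding them to a custom aggregation server. It uses CouchDB's standard `_changes` feed rather than the full replication protocol, and the database URL, aggregator endpoint, and function names are illustrative assumptions, not Otto's actual API.

```typescript
// Sketch: follow a CouchDB _changes feed and forward changed documents to a
// custom aggregation server. COUCH and AGGREGATOR are assumed URLs.

const COUCH = "http://localhost:5984/mydb";        // assumed database URL
const AGGREGATOR = "http://localhost:8080/ingest"; // hypothetical custom server

async function followChanges(since: string | number = 0): Promise<void> {
  while (true) {
    // Long-poll the standard CouchDB _changes feed, including full documents.
    const res = await fetch(
      `${COUCH}/_changes?feed=longpoll&include_docs=true&since=${since}`
    );
    const body = await res.json();

    for (const change of body.results) {
      if (change.doc) {
        // Push each changed document to the custom server for aggregation.
        await fetch(AGGREGATOR, {
          method: "POST",
          headers: { "Content-Type": "application/json" },
          body: JSON.stringify(change.doc),
        });
      }
    }
    since = body.last_seq; // resume from the last sequence we saw
  }
}

followChanges().catch(console.error);
```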
The biggest problem you are going to have is what happens when your custom server crashes.
This can be solved by:
- providing your own persistence and dealing with reliability yourself;
- not worrying about it, launching three servers behind a custom HTTP server that replicates three ways, and spending more money;
- not caring, re-replicating the entire data set after a crash, and accepting potentially non-trivial downtime.
From a complexity standpoint, you can make your life easier by enabling your custom software to merge data in bulk sets. This lets you run your algorithm lazily once you have collected a lot of data at once; a sketch follows below. I have found that this style of bulk insertion makes the third option feasible in many domains.
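Here is a minimal sketch of what that lazy, batch-oriented merging might look like in the custom server. The threshold, the in-memory index, and the `rebuildIndex` name are all illustrative assumptions; the real work would be whatever domain-specific algorithm makes your product awesome.

```typescript
// Sketch: collect documents into a batch and run the expensive aggregation
// lazily once the batch is large enough. BATCH_SIZE and rebuildIndex are
// hypothetical names, not part of Otto.

type Doc = { _id: string; [key: string]: unknown };

const BATCH_SIZE = 1000;                 // assumed threshold before re-running the algorithm
let pending: Doc[] = [];                 // documents collected since the last merge
const index = new Map<string, Doc>();    // stand-in for the aggregated result

function ingest(docs: Doc[]): void {
  pending.push(...docs);
  if (pending.length >= BATCH_SIZE) {
    rebuildIndex();
  }
}

function rebuildIndex(): void {
  // Merge the whole pending batch into the aggregate in one pass, then clear it.
  for (const doc of pending) {
    index.set(doc._id, doc); // replace with the real, domain-specific algorithm
  }
  pending = [];
}
```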
The nice thing about using CouchDB is that you don't need a schema. You don't need to "plan". If you need to store data, you just store it: give it a namespace and insert.
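For example, a namespaced insert can be as simple as posting a JSON document with a `type` field. This is a sketch against CouchDB's standard document API; the database name and field layout are examples, not a required schema.

```typescript
// Sketch: store an arbitrary document, using a "type" field as its namespace.
// The database URL and fields are illustrative assumptions.

async function store(namespace: string, data: Record<string, unknown>) {
  const res = await fetch("http://localhost:5984/mydb", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ type: namespace, ...data }), // no schema needed
  });
  return res.json(); // CouchDB responds with { ok, id, rev }
}

// Usage: just pick a namespace and insert.
store("user_event", { user: "alice", action: "login", at: Date.now() });
```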
Please comment if you think I should write a book on CouchDB. I've done relational databases for years, and I've lurked in the CouchDB community for a while. I'm currently building a web framework around node.js and CouchDB called WIN.