I released the first version of the highly untested Otto today.
Otto is a node.js server that enables CouchDB to replicate to it. That is, changes on the CouchDB machine will replicate to node.js via CouchDB's replication stack. Yay!
Clone it today on GitHub.
Monday, October 18, 2010
Friday, October 15, 2010
NoSQL Design; a primer for future data architects
*This is a rough outline of a book I'm working on; after writing it, I realized there is a lot of knowledge debt and background understanding required.*
Having done RDBMS/SQL-based design for the past 10 years, I've been skeptical about how NoSQL works and how it impacts the way we store and query data. For the past two years, I've immersed myself in using MySQL to emulate and figure out NoSQL design patterns.
Rule 1: Denormalize like a Crazy Person
Normalization is a side-effect of using relational databases: if you buy into a relational database, then you buy into normalization strategies. In relational land, normalization allows you to use SQL as a rich query language. For NoSQL, you must denormalize and think in fewer "tables". Think in terms of documents: how can one document store a lot of related data?
Example

A user has multiple phone numbers. You can represent this with a table consisting of tuples (user_id, label, number), or you can augment the user document with a field that stores an array of records. That is,

class Phone { string label; string number; }
class User { guid id; Phone[] phone; }

Serialize your user object into a JSON string and ship it off to a NoSQL solution.
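As a concrete sketch (names and values are illustrative), the embedded-phone-numbers document serialized in JavaScript:

```javascript
// A denormalized user document: the phone numbers live inside the
// document itself instead of in a separate (user_id, label, number) table.
var user = {
  id: "u-123",
  name: "Jane",
  phone: [
    { label: "home", number: "555-0100" },
    { label: "cell", number: "555-0199" }
  ]
};

// One write ships the whole related data set to the store.
var json = JSON.stringify(user);
console.log(json);
```

A single read then returns the user and every phone number in one round trip, which is the payoff of denormalizing.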
Rule 2: Embrace the "Sea of Shit" (i.e. schema-less design)
Just like dynamic typing made my ass itch years ago, schema-less design does now. However, the mode of thought is to think in document versioning and namespace partitioning. That is, give every document a field called "class_type" and a field called "class_type_version", and then use them in the obvious ways.
The consumers of your data (other developers and yourself) should understand that the schema has multiple versions, and should have a way to gracefully degrade or to initiate a remote upgrade. Alternatively, an upgrade script could do this, but I find that doing it lazily works well if you have the discipline to control and work against versions.
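A minimal sketch of that lazy approach, assuming a hypothetical v1-to-v2 user migration (the field names follow the convention above; everything else is made up):

```javascript
// Upgrade a document lazily, at read time, keyed off class_type_version.
function upgradeUser(doc) {
  if (doc.class_type !== "user") throw new Error("not a user document");
  if (doc.class_type_version === 1) {
    // Hypothetical migration: v1 stored one phone string,
    // v2 stores an array of { label, number } records.
    doc.phone = doc.phone ? [{ label: "default", number: doc.phone }] : [];
    doc.class_type_version = 2;
  }
  return doc;
}

var v1doc = { class_type: "user", class_type_version: 1, phone: "555-0100" };
var v2doc = upgradeUser(v1doc);
```

Every reader goes through upgradeUser, so old documents get migrated the first time anyone touches them, with no bulk upgrade pass.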
Rule 3: Dominate Complexity with a Dominating Subset Index
Complexity sucks, and the ultimate goal is to get complexity down to O(1) or O(f), where f is linear in the output of the page (and at most sublinear in the entire data set). While half of your view code will be dedicated to viewing single documents, the other half deals with aggregated/indexed sets of documents. Anything else is a special case, handled by replication.
Conquering this requires thinking in dominating subsets: place an index over your documents (and store the index as a document), and have some efficient way of bringing the index (or a subset of it) to a developer. This is where you do the dreaded join in application logic, but it will be ok as long as the complexity of the join is related to the output of the page. Relax, it will be ok.
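A toy sketch of an index stored as a document, with the join done in application logic (all names and data are made up); the cost is proportional to the page's output, not the whole data set:

```javascript
// The index is itself a document: a key mapped to the ids of matching docs.
var usersByCity = {
  seattle: ["u1", "u3"],
  portland: ["u2"]
};

// Stand-in for the document store, keyed by id.
var docs = {
  u1: { id: "u1", name: "Ann", city: "seattle" },
  u2: { id: "u2", name: "Bob", city: "portland" },
  u3: { id: "u3", name: "Cal", city: "seattle" }
};

// The "dreaded join" in application logic: O(k) in the page's output size.
function usersIn(city) {
  return (usersByCity[city] || []).map(function (id) { return docs[id]; });
}
```

The index document has to be maintained on writes, but reads only ever touch the dominating subset they need.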
Rule 4: Replicate like a Pirate
Disk space is cheap, and memory is getting cheaper. Unless you are Google, a single server can solve just about any problem you have if you can just get the data to it. With node.js or node.ocaml, it is feasible to build the services that drive the business in a customized fashion. Once you grow past a single server, the service-based design becomes its own challenge. However, each service is then isolated from the rest of the ecosystem and can be measured and monitored independently.
Rule 5: Cache, Cache, and then Cache some more. Invalidate!
Fundamentally, you could cache everything forever, with no timestamps, if you just knew how to recompute the caches based on what you update. This is a fundamentally difficult problem, but it can be worked out with a dependency graph of how your reads depend on your writes. It sounds simple, but at the application level it isn't.
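A sketch of that dependency graph (all keys are hypothetical): each write names the cached reads it invalidates, so everything else can be cached indefinitely:

```javascript
// Map each write key to the cache keys that depend on it.
var dependsOn = {
  "user:u1": ["page:profile/u1", "page:city/seattle"]
};

var cache = {
  "page:profile/u1": "<html>profile</html>",
  "page:city/seattle": "<html>seattle</html>",
  "page:city/portland": "<html>portland</html>"
};

// On a write, drop exactly the reads that depended on it.
function invalidate(writeKey) {
  (dependsOn[writeKey] || []).forEach(function (cacheKey) {
    delete cache[cacheKey];
  });
}

invalidate("user:u1");
```

The hard part, as the text says, is building dependsOn honestly: every read path has to declare which writes can change it.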
Labels:
technology
Friday, October 1, 2010
dev install script
for my own reference
#!/bin/sh
apt-get install mercurial git-core ssh
apt-get install build-essential openssl libssl-dev
apt-get install inotify-tools curl zip unzip
apt-get install apache2 php5 php5-curl php5-tidy php5-memcache memcached php5-cli php5-gd
apt-get install libtool automake
apt-get install ocaml ocaml-native-compilers ocaml-findlib camlp5 libcurl4-openssl-dev
apt-get install libcamlimages-ocaml liblablgl-ocaml-dev libcurl-ocaml ocaml-libs
apt-get install libcamlimages-ocaml-doc omake
apt-get install ruby gem rubygems1.8
Labels:
technology
Thursday, September 30, 2010
Hack the Planet: How I solved Global Warming in my sleep
I'm a level 1 hippy, and I think the best way to save the planet (because that's the sexy thing to do these days) is to be conservative in what and how I consume. The problem is that this requires changing a whole bunch of angry kids' (Americans') minds about how to live. Instead, I lay in bed wondering about things that could be: what if I hacked the planet?
How could I make the world cooler? Well, we can't build a giant air conditioner, because that's just heat exchange and it wouldn't work. Bummer. We need to hack the planet and either vent excess heat or prevent heat from entering.
Simpsons did it.
So, how do we block out the sun? If we go far enough out into space, then whatever we build would project a larger shadow. Think of an eclipse, where the moon blocks out the sun. If we go out farther, then we don't need to build a moon. (Although honestly, we need to build a moon and control it; how cool would that be?)
What if we built a disc out of aluminum foil (you know, to enable it to gracefully degrade against all the junk in space that will hit it) and sent it out into space? Yay! How much would it cost? How big does it need to be? How far does it need to go? What evil plans could I unleash onto the world if I controlled the sun? Could we position it to effectively negate heat waves in large regions and control temperatures per city? It could pay for itself if a global auction took place to cool cities and save energy!
Well, a few Google searches away, I found this interesting document that I would like to bring to your attention:
Global Warming and Ice Ages: I. Prospects for Physics-Based Modulation of Global Change
And, lo and behold, they did the math, and it costs less than $1 billion per year. Seems like a heck of a good investment. So, I'll post this on HN and maybe influence the movers and shakers to move and shake.
I, on the other hand, will go back to bed knowing there are viable solutions to global warming, so tomorrow I can be guilt-free in building yet another web framework.
Labels:
personal
Monday, September 27, 2010
Node.js love (and other cool technologies)
I think node.js, redis, and CouchDB are the best web technologies of this year to have matured from neat ideas into things I'm willing to put into production.
To the contributors, movers, and shakers of these awesome technologies, I salute you.
My prediction for 2011 is that node.js (with the help of redis or CouchDB) will overthrow Ruby on Rails. I could justify that statement, but I'm not going to; I just feel it in my bones. My bones tend not to lie about these things.
Labels:
technology
Thursday, September 23, 2010
php.js
A massive JavaScript library that ports a bunch of common PHP functions:
http://phpjs.org/
WIN is going to use:
- addcslashes
- addslashes
- get_html_translation_table
- html_entity_decode
- htmlentities
- htmlspecialchars
- htmlspecialchars_decode
- md5
- sha1
- mt_rand
- base64_decode
- base64_encode
- parse_url
- urlencode
- urldecode
- date
- date_default_timezone_set
- date_default_timezone_get
- strtotime
- time
- timezone_abbreviations_list
- utf8_decode
- utf8_encode
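To see why ports like these are useful: PHP's urlencode and JavaScript's encodeURIComponent disagree on spaces and a few punctuation characters. A rough re-implementation of the PHP behavior (this is my own sketch, not php.js's actual code):

```javascript
// PHP's urlencode turns spaces into "+" and also escapes !, ', (, ), *, ~,
// all of which JavaScript's encodeURIComponent leaves alone.
function phpUrlencode(str) {
  return encodeURIComponent(str)
    .replace(/[!'()*~]/g, function (c) {
      return "%" + c.charCodeAt(0).toString(16).toUpperCase();
    })
    .replace(/%20/g, "+");
}

console.log(phpUrlencode("a b!"));       // "a+b%21"
console.log(encodeURIComponent("a b!")); // "a%20b!"
```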
Labels:
technology
Tuesday, September 21, 2010
Technical Debt for WIN
Well, I have a really good start on WIN (see prior post).
I have something that is kinda working and nice.
check it out
Here is the obligatory list of things that need to go into it to make it a real web platform:
- Template Engine (Mustache)
- Sessions (done, in memory)
- Secured Session ID generation (done, thanks to php.js)
- Spam/Bot Detection (Skip)
- Rate Throttling (Skip)
- Get User IP (Done)
- GeoLocate the IP (Skip, need to build node.js module for GeoIP)
- SSL Support
- Linking (SSL, Absolute, Relative)
- Auto Path Normalization
- Some form of Persistence Support (Going with CouchDB)
- Asset Management (JS, CSS, Language Files, SWF, Images)
- Language Detection
- Time Zone Support
- i18n language support
- Password Salting (done, thanks to php.js)
- Multi-Error Handling
- URL Routing (done)
- Captcha Support
- Large Scale user file system (part of an Asset Platform/CDN)
- asset linking (maybe an asset platform is needed?)
- static KVP cms (memcache to something else)
- basic validation routines (just collect email, credit card, etc)
- std package system (i.e. easy to clone: login, forgot, reset and friends)
- geocoding
- email (remote to HTTP service)
- datetime (done, thanks to php.js)
- REST clients (vague, got couchdb, what else do I need?)
- CSV parsing (Skip, don't need yet)
EDIT: Or, I can go find stuff that already exists. :/
Labels:
technology