I released the first version of the highly untested Otto today.
Otto is a node.js server that enables CouchDB to replicate to it. That is, changes on the CouchDB machine will replicate to node.js via CouchDB's replication stack. Yay!
Clone it today on GitHub.
Monday, October 18, 2010
Friday, October 15, 2010
NoSQL Design; a primer for future data architects
*This is a rough outline of a book I'm working on; after writing it, I realized there is a lot of knowledge debt and background understanding required.*
Having done RDBMS/SQL-based design for the past 10 years, I've been skeptical about how NoSQL works and how it impacts the way we store and query data. For the past two years, I've immersed myself in using MySQL to emulate and figure out NoSQL design patterns.
Rule 1: Denormalize like a Crazy Person
Normalization is a side-effect of using relational databases: if you buy into a relational database, then you buy into normalization strategies. In relational land, normalization allows you to use SQL as a rich query language. For NoSQL, you must denormalize and think in fewer "tables". Think in terms of documents: how can one document store a lot of related data?
Example

A user has multiple phone numbers. You can represent this with a table consisting of tuples (user_id, label, number), or you can augment the user document with a field that stores an array of records. That is,

class Phone { string label; string number; }
class User { guid id; Phone[] phone; }

Serialize your user object into a JSON string and ship it off to a NoSQL solution.
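As a concrete sketch (names and values are illustrative), the embedded-phone-numbers document serialized in JavaScript:

```javascript
// A denormalized user document: the phone numbers live inside the
// document itself instead of in a separate (user_id, label, number) table.
var user = {
  id: "u-123",
  name: "Jane",
  phone: [
    { label: "home", number: "555-0100" },
    { label: "cell", number: "555-0199" }
  ]
};

// One write ships the whole related data set to the store.
var json = JSON.stringify(user);
console.log(json);
```

A single read then returns the user and every phone number in one round trip, which is the payoff of denormalizing.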
Rule 2: Embrace the "Sea of Shit" (i.e. schema-less design)
Just like dynamic typing made my ass itch years ago, schema-less design does now. However, the mode of thought is to think in document versioning and namespace partitioning. That is, give every document a field called "class_type" and a field called "class_type_version", and then use them in the obvious ways.
The consumers of your data (other developers and yourself) should understand that the schema has multiple versions, and should have a way to gracefully degrade or to initiate a remote upgrade. Alternatively, an upgrade script could do this, but I find that doing it lazily works well if you have the discipline to control and work against versions.
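A minimal sketch of that lazy approach, assuming a hypothetical v1-to-v2 user migration (the field names follow the convention above; everything else is made up):

```javascript
// Upgrade a document lazily, at read time, keyed off class_type_version.
function upgradeUser(doc) {
  if (doc.class_type !== "user") throw new Error("not a user document");
  if (doc.class_type_version === 1) {
    // Hypothetical migration: v1 stored one phone string,
    // v2 stores an array of { label, number } records.
    doc.phone = doc.phone ? [{ label: "default", number: doc.phone }] : [];
    doc.class_type_version = 2;
  }
  return doc;
}

var v1doc = { class_type: "user", class_type_version: 1, phone: "555-0100" };
var v2doc = upgradeUser(v1doc);
```

Every reader goes through upgradeUser, so old documents get migrated the first time anyone touches them, with no bulk upgrade pass.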
Rule 3: Dominate Complexity with a Dominating Subset Index
Complexity sucks, and the ultimate goal is to get complexity down to O(1) or O(f), where f is linear in the output of the page (and at most sublinear in the entire data set). While half of your view code will be dedicated to viewing single documents, the other half deals with aggregated/indexed sets of documents. Anything else is a special case, handled by replication.
Conquering this requires thinking in dominating subsets: place an index over your documents (and store the index as a document), and have some efficient way of bringing the index (or a subset of it) to a developer. This is where you do the dreaded join in application logic, but it will be ok as long as the complexity of the join is related to the output of the page. Relax, it will be ok.
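A toy sketch of an index stored as a document, with the join done in application logic (all names and data are made up); the cost is proportional to the page's output, not the whole data set:

```javascript
// The index is itself a document: a key mapped to the ids of matching docs.
var usersByCity = {
  seattle: ["u1", "u3"],
  portland: ["u2"]
};

// Stand-in for the document store, keyed by id.
var docs = {
  u1: { id: "u1", name: "Ann", city: "seattle" },
  u2: { id: "u2", name: "Bob", city: "portland" },
  u3: { id: "u3", name: "Cal", city: "seattle" }
};

// The "dreaded join" in application logic: O(k) in the page's output size.
function usersIn(city) {
  return (usersByCity[city] || []).map(function (id) { return docs[id]; });
}
```

The index document has to be maintained on writes, but reads only ever touch the dominating subset they need.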
Rule 4: Replicate like a Pirate
Disk space is cheap, and memory is getting cheaper. Unless you are Google, a single server can solve just about any problem you have if you can just get the data to it. With node.js or node.ocaml, it is feasible to build the services that drive the business in a customized fashion. Once you grow past a single server, the service-based design becomes its own challenge. However, each service is then isolated from the rest of the ecosystem and can be measured and monitored independently.
Rule 5: Cache, Cache, and then Cache some more. Invalidate!
Fundamentally, you could cache everything forever, with no timestamps, if you just knew how to recompute the caches based on what you update. This is a fundamentally difficult problem, but it can be worked out with a dependency graph of how your reads depend on your writes. It sounds simple, but at the application level it isn't.
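A sketch of that dependency graph (all keys are hypothetical): each write names the cached reads it invalidates, so everything else can be cached indefinitely:

```javascript
// Map each write key to the cache keys that depend on it.
var dependsOn = {
  "user:u1": ["page:profile/u1", "page:city/seattle"]
};

var cache = {
  "page:profile/u1": "<html>profile</html>",
  "page:city/seattle": "<html>seattle</html>",
  "page:city/portland": "<html>portland</html>"
};

// On a write, drop exactly the reads that depended on it.
function invalidate(writeKey) {
  (dependsOn[writeKey] || []).forEach(function (cacheKey) {
    delete cache[cacheKey];
  });
}

invalidate("user:u1");
```

The hard part, as the text says, is building dependsOn honestly: every read path has to declare which writes can change it.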
Labels:
technology
Friday, October 1, 2010
dev install script
for my own reference
#!/bin/sh
apt-get install mercurial git-core ssh
apt-get install build-essential openssl libssl-dev
apt-get install inotify-tools curl zip unzip
apt-get install apache2 php5 php5-curl php5-tidy php5-memcache memcached php5-cli php5-gd
apt-get install libtool automake
apt-get install ocaml ocaml-native-compilers ocaml-findlib camlp5 libcurl4-openssl-dev
apt-get install libcamlimages-ocaml liblablgl-ocaml-dev libcurl-ocaml ocaml-libs
apt-get install libcamlimages-ocaml-doc omake
apt-get install ruby gem rubygems1.8
Labels:
technology
Thursday, September 30, 2010
Hack the Planet: How I solved Global Warming in my sleep
I'm a level 1 hippy, and I think the best way to save the planet (because that's the sexy thing to do these days) is to be conservative in what and how I consume. The problem is that this requires changing a whole bunch of angry kids' (Americans') minds about how to live. Instead, I lay in bed wondering about things that could be: what if I hacked the planet?
How could I make the world cooler? Well, we can't build a giant air conditioner, because that's just heat exchange and it wouldn't work. Bummer. We need to hack the planet and either vent excess heat or prevent heat from entering.
Simpsons did it.
So, how do we block out the sun? If we go far enough out into space, then whatever we build would project a larger shadow. Think of an eclipse, where the moon blocks out the sun. If we go out farther, then we don't need to build a moon. (Although honestly, we need to build a moon and control it; how cool would that be?)
What if we built a disc out of aluminum foil (you know, to enable it to gracefully degrade against all the junk in space that will hit it) and sent it out into space? Yay! How much would it cost? How big does it need to be? How far does it need to go? What evil plans could I unleash onto the world if I controlled the sun? Could we position it to effectively negate heat waves in large regions and control temperatures per city? It could pay for itself if a global auction took place to cool cities and save energy!
Well, a few Google searches away, I found this interesting document that I would like to bring to your attention:
Global Warming and Ice Ages: I. Prospects for Physics-Based Modulation of Global Change
And, lo and behold, they did the math, and it costs less than $1 billion per year. Seems like a heck of a good investment. So, I'll post this on HN and maybe influence the movers and shakers to move and shake.
I, on the other hand, will go back to bed knowing there are viable solutions to global warming, so tomorrow I can be guilt-free in building yet another web framework.
Labels:
personal
Monday, September 27, 2010
Node.js love (and other cool technologies)
I think node.js, redis, and CouchDB are the best web technologies of this year to have matured from neat ideas into things I'm willing to put into production.
To the contributors, movers, and shakers of these awesome technologies, I salute you.
My prediction for 2011 is that node.js (with the help of redis or CouchDB) will overthrow Ruby on Rails. I could justify that statement, but I'm not going to; I just feel it in my bones. My bones tend not to lie about these things.
Labels:
technology
Thursday, September 23, 2010
php.js
A massive JavaScript library that ports a bunch of common PHP functions:
http://phpjs.org/
WIN is going to use:
- addcslashes
- addslashes
- get_html_translation_table
- html_entity_decode
- htmlentities
- htmlspecialchars
- htmlspecialchars_decode
- md5
- sha1
- mt_rand
- base64_decode
- base64_encode
- parse_url
- urlencode
- urldecode
- date
- date_default_timezone_set
- date_default_timezone_get
- strtotime
- time
- timezone_abbreviations_list
- utf8_decode
- utf8_encode
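To see why ports like these are useful: PHP's urlencode and JavaScript's encodeURIComponent disagree on spaces and a few punctuation characters. A rough re-implementation of the PHP behavior (this is my own sketch, not php.js's actual code):

```javascript
// PHP's urlencode turns spaces into "+" and also escapes !, ', (, ), *, ~,
// all of which JavaScript's encodeURIComponent leaves alone.
function phpUrlencode(str) {
  return encodeURIComponent(str)
    .replace(/[!'()*~]/g, function (c) {
      return "%" + c.charCodeAt(0).toString(16).toUpperCase();
    })
    .replace(/%20/g, "+");
}

console.log(phpUrlencode("a b!"));       // "a+b%21"
console.log(encodeURIComponent("a b!")); // "a%20b!"
```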
Labels:
technology
Tuesday, September 21, 2010
Technical Debt for WIN
Well, I have a really good start on WIN (see prior post).
I have something that is kinda working and nice.
check it out
Here is the obligatory list of things that need to go into it to make it a real web platform:
- Template Engine (Mustache)
- Sessions (done, in memory)
- Secured Session ID generation (done, thanks to php.js)
- Spam/Bot Detection (Skip)
- Rate Throttling (Skip)
- Get User IP (Done)
- GeoLocate the IP (Skip, need to build node.js module for GeoIP)
- SSL Support
- Linking (SSL, Absolute, Relative)
- Auto Path Normalization
- Some form of Persistence Support (Going with CouchDB)
- Asset Management (JS, CSS, Language Files, SWF, Images)
- Language Detection
- Time Zone Support
- i18n language support
- Password Salting (done, thanks to php.js)
- Multi-Error Handling
- URL Routing (done)
- Captcha Support
- Large Scale user file system (part of an Asset Platform/CDN)
- asset linking (maybe an asset platform is needed?)
- static KVP cms (memcache to something else)
- basic validation routines (just collect email, credit card, etc)
- std package system (i.e. easy to clone: login, forgot, reset and friends)
- geocoding
- email (remote to HTTP service)
- datetime (done, thanks to php.js)
- REST clients (vague, got couchdb, what else do I need?)
- CSV parsing (Skip, don't need yet)
EDIT: Or, I can go find stuff that already exists. :/
Labels:
technology