Adventures of a Protoss in Seattle: November 2010

Friday, November 26, 2010

Programmer Legs (And a potential patch/cure for restless leg syndrome)

When I went to college, there was this room in Nichols that was primarily dedicated to computer science courses. During many (if not all) of the classes, the room would shake a bit. It was kind of annoying, but it was due to the fact that almost every person was shaking their leg.

This is what we called "Programmer Legs". We attributed it to the fact that we didn't get out much, and were mostly pale shadowy figures who were deprived of Vitamin D.

Fast forward 6 years, and I have restless legs. For those who don't know, restless legs syndrome is very unpleasant. Imagine you are lying down, and your leg is compelled to move. If you don't move your leg, then you start to feel impending doom. You wonder things like "omg, do I have a blood clot?" or "holy crap, I'm going to die". So, you get up, wander around, and get a drink from the fridge. No problem until you lie down again and the legs get angry.

Well, two days without sleep and I was off to the doctor. They gave me some Naproxen Sodium (pain killer) and Carisoprodol (muscle relaxant). This "fixes" the problem by putting you to sleep, but it doesn't address the cause. After you use the prescriptions up, you may have a week until your legs get restless again.

Then, unfortunately, my muffler broke. This forced me to drive my car to Goodyear to get it fixed, and I walked home.

That night, I slept without a problem even though the prior night I had a recurrence of restless legs. The next day, I walked again to pick up my car. Again, that night, no problem. The next day, I basically worked all day, and that evening I had minor issues.

Eventually, I noticed a correlation and formed a hypothesis: walking is good for you.

If I walk a mile, then I can go to sleep with minor annoyances.
If I walk two miles, then I can go to sleep with no problems.
If I walk five miles, then I get a whole week of no problems.
When I walked ten miles, I got two weeks of no problems.

Now, I try to walk every day. If I can't go to sleep, then I go for a quick walk (about 1.5 miles) and get to sleep just fine.

While I'm annoyed that my body has decided to jack me and force me to exercise, I'm finding the liberation of being able to walk for hours very... enjoyable. So much so that I'm planning to walk 50+ miles sometime next year over the course of three days.

Big Data Enables Agile Data

The funny thing about NoSQL is that it is being developed by the Big Data and scalability communities to solve legitimate, very difficult problems of scale, yet it also enables Agile Data.

Here, I define Agile Data as:

the ability to record all available data at the point of a transaction/form/user interaction (including its context)
the ability to organize the data after it is available

It should be clear from this description that an RDBMS is not Agile in this sense, as it requires me to organize data before I collect it. Sure, there is a way to achieve the above with an RDBMS, and you could develop a methodology or an engine to accomplish it, but that violates the spirit of an RDBMS since I would just be packing JSON objects into a row.

The perfect example of a Big/Agile data problem is analytics. I would like to record as much information as is available (the HTTP headers, the client data, maybe some page content, etc). Sure, I could build a structure/schema that I think will be valid, but then I'm potentially reducing the amount of information I'm gathering. Instead, if I adopt the mantra of "gather everything", I finish the collection faster and can start studying the data to look for interesting patterns.
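A minimal JavaScript sketch of "gather everything, organize later", assuming an in-memory array stands in for whatever store you actually use. The function names `record` and `groupBy` and the sample headers are hypothetical; the point is that no schema is chosen at capture time, and grouping keys are picked only once you start studying the data.

```javascript
// Capture: keep every event exactly as it arrived; no schema is chosen up front.
const events = [];

function record(headers, extra = {}) {
  // Copy the headers so later mutation by the caller does not rewrite history.
  events.push({ at: Date.now(), headers: { ...headers }, ...extra });
}

// Organize later: pick a grouping key only once you know what you want to study.
function groupBy(items, keyFn) {
  const groups = new Map();
  for (const item of items) {
    const key = keyFn(item);
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(item);
  }
  return groups;
}

// Usage with made-up requests.
record({ "user-agent": "Firefox/3.6", "accept-language": "en-US" }, { path: "/" });
record({ "user-agent": "Chrome/7.0", "accept-language": "de-DE" }, { path: "/pricing" });
record({ "user-agent": "Chrome/7.0", "accept-language": "en-US" }, { path: "/pricing" });

const byLanguage = groupBy(events, e => e.headers["accept-language"]);
const byPath = groupBy(events, e => e.path);
```

Because the raw events are kept whole, a question nobody thought of at capture time (say, grouping by user agent) is just another `groupBy` call.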

The really neat thing is that Agile Data creates its own Big Data problems, since storage requirements grow a lot faster than with a typical solution, and the Big Data tools I have at my disposal are built to handle exactly that.

Wednesday, November 24, 2010

The Secret of Innovation

There is a phrase that has always bugged me.

"Work Smarter, not Harder"

The reason it bugs me is that every time I say it, I feel like a douche bag. I feel this way because I'm telling people they are doing something dumb, and many people hear that as a judgment of their intelligence. This is why I don't say this phrase anymore; it isn't helpful. It feels nice to say because it implicitly says "look, I'm smart, and you could work less if you were as smart as me". That's a douche-baggy thing to say.

Now, having said that, if you can achieve the goal of working smarter with less effort, then you have innovated. This is the secret to innovation.

Before I leave for the day, I ask myself the following questions.

How can I do the work I did today in half the time?
What did I learn?
How could I have done better?
How would I explain what I did?

That is, I focus on how I can improve my methodology, my education, my quality, or my communication.

Why you don't need JOINs (and the RDBMS to do them)

Before I sit down and design a back-end for a project, I write the ideal API specification that would make developers happy and enable them to provide all the polish and sizzle to sell the product. Then I turn the spec into a RESTful service where I don't worry about complexity or scale. I let the developers work with it and I collect data on how it is used, so I know where the crap is; there is always crap to deal with.

This process has worked out very well for me and my clients, and it is working out very well for the development community in general. We are in the SaaS phase where we produce and consume each other's services. This is nice.

Sometimes we consume two such services and then make something new. This is called a "mashup". Well, guess what? A "mashup" is equivalent to an application level JOIN. This used to be a service provided by the relational database, but now people are doing it by hand.
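To make the mashup-as-JOIN point concrete, here is a sketch of an application-level hash join in JavaScript. The two input arrays stand in for responses from two hypothetical services, and `hashJoin` is an illustrative helper, not any library's API. It indexes one side by key and probes that index with the other side, which is roughly what a database does for an equi-join.

```javascript
// Inner equi-join of two arrays of objects on left[leftKey] === right[rightKey].
function hashJoin(left, right, leftKey, rightKey) {
  // Build phase: index the right-hand side by its join key.
  const index = new Map();
  for (const r of right) {
    const k = r[rightKey];
    if (!index.has(k)) index.set(k, []);
    index.get(k).push(r);
  }
  // Probe phase: each left row picks up every matching right row.
  const out = [];
  for (const l of left) {
    for (const r of index.get(l[leftKey]) || []) {
      out.push({ ...l, ...r }); // on a field-name clash, the right side wins
    }
  }
  return out;
}

// Hypothetical results from two separate services.
const users = [{ id: 1, name: "ann" }, { id: 2, name: "bob" }];
const posts = [
  { userId: 1, text: "hello" },
  { userId: 1, text: "again" },
  { userId: 3, text: "orphan" }, // no matching user, so it is dropped
];
const joined = hashJoin(posts, users, "userId", "id");
```

Building the index once keeps the join linear in the size of both inputs, instead of the nested-loop cost of scanning every user for every post.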

Better yet, people are used to it and they are kinda fine with it. This is a good thing as it enables service developers to focus on their services and let product developers focus on their product. Developers are learning to

Optimize, cache, and pre-compute back-end requests
Write loops to efficiently cross-reference data
Avoid the angry looking DBA

There are systems, like memcached, that enable product developers to solve product problems without needing the DBA to change anything. Once developers are empowered, they can use their own creativity and intelligence to polish the product.
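A sketch of the cache-aside pattern that tools like memcached are commonly used for. Here an in-process `Map` stands in for memcached (a real deployment would talk to a memcached server over the network), and `cacheAside`, `fetchFn`, and the TTL are hypothetical names for illustration. The clock is injectable so the expiry behavior can be checked.

```javascript
// Cache-aside: check the cache first, fall back to the slow back-end, store the result.
function cacheAside(fetchFn, ttlMs, now = () => Date.now()) {
  const cache = new Map(); // key -> { value, expires }

  return function get(key) {
    const hit = cache.get(key);
    if (hit && hit.expires > now()) return hit.value; // fresh hit: skip the back-end
    const value = fetchFn(key);                      // miss or stale: go to the source
    cache.set(key, { value, expires: now() + ttlMs });
    return value;
  };
}
```

The back-end never has to change; the product developer decides what to cache and for how long.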

We may be looking at the last decade in which DBAs are bottlenecks. Does this mean that DBAs are obsolete? No, it is a lateral move for them as they become service programmers whose role is to produce optimized services. Can they use an RDBMS for it? Sure, they can eat whatever dog food they want.

Having been on both sides of the fence as a DBA and a product developer, I am comfortable saying that the NoSQL movement is definitely going to take hold in ways that people don't expect. I think it is going to restructure the entire way corporations view IT.

Sunday, November 21, 2010

Avoid DRY for Product Development

This is part of my comment on HN.

At the right level of abstraction, DRY is the best advice possible. If you are building a data layer or a system, then you will be best off if you keep things DRY. However, once you get into product land, I advise people to just get the job done quickly rather than worry about engineering principles.

The reason is clear: in product land, engineering principles come third, after usability and marketing. Avoiding DRY enables two things.

Polish
If you have five sections of code that are the same now, then there is a good chance that they will diverge as the product matures. This is polish, and it is a good thing. While it is true that you have more work to do, it isn't rocket science. Trying to use DRY for polish is going to create even more cumbersome code with a lot of branches for all the special cases, and it turns simple work into an artificial test of cleverness. Please avoid.

Hiring Scrubs
If you realize that the polish needed to make a product isn't exactly rocket science, then you can hire many scrubs. I define a scrub as someone new to computing, but capable enough to work within a developer environment to find and polish simple code. I like to hire scrubs as it provides a great first job in programming for many people. I was a scrub once, and it wasn't that bad as a 16-year-old.

Once you enable these two things, you have enabled marketing and the usability folks to iterate.

Update: Don't just hire scrubs
It takes a balancing act to get products out using a combination of awesome engineers, "good enough" engineers, and scrubs. A company ultimately needs all tiers to be able to push and iterate products quickly, and it is the company's job to ensure that the people process can nurture engineers from scrub to awesomeness.

Wednesday, November 17, 2010

Escaping Mr. 20% and losing the ego

This is a fairly long response to a comment on HN.

I've written many lines of code and thought many ideas in my life, but I have not shipped nearly as many. Why? I love coding for coding's sake more than shipping. Shipping requires product development, non-longbeard design, marketing, sales, and most important: customers.

Except for the game engine, everything I did was the special 20%. All the 20% projects were just a neat little idea implemented as if it were the core tech behind a master's thesis or a senior thesis. However, they didn't have the polish to become a product, and I get easily distracted by more difficult and sexier ideas. At best, most of them make me look like Captain Hindsight.

I'm perfect for academia, but I found myself being too impractical for my students. This was one of the reasons I dropped out of graduate school, as I had become an irrelevant preacher. I did not practice what they were going to do, so how could I teach them? It felt dirty teaching them fairly useless things. I was working with tools (such as OCaml and LISP) that enabled me to program better, but programming better doesn't mean shipping better. Most programmers just need to learn how to solve basic problems and have the discipline needed to ship products; telling them that they can avoid memory leaks by using OCaml is like telling them to learn French to avoid pissing off Mexican drug lords. Instead, they just need to get into the discipline of being careful with alloc and free. Any technical decision can be viewed either as a technology issue or as a management+discipline issue.

So, looking back on the pile of code I've written, it is all shit because it didn't ship. Shitty code that ships is better than perfect code that doesn't. This is one of the lessons in life that make people like me depressed, but I've gotten over that. I've helped ship three products and am working on two more. One of them is my November start-up sprint (which I'm doing OK on, but I could be doing better).

Problematically, building many 20% things builds an ego. Ego is fascinating. On one hand, you need ego (or to utilize a customer's ego) to sell a product. On the other hand, you need to lose your ego to build, ship, and support a product. This is why you need someone else to do your sales for you. When you get someone to sell you, then you don't need ego anymore. You just have a bunch of work to do.

Monday, November 15, 2010

How to use CouchDB? like this

CouchDB is a very interesting persistence package, and it solves 90% of the problems you find when you build a back-end for a web application. The 90% that CouchDB covers includes get/put, B-tree indexing, and reliability; all of this is good standard stuff in the database world. I want to talk about the other 10%, since CRUD is boring.

The last 10% is usually something like search; it is the novel algorithm that takes all your data and provides it in a meaningful way that makes your product awesome. CouchDB rarely solves this (neither do other packages). The more special the algorithm is, the more painful it will be to try to solve with CouchDB's MapReduce framework alone.

Fortunately, CouchDB has replication built in. I use the replication to push data from CouchDB to a custom server where I aggregate it into a meaningful service. The library is called Otto (short for ottoman).

The biggest question you are going to face is: what happens when your custom server crashes?

This can be solved by

providing your own persistence and dealing with reliability
not worrying about it and launching 3 servers with a custom HTTP server that replicates three ways; spend more money.
not caring and re-replicating the entire data set, with potentially non-trivial downtime.

All three of these options suck at some level, but the second is where you will want to end up. In the beginning, though, the third is the best choice.

From a complexity standpoint, you can make your life easier by enabling your custom software to merge in bulk sets. This enables you to lazily run your algorithm as you collect a lot of data at once. I have found that this style of bulk insertions makes the third option feasible in many domains.
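A sketch of the bulk-merge idea, assuming the custom server consumes batches shaped like CouchDB's `_changes` response with `include_docs=true`: a `results` array whose rows carry `id`, `doc`, and `deleted: true` for deletions, plus a `last_seq`. The aggregate itself (counting documents by a hypothetical `type` field) is a placeholder for your real algorithm, and `createAggregate`/`mergeBatch` are illustrative names. Treat `last_seq` as an opaque value to pass back as `since`, since its format differs between CouchDB versions.

```javascript
// Hypothetical aggregate: the latest doc per id, plus a count of docs per "type".
function createAggregate() {
  return { lastSeq: 0, docs: new Map(), countsByType: new Map() };
}

function bump(counts, type, delta) {
  const n = (counts.get(type) || 0) + delta;
  if (n === 0) counts.delete(type);
  else counts.set(type, n);
}

// Merge one whole batch of change rows in a single pass.
function mergeBatch(agg, batch) {
  for (const row of batch.results) {
    // Undo the contribution of the previous revision, if we had one.
    const previous = agg.docs.get(row.id);
    if (previous) bump(agg.countsByType, previous.type, -1);

    if (row.deleted) {
      agg.docs.delete(row.id);
    } else {
      agg.docs.set(row.id, row.doc);
      bump(agg.countsByType, row.doc.type, 1);
    }
  }
  agg.lastSeq = batch.last_seq; // resume point for the next request
  return agg;
}
```

Under the third option, recovering from a crash means starting again from `createAggregate()` and replaying the feed from `since=0` in large batches; merging whole batches at once keeps the per-change overhead low.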

The nice thing about using CouchDB is that you don't need a schema. You don't need to "plan". If you need to store data, then you just store it. Just give it a namespace and insert.

Please comment if you think I should write a book on CouchDB. I've done relational databases for years, and I've lurked in the CouchDB community for a while. I'm currently building a web framework around node.js and CouchDB called WIN.

Cloud Coding

I just want to say that Twilio is an awesome company. About a year ago, I was seriously considering dealing with Asterisk. Asterisk is perfectly good software, but it didn't make sense for the problem I had. I took a risk on Twilio instead and quickly developed the VOIP feature for a product.

Developing and supporting a product with Twilio over the past year has made me think about the need for developing in the cloud. I'm at the base level of cloud development: terminal. I can log into a VM and get things done. I don't need scp, sshfs, or anything else to support my development capability. My brain, fingers, and ssh are all that I need.

Why do I need a VM to develop with Twilio? Well, the way to monitor and debug a Twilio application is to use the phone. When a call comes in, Twilio has to make HTTP requests to my application, so I need a public IP. I also need a way to watch the HTTP traffic, so I need the ability to grep logs.
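A minimal node.js sketch of that setup, assuming Twilio is configured to request this server's public URL when a call comes in and expects TwiML (XML) back. `<Response>` and `<Say>` are real TwiML elements; the greeting text and the helper names `escapeXml`, `sayTwiml`, and `logLine` are hypothetical. Every request is logged as a single line on stdout, so redirecting the output to a file makes the traffic greppable.

```javascript
const http = require("http");

// Escape text before embedding it in XML.
function escapeXml(s) {
  return String(s)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Build a minimal TwiML document that speaks one sentence to the caller.
function sayTwiml(text) {
  return '<?xml version="1.0" encoding="UTF-8"?>' +
    "<Response><Say>" + escapeXml(text) + "</Say></Response>";
}

// One line per request: timestamp, method, URL, raw body.
function logLine(method, url, rawBody) {
  return [new Date().toISOString(), method, url, rawBody].join(" ");
}

function createServer() {
  return http.createServer((req, res) => {
    let raw = "";
    req.on("data", chunk => { raw += chunk; });
    req.on("end", () => {
      console.log(logLine(req.method, req.url, raw));
      res.writeHead(200, { "Content-Type": "text/xml" });
      res.end(sayTwiml("Hello from the VM"));
    });
  });
}
```

To try it, add `createServer().listen(8080)` at the bottom, run it with stdout appended to a log file, point Twilio at the VM's public address, and grep the log while you dial in.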

Basically, fundamental Unix skills let you work with new technologies faster.

Saturday, November 13, 2010

A programming language is not just a tool

A programming language is a tool to let you transform representations and solve problems.

A programming language is a contract that enables groups of people to solve problems together.

A programming language is a contract that enables frameworks to be built.

A programming language is a way to express thoughts.

Wednesday, November 10, 2010

Now, having said that

Having given praise to my polyglot nature, I must condemn it.

One of the reasons I study programming languages is so that I can write my own. When I find a neat feature in a programming language, I think about how I would solve the computational problem to translate it into C.

Having aged, I've decided against writing yet another language. The world needs fewer languages, not more. We ultimately only need 3.

We need one language to experiment and think about complexity.

We need one language to ship products.

We need one language to train beginners.

Will we ever agree which language fits these roles best? No. Now, I do my best not to amplify the problem.

Programming Language research will be over when all three roles can be filled by one language. In my opinion, the closest languages today are C# and JavaScript.

I'm betting on JavaScript.

The 3 Programming Languages you need to Know

Every good programmer needs to know at least 3 languages. Of course, I'm probably wrong.

I can quickly understand a programmer by knowing their favorite programming languages, using the biases and stereotypes that I've built up over the years. When I read a resume, I try to classify why the programmer used each language according to these archetypes, and I use those stereotypes and biases to find what I want in a stack of resumes.

Happiness Language
This language is what you think in. This is the language that you wish you could use all the time. This is the language that you write your projects in. For me, this is OCaml (and now JavaScript although I'm integrating CoffeeScript into my universe). For many, this is LISP or Haskell. When I find out someone's happiness language, it tells me a lot about them.

If the language is esoteric or new, then they are passionate about computing.

If the language is mainstream, then they may be more sensible or practical about computing.

Hack-it-out / GTD Language
This is the language that contains everything including a kitchen sink. It is very mature and has a massive library base. With this language, you enable yourself to build quick services and command line utilities to help you out in a pinch. Anything that has already been done is at your fingertips.

If the programmer lists many languages, then they may be able to utilize all of them by building RESTful services.

If I don't detect a hack-it-out language, or if they list too few languages, then I suspect they are either inexperienced or too specialized.

Bread and Butter
This is the language that you can use to keep yourself alive when life hands you lemons. This is the language that you know just in case you need to hustle to provide for yourself and your family.

If they don't have a bread and butter language, then they probably need some education on how to work in a team effectively.

Friday, November 5, 2010

Betting the Farm on JavaScript

So, I can't sleep and what am I thinking about? Type Systems...

Two years ago, I would have cursed the dynamic typing camp as fools and miscreants because I'm a performance junkie. I like my code to run insanely fast. This is my typical salvo in any type system argument.

Now, I realize that I was just being closed-minded and rather stupid (and even more so ignorant). Consider this JavaScript code.

function add(a, b) { return a + b; }

What is its type? Well, if I may bastardize OCaml type notation for a second, then I would say something like:

('a, 'b) -> 'c

What the fuck is 'c? The reason I don't know what 'c is, is that JavaScript's + operator is overloaded in a poorly defined way that is plain stupid from a type inference perspective. This question, combined with my lack of creativity, ultimately led me to the conclusion that JavaScript is doomed to lose the performance argument.
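A quick JavaScript illustration of why the return type is unknowable from the source alone: the same `add` yields a number or a string depending on the runtime types of its arguments. The results in the comments follow JavaScript's standard `+` coercion rules.

```javascript
// Same body as the function above, named so it can be called.
function add(a, b) { return a + b; }

// The result type depends entirely on the runtime types of a and b.
const results = [
  add(1, 2),    // number + number -> 3
  add("1", 2),  // string + number -> "12" (concatenation)
  add(1, "2"),  // number + string -> "12"
  add(true, 1), // boolean is coerced to a number -> 2
  add([], {}),  // both coerced to strings -> "[object Object]"
];
```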

Well, JavaScript is doomed until you run the program for significant lengths of time and realize that we have multiple cores that are going to be wasted on consumer machines. Let me explain this black magic that is bouncing around in my head.

Type Inference in JavaScript

Given a function like the above, I can't type it in a meaningful way unless I know the type of the inputs. However, when the function is evaluated I have the types and the return type can be established. If I utilize just in time compiling, then I can use type inference to produce types and compile a native version of the code given the correct type signature.
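A toy JavaScript model of the idea, assuming nothing about how JaegerMonkey or V8 actually work internally: a wrapper records, for each observed argument-type signature, which result types came out. A real JIT would compile a specialized native version for each stable signature; here the hypothetical `withTypeFeedback` only builds the lookup table that such a compiler would consume.

```javascript
// Wrap fn so every call records its argument-type signature and result type.
function withTypeFeedback(fn) {
  const observed = new Map(); // e.g. "number,number" -> Set { "number" }

  function wrapped(...args) {
    const signature = args.map(a => typeof a).join(",");
    const result = fn(...args);
    if (!observed.has(signature)) observed.set(signature, new Set());
    observed.get(signature).add(typeof result);
    return result;
  }

  // A signature that has produced exactly one result type is a candidate for
  // a specialized compiled version.
  wrapped.isMonomorphic = signature =>
    observed.has(signature) && observed.get(signature).size === 1;
  wrapped.observed = observed;
  return wrapped;
}

const plus = withTypeFeedback((a, b) => a + b);
plus(1, 2);     // records "number,number" -> number
plus("1", "2"); // records "string,string" -> string
plus(1, "2");   // records "number,string" -> string
```

Because the table is built from real executions, the expensive inference and compilation step could run off the main thread while the generic version keeps serving calls.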

Type inference is fairly expensive (at least, the ways that I've solved it), but it is fortunately 100% CPU-bound. It seems perfect to fork off into a queue for some poor core to solve.

My suspicion is that this is what JaegerMonkey is doing, but I'm rather ignorant of its code. If they are not doing it, then at least I can add it in as a patch if I suddenly find myself with a wealth of time on my hands.

Future of JavaScript

I wager that JavaScript developers will unlock C-level optimizations by learning how to game the next generation of JavaScript environments. I look forward to optimizing my JavaScript libraries to have tight inner loops.

I believe in this future because there is commercial interest in enabling it.

This is why I'm very optimistic about node.js and the future of server-side JavaScript. Sure, my node.ocaml is three times faster than node.js now, but it is just a matter of time until node.js catches up.

It would also be a good time to take v8 or a build of JaegerMonkey and start building a game engine with it and provide a canvas/WebGL compliant C back-end.

Also, if I were back in academia, then this would be the perfect topic for a PhD dissertation.

Tuesday, November 2, 2010

API as your Queen (or how I fell in love with NoSQL)

I like to think about and play around with software architectures. About 1.5 years ago, I was inspired by Twitter. Or, more precisely, I was inspired by the Twitter developer community and the developing mashup scene. I have also watched the growing number of NoSQL solutions. The two are connected.

Developers are performing joins in the application space (or they are not doing joins at all).

Joins are the most important aspect of the relational database model, and if we don’t need them at the data tier any more, then do we need a relational database?

The answer is "it depends", but you don't even want to start there. You want to start by building an API that enables the product to work. This is where I start, as it gives me a good feel for the ebb and flow of data.

With node.js, node.ocaml, or KayakHTTP, it is very easy to create a REST-ish server for enabling developers to build products against a back-end. You can also see how much of a back-end you really are going to need and which products you will want to utilize in the future.
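A minimal node.js sketch of "start with the API", assuming an in-memory `Map` as a placeholder back-end and a hypothetical `/items/:id` resource. The routing is a pure function so it can be exercised without a network; `createServer` wires it into node's built-in `http` module. Swapping the `Map` for CouchDB, an RDBMS, or another service later would not change the API developers build against.

```javascript
const http = require("http");

// Placeholder back-end: replace with a real store once usage data says what you need.
const store = new Map();

// Pure routing: (method, path, parsed body) -> { status, body }.
function route(method, path, body) {
  const match = /^\/items\/([^/]+)$/.exec(path);
  if (!match) return { status: 404, body: { error: "not found" } };
  const id = decodeURIComponent(match[1]);

  if (method === "GET") {
    return store.has(id)
      ? { status: 200, body: store.get(id) }
      : { status: 404, body: { error: "not found" } };
  }
  if (method === "PUT") {
    store.set(id, body);
    return { status: 200, body: { ok: true, id } };
  }
  if (method === "DELETE") {
    const existed = store.delete(id);
    return { status: existed ? 200 : 404, body: { ok: existed } };
  }
  return { status: 405, body: { error: "method not allowed" } };
}

// HTTP wiring: parse a JSON body, route it, and reply with JSON.
function createServer() {
  return http.createServer((req, res) => {
    let raw = "";
    req.on("data", chunk => { raw += chunk; });
    req.on("end", () => {
      let body = null;
      try {
        body = raw ? JSON.parse(raw) : null;
      } catch (err) {
        res.writeHead(400, { "Content-Type": "application/json" });
        res.end(JSON.stringify({ error: "invalid JSON" }));
        return;
      }
      const { pathname } = new URL(req.url, "http://localhost");
      const out = route(req.method, pathname, body);
      res.writeHead(out.status, { "Content-Type": "application/json" });
      res.end(JSON.stringify(out.body));
    });
  });
}
```

Call `createServer().listen(8080)` to serve it; developers can start building against `/items/...` immediately, and the access patterns they generate tell you what the real back-end needs to be.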

Always start by building your product as an API.

When you do this, your API is now a NoSQL back-end. You would be surprised what people can do when they don't have to deal with SQL.

Green Computing; do more with less

It seems that anyone can do "real-time" internet these days due to awesome technologies like node.js or node.ocaml (shameless self-promotion). Realistically, though, no one cares if your software is optimized in C using pointer arithmetic to squeeze an extra hundred requests per second out of a box. Very few care that one has taken the time to read the Black Book. Why? Computers are really fast, and many people have figured out how to make software scale out horizontally. Computing resources are way cheaper than developer mind-share, so where is the motivation for business?

Well, that's a good question.

This can be addressed with a "green computing" marketing campaign. Unfortunately, it would be doomed right out of the gate.