Adventures of a Protoss in Seattle: October 2010

Thursday, October 28, 2010

If I were to write another IDE (or, being too productive and unwise)

So, two years ago, I wrote an IDE for a collection of tools that some friends and I used to build an awesome Ajax site. It was madness!

This is what it had:

Text editing (standard stuff like syntax coloring, color based find and replace)
Gui editor (box manipulation and property manipulation, pixel perfect editing)
Source code locking via a live server to edit against
Cross Machine Synchronization (that is, drag a file from your local computer and it gets uploaded to the remote computer and changes are synchronized as the local file is changed = very cool for designers)
A big compiler that did a lot of grunt work (and a lot of automated pre-optimizations, so it automated evil)
100K lines of code.

And all the IDE was used for was managing the input to "the compiler". The compiler took all the css and validated it, turned all the parametric css (think Less, except Turing complete and with an image library to splice and make buttons really easy to stamp out) into images and css, optimized all the image file names, optimized all the images (using optipng and some custom algorithms that removed superfluous colors that don't add much value), took the state machines and combined them with the JavaScript code, compiled the GUI into JavaScript, brought in the JavaScript kernel, and built an RPC library for JavaScript to communicate with the PHP backend. Per "application", it would spit out one JavaScript file, one (or two) css files, and all the images (some in sprite sheets).

It was a revolutionary Ajax platform, but it also sucked for reasons I'm not going to discuss.

If I were to do it all again, here is what I would do differently (and am kinda doing now).

Write the GUI editor as a standalone editor that could be launched from the command line.
Not use XML at all for the Gui file format; XML was a terrible choice. Now, I would use JavaScript to construct the object and transfer it to the host program as JSON, which it could then trivially serialize out as JavaScript code that merges nicely. Plus, it would be hackable by a text editor.
Open source the custom template engine / (Or use Mustache)
Write the state machine compiler (the thing that made Ajax real easy for us) as a library rather than a DSL; open source it. (The closest thing available now is zef's mobl which I would use now rather than invent my own).
Not worry about image file names, rather just map out all images via CSS and then use a CDN.
Skip GUI editing entirely, it isn't needed and pixel perfect is over-rated.
In locking SCM versus distributed SCM, the key to distributed is designing your code and architecture in a way where merging is obviously the correct choice. If you design your code and how your code works together with other developers in a way where merging will work, then merging will work just fine.
Not build a custom file system/custom source control. I kid you not. I have a tool called fire that lets me very quickly build a document object model that serializes to giant XML files. I then used some old networking code to turn the DOM into a server where developers would check out bits and pieces of the DOM to edit it. Every change coming from a developer would then back up the entire XML file. Once the XML file grew to 1MB, I was going through gigabytes of data per hour.
Open source the image tools (they are coming)
Open source the JavaScript tools (or use Google's Closure)
Open source the kernel (or use jQuery more effectively)
Open source parametric CSS (Or, add image manipulation to Less)

What I had built was a web operating system, and I plan to dump the code out there some day. If you want to dig through the code sometime, then send me a shout out and I'll expedite the dump (even though it sucks).

I would love to revisit the problems we were facing, but the fundamental problem with building a web operating system is that paying down the user/developer education debt is currently infeasible.

Moreover, I don't have enough energy to maintain that kind of code base without at least five long beards. As I age, I'm looking more to polishing things up rather than hacking things out. Building an IDE was a great way to learn and make mistakes, but it was also misguided, as it prevented me from adding value to the project and the business. That said, I enabled my team to do marvelous things that are still not being done in the marketplace of products.

Learn about the project so awesome that I created an IDE for it: on KillerStartups, on Mashable, and in our YouTube video (holy crap, we built this!).

Watching that, I realized how awesome the tech and the idea were. Maybe we should revive it? I don't know.

Wednesday, October 27, 2010

On IDEs (A refinement of my 30 lessons)

In my last post on 30 lessons, the most contentious lesson was on IDEs. I would like to clarify a few things.

IDEs, in general, are a good thing. For instance, Visual Studio 2010 is by far the best IDE on the planet IF you use C#. Eclipse is arguably the best Java IDE out there (and I'm not going to hold myself to this, so feel free to disagree since I don't use Java anymore). For languages like C# and Java, you really do need an IDE, namely because they go beyond just editing and introduce language level extensions like refactoring and code completion.

Now that I've given praise to IDE land, I'll tear it down.

root

I deploy and primarily work with Linux stuff. For me, if you want sudo/root access at my company, then you need to be able to use either vim or nano in a highly proficient manner (you also need to be able to use scp). I believe that if you can't do this, then you are not a computing professional worthy of the power of root. Why? There is no IDE for /etc, where all configuration lives. Sometimes, something is configured wrong for a new use-case and we have to do something non-trivial in real-time to a live production server. After all, real men use root.

New Stuff

What IDE existed for Ruby when it came out? Or JavaScript? Or Haskell? If you find yourself limited by the IDE, then you are going to miss out on new technologies and the ideas they offer. At the end of the day, comfort with "primitive tools" enables you to work with new innovative technologies.

Productivity?

I'll concede every productivity claim made by pro-IDE developers. Here is the thing: what are you being productive at?

Nomadic Lifestyle

A long time ago, my computer crashed and I lost about 2 months of work. I had to reinstall Windows and my IDE (Visual Studio 6). I had made a lot of modifications to it to suit me. They were lost, and I was lost. I didn't like the feeling that I was dependent on customizing the world to suit me, so I decided then that I would never customize software again. If I needed to tweak anything, then I was doing something wrong. Having lived this way for a while, I can say that life is pretty good this way. I spend more time on my work than messing around.

Although, recently, I must admit that I've changed the background in my Ubuntu install; that's about it though.

Cluster Editing

I have a cluster of 32 computers running, and I use pico to code on it. I simply use clusterssh to edit them all at once. What IDE can do this?

fin

Ok, I'm done bashing IDEs.

Saturday, October 23, 2010

30 lessons learned in computing over the last ten years

In looking at the last ten years of my life, I realized that I've learned many things. Mostly about how wrong I've been, and how stupid I've been. So, having looked at the 80+ projects I've worked on in the past ten years (excluding coursework, current start-ups, and graduate studies), I have reduced what I learned to a blog post. (In bullet format, no less).

If you plan to write a programming language, then commit to every aspect. It is one thing to translate between languages; it is an entirely different effort to provide good error/warning messages, good developer tools, and to document an entirely different way of thinking. In writing Kira, I invented a whole new way to think about how to code, and while much of it was neat to me, some of it was very wrong and kinda stupid.
Geometric computing is annoying, always use doubles. Never be clever with floats; floats will always let you down. Actually, never use floats.
Lisp is the ultimate way to think, but don’t expect everyone to agree with you. Actually, most people will look at you as if you are crazy. The few that listen will revere you as a god that has opened their eyes to computing.
If you plan on writing yet another Object Relational Mapper, then only handle row writing/transactions. Anything else will be wrong in the long term.
If you want to provide students with a computer algebra system, then make sure they can input math equations into a computer first.
Don’t build an IDE. Learn to use the terminal and some text editor. If you need an IDE, then you are doing something wrong. When you master the terminal, the window environment will be cluttered with terminals and very few “applications”.
Learn UNIX, they had 99% of computing right. Your better way is most likely wrong at some level.
Avoid XML, use JSON. Usage of text formats is a boon to expressiveness and the fact that computing has gotten cheap. Only use binary based serialization for games.
If you plan to build an ORM to manage and upgrade your database, then never ever delete columns; please rename them.
Never delete anything, mv it to /tmp
Never wait for money to do anything; there is always a place to start.
Optimize complexity after people use a feature and complain. Once they complain, you have a real complexity problem. I’ve had O(n^3) algorithms in products for years, and it didn’t matter because what they powered was not used.
Text games can be fun too; if you want to write an MMO, then make a MUD. You can get users, and then you can use that to get traction to build something bigger. Develop rules and a culture.
Don’t worry about concurrency in your database until you have real liability issues.
Back up every day at the minimum, and test restores every week. If your restore takes more than 5 minutes of your time (as in time using the keyboard), then you did something wrong. If you can’t back up, then you have real issues and enough money to solve them with massive amounts of replication.
Never write an IDE; it will always be a mistake. However, if you do make it, then realize most people don’t know that silver bullets don’t exist. You can easily sell it if you find the right sucker; this will of course become a part of your shame that you must own up when you die.
JavaScript is now the required programming language for the web; get used to it. JavaScript is also going to get crazy fast once people figure out how to do need based type inference. Once JavaScript is uber, learn to appreciate the way it works rather than map your way of thinking to it.
Master state machines, and you will master custom controls. Learn enough about finite state machines to be able to draw pictures and reason about how events coming into the machine affect the state.
There is more value in learning to work in and around piles of crappy code than learning to make beautiful code; all code turns into shit given enough time and hands.
If you want to build a spreadsheet program, then figure out how to extend Excel because Excel is god of the spreadsheet market.
Write five games before writing a game engine.
Debugging statistical applications is surprisingly difficult, but you can debug it by using R and checking the results with statistics.
Don’t design the uber algorithm to power a product; instead figure out how to make a simple algorithm and then hire ten people to make the product uber.
Learn to love Source Control. Backups are not enough. As you age, you will appreciate it more.
Communicate with people more often; don’t stay in the cave expecting people will know your genius. At some point in your life, you will need to start selling your genius.
Realize that every product that exists solves some kind of problem. Rather than dismissing the product, find out more about the problem the product is trying to solve. Life is easier when you can look at new technology and find out what it does solve.
Learn to be sold. Keep the business card of a good salesman. Sometimes, they actually have good products, but they are always useful.
You can make developers do anything you want. Normal users on the other hand are not so masochistic.
If you are debating between Build or Buy, then you should Build. You are debating which means you don’t know enough about it to make a sound decision. When you build, at least you will get something working before you find what to Buy and how to design with it.
You will pay dearly for being prickly; learn to be goo and flexible to the changing world. Be water, my friend.

If you got to this point, then good job. The biggest thing I have learned (and probably the most painful) in the past ten years is how to deal with my ego. Ego is supposedly your best friend, but it is also your worst enemy. Ego is a powerful force, but it isn't the right force to use. While I admit that I've used ego to push myself in very positive directions, I think I would have been better off if I hadn't, as the side effects trump the pros.

Wednesday, October 20, 2010

last ten years (if I uploaded it to github)

I've gone through about 10 years of backups, and I thought about putting it all up on github. However, most of it would just suck. So, instead, I'll put up gravestones for each of them and not look back anymore. Then, I'll delete the files and be done with it. Keep in mind, most of these are side projects. Only two are failed start-ups (Hurox and FileSharingAccelerator). This list excludes current strategic technologies I've developed at my current business.

kitchen	a game engine
grill	statically typed clone of JavaScript
juknow	a JavaScript to PHP converter
kapowie	an OpenOffice Spreadsheets to PHP converter
blaze	An animation language for doing periodic animations (very similar to these devices that do music looping)
iknife	a vector editor in C#
diffistory	an algorithm that would reconstruct a new version of a file using aspects from n-versions of a file
cosmic pipeline	a re-invention of unix pipes on Windows (decided just to switch to unix for that problem)
simpledepend	a php dependency analysis toolkit; using a standard procedural style, you could produce libraries and end-points that would have no includes and consume minimal memory.
scripture	a documentation language
istate	a network synchronization language; describe data structures and then use them, and they would sync over the wire

butcherblock	3d object modeler and game editor
cauldron	3d rendering engine (pre-kitchen)
fork	2d gui editor/texture atlas packer
fryingpan	3d physics wrapper library (physx, bullet, ode)
zmlc/fire	heap modeler for structure serialization
inferno	better version of fire
bnj	Bayesian network tools in java 3.x
gem	graph editor and modeler
rpgengine	a player system with buffs
arcane works	a functional programming language version of excel
melchior	a new IDE for web editing with symbolic editing
alienautopsy	world of warcraft bot
convexbodyplayground	given any closed model, this broke it down into convex bodies
facebookwalk	crawl facebook and convert image emails into text/stalk students
researchfu	cms with mathml support
phpmathml	a mathml to png renderer
tutorcas	computer algebra system
mathgraphics	a point set visualization language
sliceem	a 3d model by using contour levels
gg+	gui graph + (open scene graph gui system)
mathmyway	a php-like language that enabled tutorcas to come alive on the web
monte carlo localization	construct a map and locate yourself (robotics)
jovian katana 1/2	ajax ide, gui editor (used to build hurox)
hurox	social marketplace
frame compiler	given a single image, split it up into 9 images for building frames (generates php code)
image2php	convert an image to php (useful for combining with given colors to change hue on the fly)
javascriptmaster	a javascript parser that enabled me to search code to find inefficiencies and problems.
evolution4k	a 3d space game
progressive mesh demo	an implementation of Hoppe's progressive mesh stuff
spherex	a physics engine where gravity was opposite (everything repelled)
zengine	marching cubes optimization where space is a grid represented by link lists of open and closed space
zterrain	yet another terrain engine
jove3	next generation ajax platform
spherebands	yet another collision detection engine
primgen	marching cube implementation for building models from math equations
lotr-risk	scanned all the game assets and made it work in a network environment
rt-canvas	a real-time (comet based) canvas toolkit for charting streaming data
tilemake	generates 16 images from 3 images using reflections and rotations
nova	javascript based router/topology for web page linking
gravity	a shared file system for keeping replicas in sync (with distributed locking)
mercurial	a simple deployment system (was completely ignorant of the scm)
kira	programming language targeting php with a built in orm
bench	a simple c# benchmarking library/language for testing how webservers react with different types of requests (large posts, small get, small posts) in a haphazard way.
j.encoder	a simple javascript template language (rolled up into jove)
simpledepend	a php comment dependency factor tool; wrap functions in comments and then wrap destinations in comments; the destinations get split into multiple files and the dependents are traced out and included when needed.
particalgen	an image generator for particle system images (using point rendering)
scripture	an immature documentation system
diffhistory	an interface to diff that enabled a fs written in dokan to track a head and a revision log of how to go back in time
peoplemachine	a small scale crm
hi, my name is jeff	a book (did my own version of NaNoWriMo in march)
massshard	a simple orm built around sharding
bspcompiler	compile a mesh into a bsp tree
chef	an ide for my game engine (turned into jove)
factorg	small library for factoring gaussian integers
font2texture	convert a font to a texture map
quickneighborhood	quicksort using planes (designed for broad-phase of a large collection of particles moving quickly)
pointcloudalgorithms	3d algorithms on point clouds
spoon	agent modeling
poonix	turned microsoft virtual pc into a managed cloud
devsync	a suckier version of rsync in windows
glasseson	add glasses using opencv to a picture (client work that didn't pan out)
gold	transaction manager for s3
horde	simplequeueservice based video transcoding (for hurox)
magistrate	interface to kira's backend
filesharingaccelerator	a centralized file sharing service
wealth	my text based version of acquire (board game)
sudoku	yet another sudoku solver
csdc	first orm
mapgen4galciv	made a map generator for galactic civ
mathmod	a web language for math (second version of mathmyway)
auth	oauth before oauth
rage	rapid game engine (precursor to cauldron)
businesslogichelper	regular expression search for enforcing rules on how to produce and consume sql
defensemaped	a map editor for a tower game
notjavascript	a working javascript interpreter with very strict rules (and no newline detection)
sword	a text editor control (from scratch)
boxplane	a gui control
lamegeo	a c# geometry library (primarily for a future reverse geo-coder)
libquest	asynchronous quest manager (how wow quests work)

Monday, October 18, 2010

Otto Released

I released the first version of the highly untested otto today.

Otto is a node.js server that enables CouchDB to replicate to it. That is, changes on the CouchDB machine will replicate to node.js via CouchDB's replication stack. Yay!

Clone it today on GitHub.

Friday, October 15, 2010

NoSQL Design; a primer for future data architects

*This is a rough outline of a book I'm working on; after writing this, I realized that there is a lot of knowledge debt and background understanding.*

Having done RDBMS/SQL based design for the past 10 years, I've been skeptical about how NoSQL works and impacts how we store and query data. For the past two years, I've engrossed myself in using MySQL to emulate and figure out NoSQL design patterns.

Rule 1: Denormalize like a Crazy Person

Normalization in Databases is a side-effect of using Databases. If you buy in to using a Relational Database, then you buy into normalization strategies. In Relational land, normalization allows you to use SQL as a rich query language. For NoSQL, you must denormalize and think in fewer "tables". Think in terms of documents. How can one document store a lot of related data?

Example

A user has multiple phone numbers, and you can represent this with a table consisting of tuples (user_id, label, number). Or, you can augment the user document with a field that stores an array of records. That is,

class Phone { string label; string number; }

class User { guid id; Phone [] phone; }

Serialize your user object into a JSON string and ship it off to a NoSQL solution.
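For the curious, here is a minimal Python sketch of that serialization step. It mirrors the two classes above; the sample labels and numbers are made up for illustration, and the "ship it off" part is left out because it depends entirely on which NoSQL solution you pick.

```python
import json
import uuid
from dataclasses import asdict, dataclass, field


@dataclass
class Phone:
    label: str
    number: str


@dataclass
class User:
    id: str
    phone: list = field(default_factory=list)  # list of Phone records


def to_document(user):
    """Serialize the whole user, phones included, into one JSON string."""
    return json.dumps(asdict(user), sort_keys=True)


# Illustrative data only; the numbers are placeholders.
user = User(id=str(uuid.uuid4()))
user.phone.append(Phone(label="home", number="555-0100"))
user.phone.append(Phone(label="work", number="555-0101"))

doc = to_document(user)  # one document, no join table needed
```

The point is that the phone numbers travel inside the user document, so reading a user is a single fetch.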

Rule 2: Embrace the "Sea of Shit" (i.e. schema-less design)

Just like how dynamic typing made my ass itch years ago, so does schema-less design. However, the mode of thought is to think document versioning and namespace partitioning. That is, give every document a field called "class_type" and a field called "class_type_version", and then use them in the obvious ways.

The consumers (developers and yourself) of your data should understand that the schema has multiple versions and have a way to gracefully degrade or be able to initiate a remote upgrade. Alternatively, there could be an upgrade script that does this, but I find that doing it lazily works well if you find the discipline to control and work against versions.
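One way the lazy upgrade could look, sketched in Python. Only the "class_type" and "class_type_version" fields come from the rule above; the registry names (LATEST, UPGRADES), the v1 and v2 document shapes, and the migration itself are hypothetical.

```python
import copy

# Hypothetical registry: the newest schema version of each class_type.
LATEST = {"user": 2}


def upgrade_user_v1_to_v2(doc):
    """Hypothetical migration: v1 kept one 'phone' string, v2 keeps a list."""
    upgraded = copy.deepcopy(doc)
    number = upgraded.pop("phone", None)
    upgraded["phones"] = (
        [] if number is None else [{"label": "default", "number": number}]
    )
    upgraded["class_type_version"] = 2
    return upgraded


# One upgrade step per (class_type, from_version) pair.
UPGRADES = {("user", 1): upgrade_user_v1_to_v2}


def read_document(doc):
    """Upgrade a document lazily, one version at a time, when it is read."""
    latest = LATEST[doc["class_type"]]
    while doc["class_type_version"] < latest:
        step = UPGRADES.get((doc["class_type"], doc["class_type_version"]))
        if step is None:
            raise ValueError(
                "no upgrade path from version %d" % doc["class_type_version"]
            )
        doc = step(doc)
    return doc


old = {"class_type": "user", "class_type_version": 1, "id": "u1", "phone": "555-0100"}
new = read_document(old)
```

Whether the upgraded document gets written back right away or only on the next save is a design choice; the sketch just returns it.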

Rule 3: Dominate Complexity with a Dominating Subset Index

Complexity sucks, and the ultimate goal is to get complexity down to O(1) or O(f), where f is linear in the output of the page (and at most sublinear in the entire data set). While half of your view code will be dedicated to viewing single documents, the other half deals with aggregated/indexed sets of documents. Anything else is a special case, and is handled by replication.

Conquering this requires thinking in Dominating Subsets, where you place an index on your documents (and store the index as a document) and you have some efficient way of bringing the index (or a subset) to a developer. This is where you do the dreaded join in application logic, but it will be ok as long as the complexity of the join is related to the output of the page. Relax, it will be ok.
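A rough Python sketch of the idea, using a plain dict as a stand-in for the document store. The key naming scheme, the "keys" field of the index document, and the helper names are all assumptions made for illustration.

```python
# A hypothetical in-memory key/value store standing in for the NoSQL backend.
store = {
    "post:1": {"class_type": "post", "title": "Otto Released", "author": "jeff"},
    "post:2": {"class_type": "post", "title": "dev install script", "author": "jeff"},
    "post:3": {"class_type": "post", "title": "Guest post", "author": "sam"},
}


def add_to_index(store, index_key, doc_key):
    """The index is itself a document: an ordered list of document keys."""
    index = store.setdefault(index_key, {"class_type": "index", "keys": []})
    if doc_key not in index["keys"]:
        index["keys"].append(doc_key)


def fetch_page(store, index_key, offset, limit):
    """The join in application logic: at most `limit` document fetches per page."""
    keys = store.get(index_key, {"keys": []})["keys"][offset:offset + limit]
    return [store[key] for key in keys]


# Normally the index would be updated on each write; here it is built in one pass.
for key, doc in list(store.items()):
    if doc["class_type"] == "post":
        add_to_index(store, "index:posts_by:" + doc["author"], key)

page = fetch_page(store, "index:posts_by:jeff", 0, 10)
```

The work done by fetch_page tracks the page size rather than the size of the whole data set, which is the property the rule is after.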

Rule 4: Replicate like a Pirate

Disk space is cheap, and memory is getting cheaper. Unless you are Google, a single server can solve just about any problem you have if you can just get the data to it. With node.js or node.ocaml, it is feasible to build the services that drive the business in a customized fashion. Once you get past the single server, the service based design becomes its own challenge. However, it is now isolated from the rest of the ecosystem and can be measured and monitored independently.

Rule 5: Cache, Cache, and then Cache some more. Invalidate!

Fundamentally, you could cache everything forever with no time stamps if you just knew how to recompute the caches based on what you update. This is a fundamentally difficult problem, but it can be easily figured out with a dependency graph of how your reads depend on your writes. It sounds simple, but it isn't at the application level.
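Here is a toy Python sketch of that dependency graph, assuming each cached read can declare up front which written keys it depends on. The class name, the key names, and the render function are hypothetical; a real application has to discover those dependencies, which is the hard part.

```python
from collections import defaultdict


class DependencyCache:
    """Cache with no time stamps; entries die only when a dependency is written."""

    def __init__(self):
        self._values = {}
        self._readers = defaultdict(set)  # written key -> cache keys that read it

    def get(self, cache_key, depends_on, compute):
        if cache_key not in self._values:
            self._values[cache_key] = compute()
            for written_key in depends_on:
                self._readers[written_key].add(cache_key)
        return self._values[cache_key]

    def invalidate(self, written_key):
        """Call after every write; drops each cached read that depended on it."""
        for cache_key in self._readers.pop(written_key, set()):
            self._values.pop(cache_key, None)


db = {"user:1": {"name": "Jeff"}}
calls = []


def render_profile():
    calls.append(1)  # count how often the expensive read really runs
    return "Hello, " + db["user:1"]["name"]


cache = DependencyCache()
first = cache.get("page:profile:1", ["user:1"], render_profile)
second = cache.get("page:profile:1", ["user:1"], render_profile)  # cache hit

db["user:1"]["name"] = "Jeffrey"  # a write...
cache.invalidate("user:1")        # ...followed by the matching invalidation
third = cache.get("page:profile:1", ["user:1"], render_profile)
```

The sketch only works because every write is followed by an invalidate call on the right key; miss one and the cache serves stale data forever.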

Friday, October 1, 2010

dev install script

for my own reference

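#!/bin/sh
# Run as root (or prefix each line with sudo).
# source control and remote access
apt-get install mercurial git-core ssh
# C toolchain and OpenSSL headers
apt-get install build-essential openssl libssl-dev
# file watching, downloading, and archive utilities
apt-get install inotify-tools curl zip unzip
# Apache, PHP 5 with extensions, and memcached
apt-get install apache2 php5 php5-curl php5-tidy php5-memcache memcached php5-cli php5-gd
# autotools
apt-get install libtool automake
# OCaml compilers, libraries, docs, and omake
apt-get install ocaml ocaml-native-compilers ocaml-findlib camlp5 libcurl4-openssl-dev
apt-get install libcamlimages-ocaml liblablgl-ocaml-dev libcurl-ocaml ocaml-libs
apt-get install libcamlimages-ocaml-doc omake
# Ruby and RubyGems
apt-get install ruby gem rubygems1.8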