Tuesday, 24 April 2012

The 3 Basic Tools of Systems Engineering



by Ted Dziuba on Tuesday, December 07, 2010

One of the most important things I learned when programming for a startup is how to design reliable systems. A startup programmer needs to understand the business economics of systems design: that the goal is to create the desired functionality, not to write code. Code is only incidental, and it should be the last tool you use to solve a problem.

There are three basic tools you can use to solve a technical problem: money, time, and code. This seems obvious, but the critical point is that you must try them in that order. Out-of-order execution of these tools leads to Very Bad Things, which we will discuss later.

Money

Money is by far the best way to solve a problem because it saves time and helps you avoid writing code. You can usually use money to solve performance and scalability problems by buying either more hardware or faster hardware. My favorite example is how solid state disk drives make disk I/O problems go away, because there is no penalty for a disk seek.

At Milo, we did this when we had a database performance problem: read queries were running slowly, so we spent the money to buy a really powerful server for our database: 24 cores, 64 gigabytes of RAM, and solid state disk drives. This solved the problem for the life of the company until we were acquired. It was absolutely worth the money because we then had more time to spend building the product, and no liabilities that would come from re-architecting the data model.
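The trade-off in the Milo example comes down to simple arithmetic, and a rough sketch makes it concrete. Everything here is an assumption for illustration: the function name, the parameters, and the simplification that hardware is a one-time cost while code costs both development time and ongoing maintenance time.

```python
def compare_fixes(hardware_cost, dev_weeks, weekly_dev_cost,
                  yearly_maintenance_weeks, years):
    """Return (hardware_total, code_total) over a planning horizon.

    Simplifying assumptions: hardware is a single up-front purchase, and
    code costs developer time both to write and to maintain every year.
    """
    code_total = (dev_weeks + yearly_maintenance_weeks * years) * weekly_dev_cost
    return hardware_cost, code_total


# Hypothetical numbers, not figures from Milo:
# compare_fixes(20000, dev_weeks=6, weekly_dev_cost=3000,
#               yearly_maintenance_weeks=2, years=2)
```

With those made-up inputs the re-architecture costs more than the server before you even count the product work that did not get done in the meantime.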

It is rare that money can completely solve the problem (or that you have enough money), but it is an easy tool to try first.

Time

If money doesn't work, invest time to research existing pieces of functionality that do. As I have said before, basic Unix literacy can help you know what tools are available to solve a given problem. For systems design, it helps to know what larger services are available for different classes of problems. To name a few:

Load balancing/redundancy: HAProxy
Caching: Squid, Varnish (not Memcache because it forces you to write too much code)
Database: PostgreSQL, or Oracle if you can afford it
Database replication: Slony-I
Full-text search: PostgreSQL, Solr (warning: if you use Solr the way I think you will, you will have multiple points of truth in your system)
Queueing: if you're using a queue, again, you may end up with a problem somewhere.
Logging: syslog, and nothing else. Ever.

It seems obvious, but sometimes it needs to be stated: use other people's work to accomplish your goals. Well-known open source packages are very high quality, and are far more reliable than anything you could build yourself. Even a 90% solution off the shelf is worth it because of the time you save in maintenance.
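To make the syslog recommendation concrete, here is a minimal sketch of handing application logs to syslog using Python's standard library, rather than writing your own log shipping. The helper name, the `local0` facility, and the default address are assumptions for illustration; on many Linux systems you would pass the local socket path `/dev/log` instead of a host and port.

```python
import logging
import logging.handlers


def make_syslog_logger(name, address=("localhost", 514)):
    """Return a logger that forwards records to a syslog daemon.

    `address` is an assumption: a (host, port) tuple sends over UDP,
    while a string such as "/dev/log" uses a local Unix socket.
    """
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    handler = logging.handlers.SysLogHandler(
        address=address,
        facility=logging.handlers.SysLogHandler.LOG_LOCAL0,
    )
    # syslog stamps time and host itself, so only tag the program and level.
    handler.setFormatter(logging.Formatter("%(name)s: %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger


# Usage (the application name is hypothetical):
# log = make_syslog_logger("myapp")
# log.info("order %s accepted", 1234)
```

Rotation, forwarding, and filtering then become the syslog daemon's job, which is the point: that is code you do not have to own.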

Code

Writing code is the last resort for solving a problem. Code is a versatile enough tool that you can make it solve just about any problem, but every line is a liability: it has to be designed, maintained, monitored, tested, and profiled. Write code only when you have proven categorically that money and third-party software don't work.

As a side note, when you are testing the code, I have found that unit testing is a losing investment. Acceptance tests, however, are the most cost-effective way to manage the risk that new code introduces, in terms of time spent developing.
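For readers unsure where the line between the two falls, here is a rough sketch of what an acceptance-style test might look like: it starts the whole service and checks behavior over a real HTTP request, instead of calling one function in isolation. The tiny WSGI app and its `/health` route are hypothetical stand-ins for a real system; only the standard library is used.

```python
import http.client
import threading
from wsgiref.simple_server import WSGIRequestHandler, make_server


def app(environ, start_response):
    """Hypothetical system under test: a one-route WSGI application."""
    if environ["PATH_INFO"] == "/health":
        status, body = "200 OK", b"ok"
    else:
        status, body = "404 Not Found", b"not found"
    start_response(status, [("Content-Type", "text/plain"),
                            ("Content-Length", str(len(body)))])
    return [body]


class QuietHandler(WSGIRequestHandler):
    def log_message(self, format, *args):
        pass  # suppress per-request logging during tests


def fetch(path):
    """Serve the app on a free local port, make one real request,
    and return (status_code, body)."""
    server = make_server("127.0.0.1", 0, app, handler_class=QuietHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    conn = http.client.HTTPConnection("127.0.0.1", server.server_port, timeout=5)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        return resp.status, resp.read()
    finally:
        conn.close()
        server.shutdown()
        server.server_close()


def test_health_endpoint():
    # Acceptance-level check: exercises routing, headers, and the
    # server loop together, the way a client would see them.
    status, body = fetch("/health")
    assert status == 200
    assert body == b"ok"
```

A test like this survives refactoring of the internals, which is a large part of why it tends to cost less to keep around than a suite of unit tests.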

Using the Tools Out of Order

The worst thing you can do is to try the code tool first, without considering money or time.

When the first thing you do is dive into code, you are dooming yourself either to designing an unmaintainable system or to reinventing existing tools poorly. This may be acceptable in an academic or research setting, but in a startup, it's downright foolish. You may be able to deploy your system faster if you code it all yourself, but it will be a monkey on your back for its entire lifetime. PostgreSQL has never woken me up in the middle of the night with a segmentation fault or NullPointerException, but databases I've written myself have.

Functionality is an asset, but code is a liability. I will say this until you like it.

