Wednesday, January 16, 2013

All Software Development is Schema Management

Before you automatically disagree, stop a bit and think about it.  Can you think of any code you have ever written that did not handle data of some sort?  Of course not.  This is not about relational data either, it is about any data.  There is no software that does not process data.  Even the textbook example of some function that squares a number is processing the parameter and returning a value.  The first line of that function, in most languages, names the input parameter and its type.  That is a schema.

This remains true as you climb out of the toy textbook examples into simple one-off programs into small packages and ultimately to apps that build to megabytes of executables running on a rack (or a floor) of servers.  Just as each method call has parameters, so does each class have its attributes, every table its columns, every XML file its XSD (or so we hope) and that code works if and only if everybody on the team understood what was supposed to happen to the data.

Are We Talking UI or Platform or Middleware?

Generally today we are talking about the server side of things.  This is about any project that is going to take a user request and consult a data store: the file system, a search engine, a database, a NoSQL database, or an XML database.  If you go to the disk, that is what we are talking about.

New Versions Are (Almost Always) About Schema Changes

So imagine you've got some working code.  Maybe a single script file, or perhaps a library or package, or maybe some huge application with hundreds of files.  It is time to change it.  In my experience new code means some kind of change to the schema.

I cannot prove that, nor do I intend to try.  It is a mostly-true not an always-true.

Schema Here Does Not Mean Relational

This is not about relational schemas, though they are included.  If you are using Mongo or some other NoSQL Database which does not manage the schema for you, it just means you are managing the schema yourself somewhere in code.   But since you cannot write code that has no knowledge of the structure of the data it handles, that schema will be there somewhere, and if the code changes the schema generally changes.

Does It Need To Be Said?

Sometimes you will find yourself in a situation where people do not know this.  You will notice something is wrong, they first symptom is that the conversation does not appear to be proceeding to a conclusion.  Even worse, people do not seem to know what conclusion they are even seeking.  They do not know that they are trying to work out the schema, so they wander about the requirements trying to make sense of them.

Order and progress can be restored when somebody ties the efforts down to the discovery and detailing of the schema.  The question is usually, "What data points are we handling and what are we doing to them?"

Tuesday, January 15, 2013

Code Today's Requirements Today

In Part 1 of this series, Do You Know What Day It Is? (written a mere 6 months ago, I really ought to update this blog more often), we looked at Ken's First Law of Architecture:

Today's Constant is Tomorrow's Variable

If you do not know this, there are two mistakes you can make.

Mistake 1: You Ain't Gonna Need It

Mistake 1 is creating a variable, option, switch, parameter, or other control for something which as far as we know for today is a constant.  You can avoid this principle if you remember that You Ain't Gonna Need It.

It is this simple: you can invent an infinite number of variables that the user may wish to control in the future, but the chance of guessing which ones they will actually want to control is near zero.  It is a total complete waste of time.

Let's take an example.  Imagine you are asked to do a task.  Any task.  It does not even have to be programming.  The task will be loaded with points where programmers invent decisions that nobody asked them to make.  For example, consider this blog post.

1) Should I have used a different background color up there in that green box?
2) Should I have used a sub-section heading by this point, maybe I should go back and do that?
3) Should this list be numbered or maybe just bullets instead?

Notice that this list is kind of arbitrary.  I did not think of it very hard, I just made up those questions.  Another person might have made up different questions.  The problem is that the list never ends.

This can never be eradicated, it is fundamental to the human condition.  Well meaning individuals will raise irrelevancies in an effort to be helpful and the rules of polite society work against weeding these out.  Committees will put two heads on a horse and you will end up with a bunch of YAGNI options.  But we can still fight against them.

Mistake 2: Constants in Code - Nobody Does That!

It is possible to go the other way, which is to embed constants into your code so that when you need to change them you find it is expensive, difficult and error prone.  Most of us learn not to do this so early that it seems unthinkable to us.  Why even write a blog post about it in 2013?

Well the very good reason to write about it is that we all still embed "constants in code" in ways that we never think about.

For example, if you are using a relational database, then the structure of your table is a "constant in code", it is something that is woven through every line.  Your code "knows" which tables are in there and what columns they have.  When you change the structure you are changing something very much "constant" in your system.   This is why it is so important to use a lot of library code that itself reads out of data dictionaries, so that you have removed this particular "constant in code."

The great problem here is that we cannot see the future, and you never think that a basic feature of your system is an embedded constant in code.  In a simpler period in my life a system lived on a server.  One server, that was it, who ever heard of a server farm?  That was a constant: number of servers=1, and all of my code assumed it would always be true.  I had to weed out that constant from my code.

Those who understand this basic inability to see the future are most at risk of slipping into over-generalization and generating YAGNI features.  It is an attempt to fight the very universe itself, and you cannot win.

Keep it Simple, Keep it Clean

In the end you can only write clean code, architect clean systems, follow the old knowledge and be the wise one who learns to expect the unexpected.