Monday, August 25, 2008

Advanced Algorithm: Sequencing Dependencies

Some database applications require you to perform a series of actions where you know only that some actions must be performed before others. Before you can perform the actions, you must work out a safe sequence that takes into account all of the dependencies. This week in The Database Programmer we will see an algorithm for doing this.

Examples

There are many examples where a programmer must work out dependencies before doing something.

A manufacturing package may track many steps in the manufacture of an item. Some steps cannot be performed until others are complete. A simple system would require the end-user to work out the entire process, but a better system would let the user enter only the dependencies: which processes require others to be complete. In this kind of system the computer can be used to schedule manufacturing tasks.

All popular Linux distributions have a package installation system in which each package lists its required dependencies. If you want to install a large number of packages in one shot, producing a tangled bunch of related dependencies, today's algorithm can be used to work them all out.

If you are using a data dictionary to build tables, every foreign key represents a dependency, where the child table requires the parent table to exist before it can be built. Today's algorithm can be used to sequence the tables and build them in order.

Another database example is generating code to perform calculations. Some calculations will depend on previous calculations, so your code generator must be able to sequence them all so that the calculations are performed in the proper order.

Big Words: Directed Acyclic Graph

The examples abvoe are all cases of what mathematicians call a Directed Acyclic Graph. If you do not want to read the entire Wikipedia article, the main points are these:

  • We have a set of items. These can be anything you are keeping track of in your database.
  • Any item may be connected to zero or more other items.
  • The connection is one-way only. So if we say A requires B, we are not saying that B also requires A (in fact it is forbidden).
  • There can be no loops (cycles). If A requires B, B may not require A. Further, if A requires B, and B requires C, C may not require A.

Whenever I can, I like to point out that it is very useful to read up on the mathematical foundations of certain programming techniques. We can often pick up very useful insights from those who think of these things at the most abstract level. It is also much easier to get advice from the more abstract-minded database people if you are at least marginally familiar with the mathematical terms.

The Tables

So now let us proceed to the tables and the code. The tables below show a data dictionary that will be used to generate DDL to build a database:

Table: TABLES

TABLE       | DESCRIPTION            | SEQUENCE
------------+------------------------+---------
ORDERS      | Sales Orders Headers   |  ?
ORDER_LINES | Sales order lines      |  ?
CUSTOMERS   | Customers              |  ?
ITEMS       | Items                  |  ?


Table: DEPENDENCIES

CHILD_TABLE  | PARENT_TABLE
-------------+---------------
ORDERS       | CUSTOMERS
ORDER_LINES  | ORDERS
ORDER_LINES  | ITEMS

The problem here is knowing the safe order in which to build the tables. If I try to build ORDER_LINES before I have built ITEMS, then I cannot put a foreign key onto ORDER_LINES, because ITEMS is not there. In short, I need to know the value of the SEQUENCE column in the example above.

The Expected Answer

The example above is simple enough that we can work it out by hand. This is actually a good idea, because we want to get an idea of what the answer will look like:

TABLE       | DESCRIPTION            | SEQUENCE
------------+------------------------+---------
ORDERS      | Sales Orders Headers   |  1
ORDER_LINES | Sales order lines      |  2
CUSTOMERS   | Customers              |  0
ITEMS       | Items                  |  0

This answer should be self-explanatory, except maybe for the fact that both CUSTOMERS and ITEMS have the same value. We need to look at that before we can see the code that produces it. Is it OK that two entries have the same value, and how would our program handle that?

The short answer is that it is perfectly OK and natural for two or more entries to have the same value. All this means is that they can be done in any order relative to each other, so long as they are done before the other entries.

In terms of the example, where we want to build these tables in a database, it means that:

  • We would query the list of tables and sort by SEQUENCE
  • We would loop through and build each table
  • We don't care about ITEMS and CUSTOMERS having the same value, they get built in whatever which-way the server gives us the list.

The same concept applies to the other potential examples: manufacturing, software packages, and generating calculations. So long as you follow the sequence, we don't care about items that have the same value.

Stating the Solution in Plain English

We are now ready to work out a program that will generate the SEQUENCE column. The basic steps the program must perform are:

  1. Initialize the column to -1. A value of -1 means "Not sequenced."
  2. Update the column to zero for all items that have no dependencies.
  3. Repeat the following action until the affected rows are zero: Update the SEQUENCE column to 1 (then 2, then 3) for all rows that have all of their dependencies sequenced already.
  4. Once the command in step 3 is no longer affecting any rows, check for any rows that have -1, these are involved in circular dependencies and we cannot proceed until the user straightens them out.

Stating the Solution in Code

The first step is very easy, we initialize the table with this command:

UPDATE TABLES SET SEQUENCE = -1;

The next step is also very easy, we mark with a '0' all of the tables that have no dependencies. The basic idea is to find all of the entries that have no entries in DEPENDENCIES.

UPDATE TABLES SET SEQUENCE = 0
 WHERE NOT EXISTS (SELECT child FROM DEPENDENCIES
                    WHERE child = TABLES.TABLE)

Now for the hard part. We now have to execute a loop. On each pass of the loop we are looking for all items whose dependencies have all been sequenced. We will do this over and over until the command is not affecting any rows. It is important that we cannot exit the loop by testing if all rows are sequenced, because a circular dependency will prevent this from happening and we will have an infinite loop.

You can control this loop from client code, but I wrote mine as a Postgres stored procedure. This algorithm turns out to be surprisingly complicated. The UPDATE command below may not be all that self-explanatory. What it works out is:

  • Get a list of child tables from the DEPENDENCIES table
  • JOIN through to TABLES to look at the SEQUENCE value of their parents.
  • Group and check that the minimum value is greater than zero, if it is it means all parents are sequenced and the table can be sequenced.
  • Update the SEQUENCE value for the tables we found
CREATE OR REPLACE FUNCTION zdd.Table_Sequencer() RETURNS void AS
$BODY$
DECLARE
    -- Note that rowcount is initialized to be > 0, this makes
    -- the loop work properly
    rowcount integer := 1;
    
    -- This tracks the value we are assigning to SEQUENCE.  We
    -- initialize it to 1 because we already took care of the
    -- the rows that have value 0
    lnSeq integer := 1;
BEGIN
    while rowcount > 0 LOOP
        UPDATE tables set SEQUENCE = lnSeq
          FROM (SELECT t1.CHILD 
                  FROM DEPENDENCIES t1 
                  JOIN TABLES       t2 ON t1.PARENT = t2.TABLE
                 GROUP BY t1.CHILD
                HAVING MIN(t2.SEQUENCE) >= 0
                ) fins
          WHERE TABLES.TABLE = fins.CHILD
            AND TABLES.SEQUENCE = -1;

  lnSeq := lnSeq + 1;
  GET DIAGNOSTICS rowcount = ROW_COUNT;
 END LOOP;
 
 RETURN;
END;
$BODY$
LANGUAGE plpgsql;

The stored procedure above will stop executing once the UPDATE command is no longer having any effect. Once that happens, your final step is to make sure that all rows have a valid SEQUENCE value, which is to say that no entry has SEQUENCE of -1. If any of the rows have that value then you have a circular dependency. You must report those rows to the user, and you can also report the dependencies that are causing the loop.

Conclusion

Sequencing dependencies is a fundamental algorithm that has a lot of use cases in database applications. It is easy enough to accomplish, but the innermost UPDATE command can be a little puzzling when you first look at it. Once you have mastered this algorithm you are on the way to the "big leagues" of database applications such as ERP, MRP and others.

Next Essay: Secure Password Resets

Sunday, August 3, 2008

Javascript As a Foreign Language

So you know 37 different programming languages, you've programmed moon landers, missiles and toasters, and how could Javascript be any problem? Then you start trying to code up some Javascript and find that it just does not feel right, nothing seems to flow naturally or easily. Your instincts do not seem to guide you. You are not alone, here is your cheatsheet...

Welcome to the Database Programmer blog. If you are trying to write database applications in 2008 then you most likely bump into Javascript. My hope in this week's essay is to provide a "soft landing" into this beautiful and powerful but somewhat strange language.

To see the other essays in this series, consult our Complete Table of Contents.

Contents

Today's essay is rather long. It covers extremely basic ideas but proceeds directly to very powerful techniques. I have provided a summary here so that you can skip over the material that may already be familiar to you.

Start Off: Firefox and Firebug

In case you have been living under a rock for the past few years, let me tell you to do your development in a real web browser, that is, Firefox, and to immediately download the Firebug extension. Firebug more or less does everything you need to debug Javascript, and it has many features you may not even know you need. Do not try to develop Javascript without Firebug.

In particular, firebug has a "console" object that you can send messages to, such as this:

console.log("this is so much better than alert()!");
for(var x = 1; x<10; x++) {
    console.log("We are on "+x);
}

Execution

Javascript executes while your page is being loaded, and can be placed anywhere on the page. While I make no claims that the example below is good or bad practice, it does illustrate how Javascript executes.

<html>
<head>
<script>
// Script is executing as it is encountered,
// so this variable comes into existence
// immediately
var x = 5;  

function square(x) {
    return x * x;
}

// Now that the square function is defined,
// we can call it
var y = square(x);
</script>
</head>
<body>
<h1 id='h1'>Here is a Javascript Example!</h2>

<script>
// Script can be embedded directly in the
// body of your HTML (for better or worse!)
var h1 = document.getElementById('h1');
h1.innerHTML = 'I changed the H1 content!';

// This function can be used anywhere downstream
function changeH1(newText) {
    var h1 = document.getElementById('h1');
    h1.innerHTML = newText;
}
</script>
</body>
<div>Here is a div of text</div>
<script>
changeH1("Changing H1 yet again!");
</script>

Variable Scope

Scoping in Javascript is pretty straightforward. If you assign a value to a variable outside of any function it becomes a global. If you explicitly define a variable as "window.x = 5" it becomes a global. If you put the keyword var in front of it before using it it becomes local (and can mask a global variable of the same name). You can use the "var" keyword inside of loops, and many javascript programmers use "var" everywhere. Here is an example.

<html>
<head>
<script>
// We are executing outside of a function, so
// both of these are globals:
var x = 5;
y = 10;

function example() {
    // Since a global named 'x' exists, and we do
    // not use the "var" keyword, we are re-assigning
    // the global variable
    x = 7;
    
    // Using the "var" keyword makes a local variable,
    // we cannot "see" the global x anymore
    var x = 2;
    alert(x);
    
    // I can still access the global variable to
    // set its value back:
    window.x = 5;
    alert(x);
    alert(window.x);    
}

</script>
</head>

Adding Methods to Core Javascript

Javascript lacks certain functions that are very useful to have, such as trimming spaces from strings. One very cool thing about Javascript is that you can directly add these methods to the core language, by adding functions to the "prototype" object of the core classes. Here is how you add a "trim" function to core Javascript.

String.prototype.trim = function() {
 return this.replace(/^\s+|\s+$/g,"");
}
x = "   abc  ";
alert('-' + x + '-'); // the dashes let you see the spaces
alert('-' + x.trim() + '-');  // spaces removed!

When I first saw this trick I dutifully copy-n-pasted it in and it worked, but the syntax looked very perplexing, I could not figure out how to make use of it myself. My brain had not yet wrapped itself around the Javascript mentality. This leads directly to our next concept, that functions are "first class cizitens".

Functions as First Class Citizens

You may have heard that Javascript treats functions as "first class citizens" and wondered, "what does that mean?" The best way to explain it in terms of other languages is that you can create functions on the fly and pass them around like variables. This may be a little hard to grasp, so we will go directly to examples.

// Most languages support this type of function definition
function square(x) {
    return x * x;
}

// Javascript gives you a slightly different syntax if
// you like, which can be extremely powerful
var square = function(x) {
    return x * x;
}

// The books usually go on to an example like this, 
// which frankly did not seem to me to have any purpose:
y = x;
alert( y(5) );

The basic idea to get here is that you can do anything with a function that you can do with a variable. There are multiple uses for this, but we have already seen one, namely, the ability to add a method to a previously created class. This is what we did above when we added the "trim()" method to the base "String" class. This means that our approach to building class hierarchies is very different than in other Object-oriented languages like PHP, Foxpro, Delphi, VB and so forth.

// This example shows two different ways to add methods
// to HTML elements and make them act more object-oriented.

// Method 1, make a function that makes an INPUT read-only
// by changing its style and setting a property.  Notice
// the code refers to "this" as if it were part of an
// object, see below to see why that works.
function makeReadOnly() {
    this.className = 'readOnly';
    this.readOnly = true;
}

// Now attach that function to a DOM element (an HTML INPUT)
var input = document.getElementById('myInput');
input.makeReadOnly = makeReadOnly;

// Some other code can now tell the input to go into
// read only mode:
function changeModes() {
    var input = document.getElementById('myInput);
    // When this executes, the "this" variable in 
    // the function will refer to "input"
    input.makeReadOnly();
}

There is another way to do this as well, that really illustrates how to make use of Javascript's native abilities:

// Method 2 is to defne the function while adding it
// to the INPUT element.
var input = document.getElementById('myInput');
input.makeReadOnly = function() {
    this.className = 'readOnly';
    this.readOnly = true;
}

// This code works exactly as it did above
function changeModes() {
    var input = document.getElementById('myInput);
    input.makeReadOnly();
}

Now that we have introduced this idea, it will come up all over the place in later examples.

Objects And Classes

When I first tried to use Javascript I kept looking for the "class" keyword, but it's not there! Believe it or not you use the "function" keyword to create what we would call a class in other languages. Here is an example of how to create and instantiate an object in Javascript:

// Here is a simple PHP class for an 
// object that handles a row from a database
class dbRow {
    var tableName = '';
    var rowId = 0;
    
    function dbRow(tableName,id) {
        this.tableId = table;
        this.fetchRow(id);
    }
    
    function fetchRow(id) {
        # ...more code here
    }
}

var x = new dbRow('customers',23);

In Javascript we make a function instead of a class:

function dbRow(tableName,id) {
    // When the object is instantiated, this
    // code runs immediately
    this.tableName = tableName;
    
    // We must define a fetchRow function before
    // we can actually call it....
    this.fetchRow = function(id) {
        // some kind of ajax stuff going on here
    }
    
    // ...and now we can invoke the function
    this.fetchRow(id);
}

// When this command returns we have a new "dbRow"
// object.  
var x = new dbRow('customers',23);

Creating An Object Without a Class

We can say Javascript is "purely dynamic", by which we mean you can define anything on the fly, including ojects, even if you have no class definition (er, I mean no "function()" definition...). You can explicitly create an object by enclosing the definition in curly braces. Properties and their values are assigned with "name: value" syntax, separated by commas. Since you can do anything with a function that you can do with a variable, the following is a nifty way to create an object:

var x = {
    propertyName: 'value',
    otherProperty: 'otherValue',
    
    square: function(x) {
        return x * x;
    }
    // Don't put a comma after the last property!
    // It will work in firefox but not in IE!
}

alert(x.square(5));

This syntax is called "JSON" by the way, for "Javascript Object Notation". If you can get comfortable with JSON you can start to code up some really elegant Javascript.

Accessing Object Properties and Methods

You can hardcode references to an object's properties by using the ".property" syntax, but you can also use variables that hold the name of the property.

// Create an object
var x = {
    first: 'Sax',
    last: 'Russel',
    
    combine: function() {
        return this.first + ' ' + this.last;
    }
}

// You can now explicitly access properties
alert (x.first);
alert (x.last);

// But you can also have a variable hold the name
// of the property you want:
var propName = 'first';
alert (x[propName]);

// Objects can be nested to any depth, and you can
// mix hardcoded and variable names.  If we had a
// complex data dictionary stored on the browser,
// we might get the caption for a column like this:
var tableId = 'customers';
var columnId = 'total_sales';
var caption = dd[tableId].columns[columnId].caption;

This works also for functions. Assuming the same object as the above, we can invoke functions that are named by other variables:

var x = { .... repeated from above example };

var methodName = 'combine';
alert( x[methodName]() );

Iteration

As a database programmer I write a lot of code that iterates arrays and associative arrays. Iteration tends to be very important to database programmers, as it is the most natural way to loop through rows retrieved from a database, or to loop through the values in a row. Basic iteration of an array looks like this:

// make an array 
var myList = [ 'sax', 'anne', 'nirgal', 'frank' ];
for(var idx in myList) {
    // console.log() requires firebug
    console.log("Index and value: "+idx+", "+myList[idx])
}

All of the action is in the line "for(var idx in myList)", this structure will loop through the array. On each pass the variable "idx" will contain the array's index number. To actually get the value you need you have to go looking for myList[idx].

Associate Arrays are a very natural data structure for a database programmer, as they are an easy way to represent a single row retrieved from the database. There is no explicit support for associative arrays in Javascript, but this does not matter because you can use an object and get the same results.

// Here is an object
var myObject = {
   first: 'Sax',
   last: 'Russel',
   occupation: 'Physicist'
}
// Now we loop through it like an associative array
for(var key in myObject) {
    console.log("The array key is: " + key);
    console.log("The value is: " + myObject[key]);
}

JSON and Ajax

Nowadays everybody is jumping into AJAX with both feet. AJAX can be particularly useful to a database programmer, because you can make AJAX calls that return objects (including code), arrays, and database data.

I should note that the term "AJAX" itself means something very precise, being "Asynchronous Javascript and XML", while the example I am about to show contains no XML, so my use of the term is not strictly correct. Nevertheless, many people routinely use the term AJAX to mean any round-trip to the browser that fetches some fragment of information without doing an entire page refresh. While this is regrettable, I'm not going to try to buck that trend here.

That being said, here is a nifty way to use PHP to send a data structure back on an AJAX request:

# THE PHP CODE:
function someAjaxHandler() {
    $table = myFrameworkGetPostRetrievalFunction('table');
    $id    = myFrameworkGetPostRetrievalFunction('id');
    $row = myFrameworkRowRetrievalFunction("customers",23);
    
    # This nifty PHP function encodes arrays and objects
    # into JSON, very cool
    echo json_encode($row);
}

This would be handled in the browser like so:

function someAjaxResponseHandler() {
    if (this.readyState != 4) return;
    try {
        eval( 'window.requestData ='+this.responseText);
    }
    catch(e) {
        alert("The server response was not parsable JSON!");
        return;
    }
}

Synchronous AJAX and JSON: S-JSON

It is pretty safe to say that the asynchronous nature of AJAX is a powerful part of its appeal. The request is sent and the browser remains responsive to the user until the request comes back. This is especially powerful for fetching things in the background while the user works.

However, in database applications sometimes it is the Right Thing for the browser to stop while fetching data. If a user clicks on [SAVE] on a CRUD screen to save some changes, we actually want the browser to wait until the server tells them that it all went ok (or not). You can do this by setting a flag on your call. I have found this a very powerful approach to writing desktop-in-browser applications:

function JSON(url) {
    // Create an object
    var browser = navigator.appName;
    if(browser == "Microsoft Internet Explorer"){
        var http = new ActiveXObject("Microsoft.XMLHTTP");
    }
    else {
        var http = new XMLHttpRequest();
    }

    // The trick is to pass "false" as the third parameter,
    // which says to not go asynchronously.

    http.open('POST' , url, false);
    http.send(null);
    
    // Execution now halts, waiting for the complete
    // return value

    // Once execution resumes, we can capture the
    // JSON string sent back by the server and do anything
    // we want with it
    try {
        eval( 'window.requestData ='+http.responseText);
    }
    catch(e) {
        alert("The server response was not parsable JSON!");
        return;
    }

    // Processing of the result occurs here...  
}

jQuery and Friends

Nowadays we have a growing list of very powerful Javascript libraries available. Some of them are very high level and some of them are low-level.

One library I will mention by name as being very useful is jQuery. This library provides a wealth of extremely simple and powerful abilities for finding and manipulating the HTML that is in your document. I highly recommend it.

Closing Thoughts

Any database programmer working in 2008 is either already required to use Javascript or may find himself facing it soon. Javascript is very flexible and powerful, but is different from the languages we are used to for writing applications, like PHP, Foxpro, Delphi, VB and others.

Nevertheless, Javascript can do everything you need it to do, you just have to grasp the "javascript mentality" as it were. I have attempted this week in this essay to put into one place all of the facts and tricks that were not so obvious to me from reading books or simply taking a snippet of code and trying to modify it. I hope that you find it useful!

Next Essay: Sequencing Dependencies