Getting started with YDN-DB javascript module

Setup

YDN-DB is a pure javascript library which use HTML5 browser database as storage mechanisms, which include IndexedDB, WebDatabase (WebSQL) and WebStorage (localStorage). Most moden browsers including IE10, Chrome, Firefox and Safari support either IndexedDB or WebSQL. The library also work on enviroment supporting these database such as phonegap, WebView and UIWebView mobile clients.

Download the latest version of distribution. For development you should want to use ydn.db-x.x-dev.js, which enforce assertion, log warning and error.

The namespace of this library is ydn.db. The script provide main javascript class object call ydn.db.Storage.

Pages in this sections include the YDN-DB script and some preloaded data and utility functions, so that you follow the sample code in your browser's developer console to see in action.

Database connection

A simple way to initialize a database is by specifying a database name.

db = new ydn.db.Storage('db-name');
db.put('store-name', {message: 'Hello world!'}, 'id1');
db.get('store-name', 'id1').always(function(record) {
  console.log(record);
});

The storage instance, ydn.db.Storage, connect to suitable data storage mechanisms starting from IndexedDB to WebSQL to localStore. It will open existing database or create a new database with the given name.

All database operation methods are asynchronous and result are immediately return as a promise object. It is also called future or deferred object. The resulting deferred object accepts two asynchronous functions: done to receive value on success event and fail to received in case of error. A convenient then  accept both of them. In case of error, this library always invokes error callback with Error object with has name attribute and message attribute. 

If you use raw source code, deferred object is goog.async.Deferred class, which is also similar to JQuery Deferred  or WinJS.Promise. Both of them provides advanced asynchronous workflow.    

Storing data

Use put method to insert a new or update existing record(s).

Out-of-line key

db.put('store1', {test: 'Hello World!'}, 123)

The first argument is store name. It is object store name in IndexedDB and TABLE name in WebSQL. A table or object store will be created if not exist.

The second argument is the record we want to store. It should be a simple object. A structured clone of the object is stored in the database. A structured clone is similar concept in JSON serialization. But it is more efficient and more powerful. File and Blob are clonable, while  DOM Element is not. If the record fail to clone it, underlying database API shall throw DataCloneError. None of the error from the database API are caught by the library and hence not documented. In some instance, dev version of the library may caught the error and rethrow for providing better error message.   

The third argument is primary key of the record. A key can be number, string, Date or simple array. Since we are given primary key separately from the record, it is called out-of-line key. 

Use get method to retrieve it by the primary key.

req = db.get('store1', 123);
req.done(function(record) {
  console.log(record);
});
req.fail(function(e) {
  throw e;
});

Deferred object is not only used for asynchronous workflow, but also for error control. You should not be surprised that all of database operation methods expect some errors. And it should be handled, at least, as above by re-throwing the error object. Handling by single global error function is discouraged. If error is an exception, it will have full stack trace. However database operation error are coded error and not exception error.   

In-line key

Primary key can be embedded into the record. Embedding key is better because it enforces consistency. We specify how to extract the primary key from the record by using keyPath.

record = {id: 'abc', message: 'Testing in line key'};
req = db.put({name: 'store2', keyPath: 'id'}, record);
req.done(function(key) {
  console.log(key);
});
req.fail(function(e) {
  throw e;
});

Here, we are giving store schema in the first argument instead of just store name. The two store schema attribute are 'name' and 'keyPath'. A record is inserted to 'store2' store by using primary key 'abc'. Since primary key is specified in the record object, it is called, in-line key.

Set autoIncrement of the store schema attribute to true to let the database engine to generate a increasing primary key when it is not provided. Auto key generation works for both in-line and out-of-line key.

Retrieving

Keys are the most efficient way to retrieve a record. If we don't know the key, we must browse the whole store to find it. Let us add some more records and list them.

db.put('store2', [{id: 'a', message: 'a record'}, {id: 'b', message: 'b record'}]);

Notice multiple records are stored by using array of records in one transaction.

Listing use values database operation method.

db.values('store2', null, 10).done(function(records) {
  console.log(records);
});

The results is sort by ascending order of primary key. The second parameter is a limit to the number of results. It is a good practice to specify a limit.

Listing is the most expensive database operations. Unused records retrieved from the database are termed as false-positive retrieval, of which we must minimized. Instead you will list only keys as follow.

db.keys('store2', null, 10).done(function(records) {
  console.log(records);
});

In contrast to listing records, key listing is much faster because it obviates serialization and also (in most case) IO operation. IO operation takes 2 to 4 order of magnitudes slower than memory operation. One of the main purpose of query is reducing false-positive retrieval of record value value. Key is very important for effective querying. And hence keys are carefully constructed. In addition to primary key, there is secondary key, which is simply called as index key.

Schema

While running the above codes, we modified database schema on creating new stores. It is not good in production usage, because modifying database schema is not a trivial process. It need to notified all connections on other tabs including worker thread as well. Additionally we should use a fixed schema through out a web application for consistency. Schema, that you just created, is retrieved as follow.

db.getSchema()

You will find database version is undefined, since we are not giving a version. The database is said to be in auto-version mode. The actual schema from the database connection is retrieved as follow.

db.getSchema(function(schema) {
  console.log(schema);
});

These two schema are similar.

A database schema is basically defining object stores or TABLE in WebSQL. An example is

var author_store_schema = {
  name: 'author',
  keyPath: 'email', // optional, 
  autoIncrement: false, // optional. 
  indexes: [
    {
      name: 'born', // optional
      keyPath: 'born',
      unique: false, // optional, default to false
      multiEntry: false // optional, default to false
    }, {
      keyPath: 'company'
    }, {
      keyPath: 'hobby',
      multiEntry: true
    }
  ] // optional, list of index schema as array.
};
schema = {
  stores: [author_store_schema]
};

The above schema define one object store. The name of object store is 'author'. Since keyPath is defined, it is using in-line key. Since autoIncrement is false, all records must have a valid key in its 'emial' field attribute.

The object store 'author' has three indexes, named 'born', 'company' and 'hobby'. In WebSQL, it is a column name. If name is not defined, it is default to keyPath. An example 'author' record will be:

author_1 = {
  email: 'me@aaronsw.com',
  born: 531763200000,
  first: 'Aaron',
  last: 'Swartz',
  company: 'Reddit',
  hobby: ['programming', 'blogging', 'politics']
};

Defining a data type is recommended if it is intended to used in WebSQL. Being schema-less database system, a record could have arbitrary fields in addition to index fields. In WebSQL, these extra fields are stringified and stored in special default column of BLOB data type. BLOB data type is not queryable.

If index schema attribute, unique is true, the underlying database engine will check for every new record write to make sure that index key value is unique. If not unique, it will throw ConstriantError

The index schema attribute, multiEntry is meaningful only for field value of array data type. The index 'hobby' has multiEntry of true, so that each element in of the array hobby are indexed individually.

In addition to 'stores' attribute, database schema take 'version' attribute. If version number is specified, the library will open with the given version. If the client browser do not have or lower than the given version, it will be upgraded as necessary. Client version must not be higher than given version. If client version is the same as given version, the database schema must be similar. If not similar, the library will refuse to connect the database. This library will not work, if schema is not known.

Let us generate some data for querying.

genAuthors = function(n) {
  var out = [];
  for (var i = 0; i < n; i++) {
    out[i] = {
      first: randName(),
      last: randName(),
      born: +(new Date(1900+Math.random()*70, 12*Math.random(), 30*Math.random())),
      email: randEmail(),
      company: pickOne(companyList),
      hobby: pickMany(hobbyList)
    };
  }
  return out;
};
db = new ydn.db.Storage('test-2', schema);
var authors = genAuthors(10000);
db.put('author', authors).then(
  function(ids) {
    console.log(ids.length + ' authors put.');
  }, function(e) {
    throw e;
  }
);

put method uses multiple asynchronous threads and generally very fast.

Basic query

If you don't know the exact key, you must enumerate the whole or a range of object store records. Here the process is called querying. Instead of enumerating records, you can only enumerate only key. This is called key only query. Key only query is very efficient because the most expensive operations of IO and de-serialization are avoid.

Counting

Generally NoSQL database does not provide counting, but IndexedDB API does.

db.count('author').done(function(x) {
  console.log('Number of authors: ' + x);
});

It is not fast. It is recommended that the result be cached. However, interestingly, Firefox uses SQLite backend and counting is fast.

This is the only aggregrate function provided by the IndexedDB API.

Sorting

Keys are sorted in the database. Hence sorting is required, the field is indexed. The following illustrate iterating records sorted by 'born' date field.

var reverse = false; // sorted by ascending of date
var limit = 10;
  db.values(new ydn.db.Cursors('author', 'born', null, reverse), limit).done(function(records) {
  console.log(records);
});

This query result is sorted by 'email', the primary key.

var key_range = null;
db.keys('author', key_range, 100).done(function(records) {
  console.log(records);
});

Instead of values, here keys methods is used, so that only emails, primary keys, are listed.

Similarly, we use keys method to index key iterator when we only want sorted secondary key. The following example list hobby index. We set unique attribute to true, so that we get unique list of hobby.

var unique = true;
var hobby_key_iter = new ydn.db.Cursors('author', 'hobby', null, false, unique);
db.keys(hobby_key_iter).done(function(hobby) {
  console.log(hobby);
});

The above code snippet use key iterator, you can also use value iterator.

var hobby_iter = new ydn.db.IndexValueCursors('author', 'hobby', null, false, unique);
db.keys(hobby_iter).done(function(hobby) {
  console.log(hobby);
});

Both iterators return effective keys, 'hobby'. The return values will be different if keys method's complementary function, values, is used.

db.values(hobby_key_iter).done(function(refs) {
  console.log(refs);
});
db.values(hobby_iter).done(function(refs) {
  console.log(refs);
});

values method returns reference value of the iterator. The reference value of index key iterator is primary key and that of index value iterator is record value.

Filtering

The primary way of filtering is query by key range. More sophisticated filtering are iterated merging of key range results. We dedicate these sophisticated filtering on later sections.

The following query finds authors born in 1942 February.

var lower = + new Date(1942, 1, 1); // 1942 February 1
var upper = + new Date(1942, 2, 1); // 1942 March 1
var iter = ydn.db.IndexValueCursors.where('author', 'born', '>=', lower, '<', upper);
db.values(iter).done(function(records) {
  console.log(records);
  records.map(function(x) {
    console.log(x.first + ' ' + x.last + ' ' + new Date(x.born));
  });
});

A key range is a continuous segment of keys defined by interval by using lower and upper bound.

To retrieve all keys within a certain interval, you can construct a key range as follow:

Range Code
All keys ? x ydn.db.KeyRange.upperBound(x)
All keys < x ydn.db.KeyRange.upperBound(x, true)
All keys ? y ydn.db.KeyRange.lowerBound(y)
All keys > y ydn.db.KeyRange.lowerBound(y, true)
All keys ? x && ? y ydn.db.KeyRange.bound(x, y)
All keys > x &&< y ydn.db.KeyRange.bound(x, y, true, true)
All keys > x && ? y ydn.db.KeyRange.bound(x, y, true, false)
All keys ? x &&< y ydn.db.KeyRange.bound(x, y, false, true)
The key = z ydn.db.KeyRange.only(z)
All (string or array) keys start with a ydn.db.KeyRange.starts(a)

Paging

keys and values methods accept limit and offset for paging. Iterator has position state property. The position of an iterator is the last cursor position defined by effective key and primary key.

The following query list all authors having hobby in 'camping' using iterator and paging.

var iter = new ydn.db.IndexValueCursors('author', 'hobby', ydn.db.KeyRange.only('camping'));
var limit = 20;
db.values(iter, limit).then(function(records) {
  console.log(records);
  console.log('From ' + records[0].email + ' to ' + records[records.length - 1].email);
}, function(e) {
  throw e;
});

Also notice that the result is sorted by primary key. To continue next page, call list method again.

db.values(iter, limit).then(function(records) {
  console.log(records);
  console.log('From ' + records[0].email + ' to ' + records[records.length - 1].email);
}, function(e) {
  throw e;
});

Notice that offset value is not used.

Map

The library provides map database iteration methods. It can be used to emulate projection query.

var iter = new ydn.db.ValueCursors('author', ydn.db.KeyRange.starts('a'));
var names = [];
db.map(iter, function(value) {
  names.push(value.first + ' ' + value.last + ' <' + value.email + '>');
}).then(function() {
  console.log(names); 
}, function(e) {
  throw e;
});

Reduce

For aggregration query can be emulate by using reduce database iteration method.

The follow snippet calculate average age of authors.

var iter = new ydn.db.Cursors('author', 'born');
var initial = 0;
var now = +new Date();
db.reduce(iter, function(previous, current, index) {
  var age_ms = now - current;  
  var age = age_ms / (1000 * 60 * 60 * 24 * 365);
  return (age + previous * index) / (index + 1);
}, initial).then(function(result) {
  console.log('Average age: ' + result); 
}, function(e) {
  throw e;
});

Updating

Use open method to update records.

var iter = new ydn.db.ValueCursors('author', ydn.db.KeyRange.starts('a'));
var mode = 'readwrite';
var updated = 0;
var deleted = 0;
db.open(iter, function(cursor) {
  var author = cursor.value();
  if (author.company == 'Oracle') {
    cursor.clear().then(function(e) {
      deleted++;
    }, function(e) {
      throw e;
    });
  } else if (author.category != 'A') {
    author.category = 'A';
    cursor.update(author).then(function(e) {
      updated++;
    }, function(e) {
      throw e;
    }); 
  }
}, mode).then(function() {
  console.log(updated + ' records updated, ' + deleted + ' deleted.'); 
}, function(e) {
  throw e;
});

Debugging

Use dev version of the compile JS file during development. Put .map file in the same directory as js file. Turn on source map for debugging with full source code. Use build-in logging facility to detail logging.

var module = 'ydn.db';
var level = 'finer'; // warning, info, fine, finest, all
ydn.debug.log(module, level);

Summary

Database operation methods
Read Description
get Get a record
keys Get a list ordered keys, or effective keys of an iterator
values Get a list ordered records, or reference values of an iterator
Write Description
add Add record(s)
clear Delete record(s)
put Add or update record(s)
Iterate Description
map Map reference values of an iterator
open Open an iterator for read/write operation
reduce Reduce reference values of an iterator
scan Scan keys of an iterator
Aggreate Description
count Count records

Exercises

  1. Write a script that will throw DataCloneError.

    There are two primary way that cause DataCloneError.

    One is the data include unsupported data type.

    db.put('store1', {ele: document.getElementsByTagName('div')[0]}, 2)

    Another one is circular referencing.

    cir_array = [1];
    cir_array.push(cir_array);
    db.put('store1', {data: cir_array}, 3);
  2. Write a script that will throw DataError.
  3. Why a function cannot be persisted in the database? How will you re-store an javascript object?

    Function is not persisted in the database because function is not serialized by structured clone algorithm.

    Unlike JSON object, javascript object has methods, which are functions and will not be restored. The technique for restoring javascript object is similar to serialization in other programming language. A static factory method or the constructor that accept serialized object data is used.

  4. How do you create an index to 'article' store to get primary keys sorted by 'first' followed by 'last'?
  5. How will you test a key is existed in an object store?
    The most effecient way to check existence of key is using count method, which return 1 if the key exists in the object store. get method is not only inefficient, since it involves record retrieval, but also fail to distinguish between undefined record value and non-existing key.
  6. Explain indexing is useful for projection query or not? Plan a database schema to facilitate projection querying to 'first' and 'last' field of 'article' store.
  7. How will you query a list of emails from a certain domain, such as 'gmail.com'?
  8. IndexedDB API does not support indexing boolean value. Why? What are the implications?
  9. Describe how listed index can be used for many-to-many relationship.