Key joining

The indexedDB API provides ordered enumeration of records in a given key range. Join queries are left to the application logic to handle.

Filtering is joining on the primary key. The join process will scan index keys, apply the desired filter, and return the matching primary keys as the result. Numerous join algorithms can be found in database books and review articles.

The db.scan database operation method is used for both key and value scanning processes. It accepts a callback routine and array of iterators. Key iterators are usually preferred over value iterators because of the increased performance. The callback is invoked on each iteration and controls the advancement of the iterators. The first arguments to the callback is the array of effective index keys for the current iteration. The second argument is the array of reference values: primary keys if index iterators are used or record vlaues if avalue iterators are used. The callback must return an array where each element refers to the advancement of the respective iterator: true to continue next iteration or false to restart the iteration. If any value is given, it is taken as primary key and advance to it within current index key range. If a primary key larger than given key is found, it was returned on the first occurrence. If the entire return array is empty, db.scan completes.

In general, the callback can return an object having fields of advance, next_index_keys and next_primary_keys. All three fields are arrays, with each element represent to respective iterator. If element of next_index_keys is a valid key, the cursor advance to it. If element of next_primary_keys is given, the cursor advance to it within given index key or current index key. If element of advance array is given, it must be boolean. true refer to next position and false to rewind the iterator. If all three are given, it starts with rewind, positioning and advancement.

Available algorithms can be found in ydn.db.algo module.

Nested-loop join

A naive join algorithm iterates each iterator forming nested loops. And hence it is called nested-loop join algorithm.

The following snippet filters the license field to ‘SA’ and the publisher to ‘Nature’.

match_keys = [];
var nested_loop = function(keys, values) {
  // here: keys = index keys, values = primary keys
  // index 0 refer to license field and index 1 to publisher
  if (keys[0] != null) {
    if (values[1] != null && ydn.db.cmp(values[0], values[1]) == 0) {
      match_keys.push(values[0]); // we got the matching primary key.
    } 
    if (keys[1] != null) {
      return [null, true]; // iterate on publisher
    } else {
      return [true, false]; // iterate on license and restart on publisher
    }
  } else {
    return []; // no more iteration. we are done.
  }
};

license_filter = new ydn.db.IndexIterator('article', 'license', ydn.db.KeyRange.only('SA'));
publisher_filter = new ydn.db.IndexIterator('article', 'publisher', ydn.db.KeyRange.only('Nature'));
var req = db.scan(nested_loop, [license_filter, publisher_filter]);
req.then(function() {
  db.values('article', match_keys).done(function(results) {
    console.log(results);
  });
}, function(e) {
  throw e;
});

This first sets up the nested_loop callback and then uses db.scan with the two KeyRange filter iterators. The callback advances and restarts the iterators in order to make an outer loop of the license index and an inner loop of the publisher index. The license index iterates through only once, advancing only after iterating through the entire publisher index. The result is the same if we flip the loop order since joins are commutative, but run time is not the same. This is the subject of query optimization.

The inner loop result could be cached into the memory. This leads to hash merge algorithm.

Sorted-merge join

In the above example, we notice that the results of both iterators are sorted in ascending order of the primary key since we held the index key constant. If the results of all filters are sorted, we can advance the iterator skipping keys not in the filter. This makes the sorted join algorithm speed independent of number of records in the object store.

match_keys = [];
var sorted_join = function(keys, values) {
  if (keys[0] != null && keys[1] != null) {
    var cmp = ydn.db.cmp(values[0], values[1]);
    if (cmp === 1) {
      return {continuePrimary: [null, values[0]]}; // move publisher to match license
    } else if (cmp === -1) {
      return {continuePrimary: [values[1], null]}; // move license to match publisher
    } else {
      match_keys.push(values[0]); // we got the matching primary key.
      return [true, true]; // move forward both
    } 
  } else {
    return []; // no more iteration. we are done.
  }
};

license_filter.reset();   // reusing the filters from the previous example
publisher_filter.reset();
var req = db.scan(sorted_join, [license_filter, publisher_filter]);
req.then(function() {
  db.values('article', match_keys).done(function(results) {
    console.log(results);
  });
}, function(e) {
  throw e;
});

There is no nested loop; both indexes are iterated through once. Notice that the sorted-merge join outperforms the nested-loop join by an order of magnitude. Using a naive nested-loop join in a practical web app is catastrophic.

We can limit the result size and exit the loop by returning an empty array to the iteration callback. Iterators are resumed if used again, which is how paging is handled in scanning. It is also why we needed to reset the iterators after they we used int the nested-loop join example.

The result of sorted-merge join is sorted by primary key. To sort by a specific column field other than primary key, a zigzag-merge join algorithm is used.

Zigzag-merge join

In sorted merge join, effective key is held constant while joining sorted list of values. In Zigzag merge join, the effective itself comprise both parts. The first part, known as pre-fix is held constant consisting what we want to be filtered. The second parts, known as post-fix is sorted value. The composite index is used to construct post-fix and prefix as required by the query.

To query SELECT * FROM article WHERE license = 'SA' AND publisher = 'Nature' ORDER BY 'title', we construct two iterator using composite index of ['license', 'title'] and ['publisher', 'title']. ‘title’ is the postfix we are joining. The ‘license’ prefix is held constant to ‘SA’ and ‘publisher’ prefix is held constant to ‘Nature’ as follow.

var license_sa_iter = ydn.db.IndexIterator.where('article', 'license, title', '^', ['SA']);
var publisher_nature_iter = ydn.db.IndexIterator.where('article', 'publisher, title', '^', ['Nature']);
var match_keys = [];
var solver = new ydn.db.algo.ZigzagMerge(match_keys);
var req = db.scan(solver, [license_sa_iter, publisher_nature_iter]);
req.then(function() {
  db.values('article', match_keys).done(function(results) {
    console.log(results);
  });
}, function(e) {
  throw e;
}); 

References

Authors

Kyaw Tun