My "Distributed Programming and Data Consistency" Codebits talk slideware

I am finally publishing the slideware for my SAPO Codebits 2009 talk entitled "Distributed Programming and Data Consistency".

I first wanted to complete its notes in order to:
  1. Allow its use independently from the talk's video (in Portuguese);
  2. Provide enough bibliography / reference material to ease further study on any of the covered issues.
Here it is:


Codebits 2009

I finally went through the Codebits experience:
I ended up having a rather complete experience too: not only attending, but also giving a talk and competing in the famous 24 Hour Programming Contest.

Not everything went perfectly, but it was an amazing experience.

Please believe that I don't use the word "amazing" as liberally as those people you often watch on TV. This was really interesting and I want to repeat it next year, if I have the chance.

The title of my talk was "Distributed programming and data consistency" and there is a video (spoken in Portuguese) here:
The talk mentioned several NoSQL data handling techniques, but the focus was mostly on Data Consistency, CAP and BASE / Eventual Consistency.

After watching myself I realized how much I have to improve as a speaker. The feeling I have is:
  • The guy who studied the subject and prepared the slides did a good job;
  • The guy who presented it must be "a bit" livelier and have better tempo.
I will try to watch this video again before I give any other talk... but in case it goes offline I took notes too (with all the gory details I am avoiding mentioning here). =;o)

In the next few days I will prepare a version of the slideware with better notes, which will include the URLs of the most interesting literature I came across on this subject.


The last NoSQL conferences...

I spent part of the weekend taking a broader look at the NoSQL scene, given the flood of information resulting from these last two conferences:
I mostly went through some of the notes already online, harvesting related information and updating my recently reactivated delicious.com account.

The most interesting summaries and other resources I found from these events:
I also ended up finding some interesting older notes about another NoSQL meeting:

The NoSQL Hype

As usual, the "NoSQL" moniker keeps being a bit irritating to me. Some of the stores qualified as "NoSQL" only aim at being efficient DHTs.

But those that aim at being databases end up having some kind of query language, although often a really poor one - which means query languages have their use...

And nothing prevents you from using a sharded (even with mirrored shards) MySQL database as a key store - and many companies are doing just that.
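
To make the "SQL database as a key store" idea concrete, here is a minimal sketch of hash-based shard routing. It is only an illustration under loud assumptions: in-memory SQLite connections stand in for the MySQL shards, the names ShardedKeyStore, kv, k and v are made up for this example, and mirroring of shards is not modeled at all.

```python
import sqlite3
import zlib


class ShardedKeyStore:
    """Toy key/value store spread over several SQL databases (shards)."""

    def __init__(self, shard_count=4):
        # Assumption: each in-memory SQLite connection stands in for one
        # MySQL shard. A real deployment would hold one connection per server.
        self.shards = [sqlite3.connect(":memory:") for _ in range(shard_count)]
        for db in self.shards:
            db.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v TEXT NOT NULL)")

    def _shard_for(self, key):
        # A stable hash, so the same key always lands on the same shard.
        index = zlib.crc32(key.encode("utf-8")) % len(self.shards)
        return self.shards[index]

    def put(self, key, value):
        db = self._shard_for(key)
        db.execute("INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)", (key, value))
        db.commit()

    def get(self, key, default=None):
        row = self._shard_for(key).execute(
            "SELECT v FROM kv WHERE k = ?", (key,)
        ).fetchone()
        return row[0] if row else default


store = ShardedKeyStore()
store.put("user:42", "alice")
store.put("user:43", "bob")
print(store.get("user:42"))
```

Note that the client decides where each key lives, so changing the number of shards moves keys around; the sketch does not try to solve that.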

I am NOT against innovation. My problem with many "NoSQL" solutions is that:
  • They do NOT present enough progress while ignoring lessons from past experience;
  • They pretend to innovate some backend aspects while sending you back in time on the API / interface side;
  • They pretend to be revolutionary while doing nothing better than well known commercial solutions (e.g. Coherence).
If you just want to implement / defend / promote an open source version of a previously existing concept, be modest and don't pretend to be part of a revolution.

History Repeating

So, while there are many interesting Open Source solutions qualified as "NoSQL" popping up, there is too much hype around the "NoSQL" moniker. Too much old stuff dressed in new clothes and too many lessons from the past being ignored.

Like MVCC, which is being considered by some databases as the magic bullet of data merging, while the lessons learned about its limitations (by communities like the Interbase or Firebird users) are being ignored. This article, for instance, documents how MVCC does not detect data inconsistencies when merging two changesets if the inconsistency is related to data that is only read (resulting in data written to another row/table):
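
The kind of anomaly described above is usually called "write skew". The following is a toy model of it, not the behavior of any particular database: SnapshotStore, Transaction and the two-account rule ("x + y must never go below zero") are all invented for this illustration. Each transaction works on its own snapshot, and at commit time only the keys it wrote are checked for conflicts.

```python
class SnapshotStore:
    """Toy multi-version store: conflicts are detected on written keys only."""

    def __init__(self, data):
        self.data = dict(data)
        self.version = {key: 0 for key in data}  # commit number of last write
        self.commits = 0

    def begin(self):
        return Transaction(self)


class Transaction:
    def __init__(self, store):
        self.store = store
        self.snapshot = dict(store.data)  # private, consistent view
        self.start = store.commits
        self.writes = {}

    def read(self, key):
        return self.writes.get(key, self.snapshot[key])

    def write(self, key, value):
        self.writes[key] = value

    def commit(self):
        # Only keys this transaction WROTE are validated. Keys it merely
        # read are not tracked, which is exactly the gap shown below.
        for key in self.writes:
            if self.store.version[key] > self.start:
                raise RuntimeError(f"write conflict on {key!r}")
        self.store.commits += 1
        for key, value in self.writes.items():
            self.store.data[key] = value
            self.store.version[key] = self.store.commits


def withdraw(tx, account, amount):
    # Hypothetical business rule: the combined balance must stay >= 0.
    if tx.read("x") + tx.read("y") - amount >= 0:
        tx.write(account, tx.read(account) - amount)


store = SnapshotStore({"x": 50, "y": 50})
t1, t2 = store.begin(), store.begin()
withdraw(t1, "x", 100)  # sees x + y = 100, so the rule holds
withdraw(t2, "y", 100)  # sees the same snapshot, so the rule holds too
t1.commit()
t2.commit()             # no conflict: t2 wrote a different row than t1
print(store.data)       # {'x': -50, 'y': -50}
```

Both transactions are valid on their own, and they never write the same row, so nothing is flagged, yet the combined result breaks the rule that each of them checked.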
Besides, key/value stores are VERY OLD NEWS.

Some old stores are still around, proven by years of use and still damn efficient, like Berkeley DB.

But I used even older key/value stores, like the initial versions of c-tree or the Turbo Pascal Database Toolbox from Borland. What I can tell you, from experience, is:
  • It is much more productive to work with SQL (but you need to learn it);
  • SQL is not just a programming tool. It is a precious database administration tool too.
Sure, standard SQL is too limited for the new database architectures. But that just means we probably need a new QL (Query Language).

So, the revolution is not having NoSQL but having better databases and a Better SQL.

Don't forget how many persistence layers already tried to kill SQL and failed miserably. Sure, they are OK for basic CRUD tasks, but most applications end up needing much more than that.

To put some hype back on SQL, let's remember how it is truly a DSL (Domain-Specific Language).
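
As a small illustration of that point, here is the same question ("which customers spent more than 10, biggest spenders first?") answered twice: once by hand over a key/value layout, once declaratively in SQL. The data, the orders table and its columns are made up for this sketch, and SQLite is used only because it ships with Python.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample data: (order id, customer, order total).
orders = [
    ("o1", "alice", 30.0),
    ("o2", "bob", 12.5),
    ("o3", "alice", 20.0),
    ("o4", "carol", 5.0),
]

# Key/value style: the store only knows how to hand back records,
# so grouping, filtering and sorting are all application code.
kv = {order_id: {"customer": c, "total": t} for order_id, c, t in orders}
totals = defaultdict(float)
for record in kv.values():
    totals[record["customer"]] += record["total"]
kv_result = sorted(
    ((customer, spent) for customer, spent in totals.items() if spent > 10),
    key=lambda item: item[1],
    reverse=True,
)

# SQL style: say what is wanted and let the database work out how.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, customer TEXT, total REAL)")
db.executemany("INSERT INTO orders VALUES (?, ?, ?)", orders)
sql_result = db.execute(
    """
    SELECT customer, SUM(total) AS spent
    FROM orders
    GROUP BY customer
    HAVING SUM(total) > 10
    ORDER BY spent DESC
    """
).fetchall()

print(sql_result)  # [('alice', 50.0), ('bob', 12.5)]
```

The same statement can also be typed into an interactive console, which is where the administration side of SQL comes in.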

NoSQL interfaces being more productive than SQL for all data manipulation is the dream of those who are too lazy to learn SQL. But then it is just a dream.

Cutting the Hype

Since the rest of what I have to say has already been said, I will just point to the other guys: