Questioning the NoSQL developer productivity myth

I hope this excellent article by +Kelly Sommers ( @kellabyte ) helps dispel the (still ongoing) silliness about programming against NoSQL being more productive than programming against SQL databases:
This recent presentation about the 1st ThoughtWorks Radar of 2012, by Sam Newman ( @samnewman ), although quite interesting (and contradictory, and questionable... but sure interesting), added a bit to that "NoSQL being more productive" silliness:
Again, learning a NoSQL API might be easier than learning SQL, but SQL is a farking DSL, a good one at that and, as such, gives you a lot of power.

Back in the Jurassic period, when SQL databases did not fit into personal computers, I worked a lot with NoSQL databases. I sure remember how painful it was and how much more productive I got with the SQL ones.

If you do not think a DSL is needed to make programming data manipulation tasks more productive, just take a look at what happens with Apache Hadoop and Pig Latin:
Or UnQL (which sounds like a still-in-denial name):
Yes, one size does not fit all but SQL databases - or just NO databases at all (*) - are much more productive for most jobs than NoSQL DBs.
(*) It is often possible to just keep all the data in memory and serialize it all as one big doc.
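As a rough illustration of that "no database at all" option, here is a minimal Python sketch. It assumes the whole data set fits comfortably in memory and that a single process owns the file; the function names `save` and `load` and the JSON format are my own choices for the example, not a recommendation of a specific tool.

```python
import json
import os
import tempfile


def save(data, path):
    """Serialize the whole in-memory data set as one JSON document."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file first, then swap it in, so a crash
    # in the middle of a save never leaves a half-written document.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(data, handle)
    os.replace(tmp_path, path)


def load(path):
    """Load the whole data set, or start empty if nothing was saved yet."""
    if not os.path.exists(path):
        return {}
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```

The application then works on plain dictionaries and lists, calling `save` whenever it needs a durable checkpoint.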

And even after this latest NoSQL trend started, this discussion, about history repeating and lessons from the past not being learned, has been going on since at least 2009 (check the referenced articles too):
If you are a NoSQL fan and think I am just not aware of the usefulness of NoSQL databases, please reconsider:
What is new about NoSQL databases is NOT the absence of a query language - that is very old news.

What is new about the new wave of NoSQL databases is the way some of them are targeting Big Data and the new strategies they use to deal with redundancy and consistency, especially for high volume + high concurrency scenarios.

Having a query language or not is mostly orthogonal to the consistency model - although the consistency model might somewhat affect the query language, just like it affects the whole database usage model.

What I am questioning here is the claim that NoSQL databases give higher programming productivity in the most common cases, where a SQL database or the no-database alternative would do.

Remember: most people are neither building another Twitter nor another Google.

So, again, what I am having trouble understanding is:
  • How is NoSQL so much more productive for others than the alternatives mentioned above? That completely contradicts my experience.
  • Why are query languages (with some SQL likeness) popping up if NoSQL database APIs are the real deal?
  • Or how is this silliness still going on since 2009?
Sorry if I am coming through a bit rough, but I sure am a bit opinionated about the NoSQL vs. SQL theme.


Logging (and Exception Handling)

During the last year, the exception handling and logging themes have been quite recurrent, even leading to a talk I gave at Codebits 2011.

Since the theme is still coming back to me, both at work and online (this time via Sergio Bossa), I decided to dump here some of the information I have been collecting and producing.

For starters, this is an enriched version of the slideware I used for the above mentioned Codebits talk:
Exception+Logging=Diagnostics 2011
(My other presentations can be found here.)
As with my previous presentations, if you download it, you will find some extra information (including the URLs of the most important sources I used) in the slides' comments. (I am sorry, but SlideShare does not seem to be able to display the comments from the PowerPoint format I used this time.)

Anyway, my favorite articles and presentations about logging are currently here:
Naturally, I do not agree with everything written in the above mentioned articles. Well, give it some time and I might not agree with everything I wrote on my own slideware...

I believe, however, that these articles are useful fuel for anyone with enough critical thinking to build their own path.

Having to talk about logging helped me step further away from the self-centered view most developers seem to have about logging. I had already taken a few steps out of there by working together with sysadmins and other colleagues, supporting the larger applications I worked on and sharing their pains. But making a presentation always forces me to reflect much more deeply on its subject.

That is why my slideware focuses on the communication role of logging, as do several of the above mentioned sources.

Instead of developing this blog entry further, I suggest you just read the above articles attentively. Believe me: most of us are very bad at communicating through logs... Poor sysadmins!


My "Distributed Programming and Data Consistency" Codebits talk slideware

I am finally publishing the slideware for my SAPO Codebits 2009 talk entitled "Distributed Programming and Data Consistency".

I first wanted to complete its notes in order to:
  1. Allow its use independently from the talk's video (in Portuguese);
  2. Provide enough bibliography / reference material to ease further study on any of the covered issues.
Here it is:


Codebits 2009

I finally went through the Codebits experience:
I ended up having a rather complete experience too, not only attending but also giving a talk and competing in the famous 24 Hour Programming Contest.

Not everything went perfectly, but it was an amazing experience.

Please believe that I don't use the word "amazing" as liberally as those people you often watch on TV. This was really interesting and I want to repeat it next year, if I have the chance.

The title of my talk was "Distributed programming and data consistency" and there is a video (Portuguese spoken) here:
The talk mentioned several NoSQL data handling techniques, but the focus was mostly on Data Consistency, CAP and BASE / Eventual Consistency.

After watching myself, I found out how much I have to improve as a speaker. The feeling I have is like:
  • The guy who studied the subject and prepared the slides did a good job;
  • The guy who presented it must be "a bit" livelier and have a better tempo.
I will try to watch this video again before I give any other talk... but in case it goes offline I took notes too (with all the gory details I am avoiding mentioning here). =;o)

In the next few days I will prepare a version of the slideware with better notes, which will include the URLs of the most interesting literature I came across about this subject.


The last NoSQL conferences...

I spent part of the weekend taking a broader look at the NoSQL scene, given the flood of information resulting from these last two conferences:
I mostly went through some of the notes already online, harvesting related information and updating my recently reactivated delicious.com account.

The most interesting summaries and other resources I found from these events:
I also ended up finding some interesting older notes about another NoSQL meeting:

The NoSQL Hype

As usual, the "NoSQL" moniker keeps being a bit irritating to me. Some of the stores qualified as "NoSQL" only aim at being efficient DHTs.

But those that aim at being databases end up having some kind of query language, although often a really poor one - which means query languages have their use...

And nothing prevents you from using a sharded (even with mirrored shards) MySQL database as a key store - and many companies are doing just that.
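To make that concrete, here is a minimal Python sketch of a SQL database used as a sharded key store. It is only an illustration under loud assumptions: in-memory SQLite databases stand in for the MySQL shards, the class name `ShardedKeyStore`, the `kv` table and the CRC32-based shard selection are all hypothetical, and mirroring is left out.

```python
import sqlite3
import zlib


class ShardedKeyStore:
    """Key store on top of SQL tables (SQLite standing in for MySQL shards)."""

    def __init__(self, shard_count=4):
        self.shards = [sqlite3.connect(":memory:") for _ in range(shard_count)]
        for db in self.shards:
            db.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v TEXT NOT NULL)")

    def _shard(self, key):
        # A stable hash of the key decides which shard owns it.
        index = zlib.crc32(key.encode("utf-8")) % len(self.shards)
        return self.shards[index]

    def put(self, key, value):
        db = self._shard(key)
        with db:  # commits on success, rolls back on error
            db.execute(
                "INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)", (key, value)
            )

    def get(self, key, default=None):
        row = self._shard(key).execute(
            "SELECT v FROM kv WHERE k = ?", (key,)
        ).fetchone()
        return row[0] if row else default
```

The API is plain put/get, yet each shard remains an ordinary SQL database that can still be inspected with ad hoc queries.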

I am NOT against innovation. My problem with many "NoSQL" solutions is that:
  • They do NOT present enough progress while ignoring lessons from past experience;
  • They pretend to innovate some backend aspects while sending you back in time on the API / interface side;
  • They pretend to be revolutionary while doing nothing better than well known commercial solutions (e.g. Coherence).
If you just want to implement / defend / promote an open source version of a previously existing concept, be modest and don't pretend to be part of a revolution.

History Repeating

So, while there are many interesting Open Source solutions qualified as "NoSQL" popping up, there is too much hype around the "NoSQL" moniker. Too much old stuff dressed in new clothes and too many lessons from the past being ignored.

Like MVCC, which is being considered by some databases as the magic bullet of data merging, while the lessons learned about its limitations (by communities like the Interbase or Firebird users) are being ignored. This article, for instance, documents how MVCC does not detect data inconsistencies when merging two changesets if the inconsistency is related to data that is only read (resulting in data written to another row/table):
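Independently of that article, this kind of anomaly (usually called write skew) can be reproduced with a toy model. The Python sketch below is not how any real database implements MVCC; `SnapshotStore`, `Transaction` and the two-account rule are hypothetical names and data made up for the illustration. The only property it models is that commit-time conflict detection looks at written rows, never at rows that were merely read.

```python
class SnapshotStore:
    """Toy multi-version store: conflicts are detected on written keys only."""

    def __init__(self, data):
        self.data = dict(data)
        self.versions = {key: 0 for key in data}

    def begin(self):
        return Transaction(self)


class Transaction:
    def __init__(self, store):
        self.store = store
        self.snapshot = dict(store.data)   # private view taken at start
        self.seen = dict(store.versions)   # row versions at snapshot time
        self.writes = {}

    def read(self, key):
        return self.writes.get(key, self.snapshot[key])

    def write(self, key, value):
        self.writes[key] = value

    def commit(self):
        # Only the write set is validated; reads are never checked.
        for key in self.writes:
            if self.store.versions[key] != self.seen[key]:
                return False
        for key, value in self.writes.items():
            self.store.data[key] = value
            self.store.versions[key] += 1
        return True


def withdraw(tx, account, amount):
    # Business rule: the combined balance must cover the withdrawal.
    if tx.read("a") + tx.read("b") >= amount:
        tx.write(account, tx.read(account) - amount)


def run_write_skew():
    store = SnapshotStore({"a": 100, "b": 100})
    t1, t2 = store.begin(), store.begin()
    withdraw(t1, "a", 150)  # decides based on reading both rows, writes only "a"
    withdraw(t2, "b", 150)  # decides based on reading both rows, writes only "b"
    return t1.commit(), t2.commit(), store.data
```

In this model both transactions commit, because neither wrote a row the other wrote, and the combined balance ends up negative even though each transaction checked the rule against its own snapshot.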
Besides, Key/value stores are VERY OLD NEWS.

Some old stores are still around, proven by years of use and still damn efficient, like Berkeley DB.

But I used even older key/value stores, like the initial versions of c-tree or the Turbo Pascal Database Toolbox from Borland. What I can tell you, from experience, is:
  • It is much more productive to work with SQL (but you need to learn it);
  • SQL is not just a programming tool. It is a precious database administration tool too.
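A small Python sketch may help show what I mean by productivity here. It computes the same report (total amount per customer, largest first) twice: once by hand over a key/value store modelled as a plain dictionary, and once with a single SQL statement. The sample orders, the table layout and the function names are invented for the example, and SQLite is used only because it ships with Python.

```python
import sqlite3
from collections import defaultdict

ORDERS = [
    ("o1", "alice", 30),
    ("o2", "bob", 15),
    ("o3", "alice", 20),
]


def build_key_value(orders):
    # Key/value view: order id -> (customer, amount)
    return {order_id: (customer, amount) for order_id, customer, amount in orders}


def totals_key_value(store):
    # Grouping, summing and sorting all have to be coded by hand.
    totals = defaultdict(int)
    for customer, amount in store.values():
        totals[customer] += amount
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))


def build_sql(orders):
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE orders (id TEXT PRIMARY KEY, customer TEXT, amount INTEGER)"
    )
    db.executemany("INSERT INTO orders VALUES (?, ?, ?)", orders)
    return db


def totals_sql(db):
    # The same report, stated declaratively.
    return db.execute(
        "SELECT customer, SUM(amount) AS total FROM orders "
        "GROUP BY customer ORDER BY total DESC, customer"
    ).fetchall()
```

The SQL version is also the one a database administrator can type into a console at 3 a.m. without deploying any code.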
Sure, standard SQL is too limited for the new database architectures. But that just means we probably need a new QL (Query Language).

So, the revolution is not having NoSQL but having better databases and a Better SQL.

Don't forget how many persistence layers already tried to kill SQL and failed miserably. Sure, they are OK for basic CRUD tasks, but most applications end up needing much more than that.

To put some hype back on SQL, let's remember how it is truly a DSL.

NoSQL interfaces being more productive than SQL for all data manipulation is the dream of those who are too lazy to learn SQL. But then it is just a dream.

Cutting the Hype

Since what else I have to say was already said, I will just point to the other guys: