Google, as you might expect, has massive amounts of data, and it has built many tools to handle it: MapReduce and GoogleFS, which spawned the open source Apache Hadoop, and BigTable, which spawned Apache HBase.
But Google didn’t stop with those projects. It has kept creating new big data tools and publishing papers about them. Dremel is designed to make querying the huge data sets stored in GoogleFS and BigTable much faster. Where a MapReduce job on Hadoop could take hours or even days, Dremel makes results available almost instantly.
Apache Drill is an attempt to build an open source version of Google Dremel, and the project was recently accepted into the Apache Incubator program. It’s supported by MapR, a company that sells a modified version of Hadoop with proprietary customizations.
There are other open source real-time big data systems, notably Storm, which was…