I went to my first analytics conference (#GartnerBI) this week here in Dallas, and what I wanted to share were some general impressions of the market and a cool strategy for embracing the change that is occurring in our businesses as it relates to data and analytics.
It’s amazing to me how just 2–3 years ago the concepts of a Data Lake or even a Data Scientist really didn’t exist, and now you can’t go anywhere in the data space without hearing those words uttered. Don’t even try to mention “Big Data” anymore and expect to be ahead of the curve. The funny thing is, lots of folks are talking about it, but outside a select few I couldn’t find many who are actually doing it (well, not doing it the way the industry says we should).
Wow! There are so many tools, which to me signifies that the market is still emerging. It’s still being defined. I would expect a great many of these to either become the standard or get absorbed into a larger offering. Everything from data preparation to visualization: you can build dashboards, semantic layers, transformations and extracts with ease, all in a distributed fashion with little IT. Doesn’t that sound fun? Depends on who you ask, I guess (sounds kind of cool to me). I’m not really going to dig into vendors, but my opinion is that there is so much choice that there is something for everyone. And you don’t even have to go with a big-name vendor to get a very robust and enterprise-capable solution.
Best I can tell, the industry is starting to settle around certain platforms for specific usage. For a Data Lake, Hadoop seems to be the solution of choice. Add on top the things that a distro brings to the table, and Hadoop can be a one-stop shop for your data ingestion, processing and warehousing needs. Whether you go with Vendor A or Vendor B is your personal choice. Want rapid, streaming, multi-tool access to lots of data? In-memory processing is the way to go. And I don’t mean the in-memory capabilities of a columnar data store; I’m talking about Spark or others that can connect to multiple sources, pull the data into memory, and deliver the speed and agility that many are requesting. What I liked about some of them is that they fit right into the Hadoop ecosystem while also being able to run standalone. As for visualization … too many to list and not really exciting.
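To make the in-memory, multi-source idea concrete, here is a minimal plain-Python sketch of the pattern (Spark does this at cluster scale; the sources, table names and numbers below are purely illustrative): pull rows from two unrelated sources into one in-memory structure, so repeated ad hoc queries never have to touch the sources again.

```python
import csv
import io
import sqlite3

# Two illustrative sources: an in-memory SQLite table standing in for
# a DBMS, and a CSV string standing in for a flat-file feed.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [("East", 100), ("West", 50)])

csv_feed = io.StringIO("region,amount\nEast,25\nSouth,75\n")

# Pull everything into one in-memory structure.
rows = [{"region": r, "amount": a}
        for r, a in conn.execute("SELECT region, amount FROM orders")]
rows += [{"region": rec["region"], "amount": int(rec["amount"])}
         for rec in csv.DictReader(csv_feed)]

# Once in memory, ad hoc analysis is just a fast local operation.
total = sum(r["amount"] for r in rows)
print(total)  # 250
```

The point is the shape of the workflow, not the tooling: ingest from heterogeneous sources once, then iterate quickly against the in-memory copy.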
But one thing was: this thing called Search-Based Analytics. A couple of vendors were showing demos, which I totally dug. The idea is that a user or customer can type in a search, Google-style, and the platform understands what you are looking for and displays charts and tables about your data based on the search criteria. Really fascinating concept.
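As a toy illustration of the concept (entirely hypothetical query handling; the real products do serious natural-language processing, ranking and automatic chart selection), here is the smallest possible version of turning a plain-English search into an aggregate over your data:

```python
# Tiny in-memory dataset standing in for "your data".
SALES = [
    {"region": "East", "amount": 120},
    {"region": "West", "amount": 80},
    {"region": "East", "amount": 40},
]

def search(query):
    """Naive keyword matching: 'amount by region' -> grouped sums.
    A real search-based analytics engine would parse intent, pick
    measures/dimensions, and choose a visualization automatically."""
    q = query.lower()
    if "by region" in q:
        totals = {}
        for row in SALES:
            totals[row["region"]] = totals.get(row["region"], 0) + row["amount"]
        return totals
    return None

print(search("total amount by region"))  # {'East': 160, 'West': 80}
```

Everything interesting in the real products lives in that `search` function; the demos I saw made it feel as effortless as a web search.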
So many topics and tools centered around this idea of distributed data prep, data quality and advanced analytics. All done without IT, and more importantly, done much closer to the business and the data, where true value can be realized. Sidestep: I love small companies … always have, actually. The ability to make an impact on revenue or a direct contribution toward solving a customer’s problem excites me. This idea of self-service almost puts each department/BU in start-up mode. You can add and provide value on your own without some big, hefty BI process.
IT is going to get left behind if it doesn’t enable people to bring value sooner and cheaper. Our job in technology is not always going to be leading. That’s why I love the word Enabler. A CIO in this day and age could almost be re-branded as a Chief Enablement Officer, because that’s what we should be doing. Leading is a task, not an identity. (That’s a whole other post.)
So I’ll be honest: I’m not huge into tools and software. I’ve often told folks that I really don’t like computers. I get a funny look, given that I’ve made my career off of them and really seem to enjoy them. Then I qualify it: I like solving problems, and computers, with their computational and storage power, are wonderful vehicles for solving problems. So tools aside, the one thing I was hoping to learn, I actually did. Kudos again to the conference for providing something for everyone.
Gartner has this notion of being bimodal when it comes to how an organization approaches BI and advanced analytics. On one hand you have the traditional, heavier, enterprise-grade process that gathers requirements and cranks out capabilities on a robust, traditional BI platform. Think data warehouse with ETL and everything in between. What you get from this is stability, security, guarantees and predictability. All wonderful things. But what you generally don’t get is flexibility and agility. That’s where the modern data warehouse and advanced analytics platforms come in. You can pull all kinds of data out of your data lake/flat files/DBMS/services in some kind of self-service tool, then apply whatever math, code or discovery you like on top of that to produce some insight. Maybe it’s valuable. Maybe it’s not. But here’s what I liked about the approach and thought process Gartner was pitching, and it ties this all up.
It’s not an either/or scenario. Perhaps the self-service model is just fine for data sources that are too volatile to ever contain in traditional ETL processes. Or perhaps they aren’t evolving that quickly, and when the time is right, that set of flows, with its lineage, is migrated into your strongly typed ETL capabilities and loaded into SQL Server or Oracle or Vertica (insert some other data store) for reports and dashboards. It really depends. And, most scarily, it’s up to you as the implementer of this stuff.
I always thought this space was much more prescriptive about how to do things than the world I was used to in application development. But with these new sets of platforms, tools and, most importantly, PROBLEMS, perhaps that is changing. I used to care about a DBMS only because it’s where my app persisted some data. I often cared even less because an ORM or a custom-rolled Data Mapper abstracted it away. Now I find myself wondering … perhaps I’ve been missing something.
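For anyone who hasn’t bumped into the pattern, here is a minimal sketch of what a hand-rolled Data Mapper looks like (the names are illustrative, not from any real codebase): the domain object knows nothing about persistence, and the mapper owns all the SQL, which is exactly the abstraction that let me ignore the DBMS for so long.

```python
import sqlite3
from dataclasses import dataclass

@dataclass
class User:
    """Pure domain object: no SQL, no connection, no persistence logic."""
    id: int
    name: str

class UserMapper:
    """Owns all persistence concerns for User; the app only sees objects."""
    def __init__(self, conn):
        self.conn = conn
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS users (id INTEGER PRIMARY KEY, name TEXT)")

    def insert(self, user):
        self.conn.execute("INSERT INTO users VALUES (?, ?)", (user.id, user.name))

    def find(self, user_id):
        row = self.conn.execute(
            "SELECT id, name FROM users WHERE id = ?", (user_id,)).fetchone()
        return User(*row) if row else None

conn = sqlite3.connect(":memory:")
mapper = UserMapper(conn)
mapper.insert(User(1, "Ada"))
print(mapper.find(1))  # User(id=1, name='Ada')
```

Convenient as that separation is, it’s also why the database stayed a black box to me, which is the blind spot this whole conference poked at.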