A young character named Ferris Bueller once said, “Life moves pretty fast. If you don't stop and look around once in a while, you could miss it.” That advice applies to life, but it also applies to what’s happening in the data management space for Hadoop. You don’t need to skip your Hadoop certification class to appreciate how extraordinarily fast this industry is moving, but if you slow down just a bit you can get a glimpse into what’s happening.
Let’s talk about three key elements that drive data management for Hadoop. First, I apologize in advance, but I have to say the phrase that is all too familiar… anyone, anyone… “big data.” New data paradigms are exploding and driving changes in data management practices. For most companies, big data is the reality of doing business. It’s the proliferation of structured and unstructured data that floods organizations on a daily basis – and, if managed well, that can deliver powerful insights.
Second, new ways of thinking about analytic design are emerging. Driven in part by the millennial generation and a gaming mentality, design involves using all available tools to experiment, innovate, and create new techniques and approaches to data and analytics, as well as refining the art of data-driven decision making.
Third, analytic deployment is the mature analytic framework that places significant value on putting the analytic process into production. Design is cool and necessary for innovation, but creative concepts need to be turned into cost savings, profit, or risk mitigation to add real value to the organization.
When you combine the art of analytic design with the discipline of deployment, and fuel both with massive amounts of complex data, you get the new analytics culture. As the analytic needs of this culture grow and change, so do its data management needs.
My colleagues and I see data preparation for the new analytics culture as distinctly different from traditional data warehousing. Data warehousing techniques, and many of the tools that support them, are designed to conform data into standard schemas that are well organized and optimized for building efficient queries. The tools and processes are designed for the back office, used by a data management specialist, for the purpose of handing a finished dataset to analytic and reporting users.
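To make the warehouse-style approach concrete, a back-office specialist might conform a raw feed into a standard, typed schema along these lines. This is a generic pandas sketch for illustration only; the column names and cleanup rules are hypothetical and not taken from any SAS tool:

```python
import pandas as pd

# Hypothetical raw feed: duplicate rows, free-text dates, numbers stored as strings
raw = pd.DataFrame({
    "cust_id": ["001", "002", "002", "003"],
    "signup":  ["2015-01-03", "2015-02-14", "2015-02-14", "not captured"],
    "spend":   ["120.50", "80", "80", "45.25"],
})

# Conform to a standard schema: deduplicate, then cast each column to a
# proper type (invalid dates become NaT instead of lingering as free text)
clean = (
    raw.drop_duplicates()
       .assign(
           signup=lambda d: pd.to_datetime(d["signup"], errors="coerce"),
           spend=lambda d: pd.to_numeric(d["spend"]),
       )
)
```

The finished `clean` dataset is what the back office hands over; any analysis-specific reshaping beyond this still falls to the analyst, which is where the time drain described below comes from.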
Unfortunately, this process falls short of providing what the end user really wants, and ultimately forces a scarce resource to perform all kinds of pre-analytic data management magic just to do their job. In fact, it’s commonly understood that 80% of a statistician’s time is spent preparing the data, and subsequently reworking the data as they move through the analytic lifecycle. This disconnect between the people and the technology is worth a look. In particular, it comes with the following challenges:
One SAS technology that can help give back some of this 80% of lost time to the new analytics culture is SAS® Data Loader for Hadoop. This easy-to-use, self-service tool works inside the Hadoop Platform to enable:
By providing sophisticated data management capabilities to both the design and deployment cultures, SAS Data Loader for Hadoop lets analytics professionals spend more time developing innovative models and less time wrangling their data, all inside the Hadoop platform.
The world of analytic data management is moving pretty fast. I can’t say you will earn a day off by giving the analytic teams more time to focus on modeling (by simplifying their data management processes), but it will certainly make you a hero!
Take this Ferrari for a spin by visiting the SAS® Data Loader for Hadoop web page.
Also, follow the Data Management section of the SAS Communities Library (Click Subscribe in the pink-shaded bar of the section) for more articles on how SAS Data Management works with Hadoop. Here are links to other posts in the series for reference: