Depending on where you are in the big data journey, you are probably either thinking ‘what on earth should I be doing?’ or ‘what’s all the fuss about – we’re already doing it as business as usual’.
The aim of this post, based on a presentation given recently to the UK Oracle User Group Analytics community, is to understand why there is so much hype, and then to cut through it to see how some of the ideas might help more than just the largest retailers and utility companies.
I may be a little controversial, but I only do this to engender debate. It would be sad if we missed a great opportunity to provide value to our organisations simply because we got fed up with the hype. I’m not shooting down in flames what others have said before me. I am simply trying to create a more pragmatic discussion.
As an independent consultant – poacher turned gamekeeper – having worked 25+ years in the Oracle and Oracle partner world, I feel qualified to speak. I also implemented a ‘big data’ solution 10 years ago for a telco, when the term was scarcely used. Research is part of my paid and unpaid job and I have a healthy cynicism for big vendor noise.
What is the goal of Big Data?
To use the increased availability of data, analytic tools and analytic skills to make better business decisions.
Who’s actually doing it?
Mainly web corporates and major retailers, plus lots of point solutions in marine research, healthcare, government organisations, charities and so on.
McKinsey is making ridiculous claims:
- 60% operating margin improvement
- 8% reduction in US healthcare costs
IBM and Oracle are not blameless either.
The Oracle Story
However, at least Oracle has a developing story.
Oracle already has mature tools that can help answer key questions, but maybe not all questions.
Oracle has a partnership with Cloudera, but this is a little strained, as Cloudera has announced plans to reach into the traditional Oracle BI space.
Rather than give you a checklist, I think it is much better for you to look at your existing estate, determine what you can and can’t do, and then look at the whole market, including for instance Oracle and Cloudera, but also Oracle and data mining, Endeca, Oracle Exalytics and Hadoop. And I think it will change again within two years.
What are the primary concerns?
From a recent TDWI survey:
- Big data expertise is scarce and expensive: 38%
- Data warehouse appliance platforms are expensive: 33%
- We aren’t sure how big data analytics will create business opportunities: 31%
- Analytical tools are lacking for big data platforms like Hadoop and NoSQL databases: 22%
- Our data is not accurate: 21%
- Hadoop and NoSQL technologies are hard to learn: 17%
- We don’t have enough data: 13%
- Hadoop and other NoSQL technologies lack management features: 12%
- Other: 2%
The Financial Times in the UK has gone a bit nuts about this, publishing a number of critical articles about the hype.
Many analysts are now starting to make a noise.
“Big Data is Dead” - John de Goes
“Every so often a term becomes so beloved by media that it moves from ‘instructive’ to ‘hackneyed’ to ‘worthless’, and Big Data is one of those terms…” - Roger Ehrenberg
“Every product by every vendor supports big data… and every ‘industry leader’ with every talk needs to include the phrase in the title of their talk and repeat it as many times as possible. So every data warehouse pitch is rehashed as a big data pitch, every data governance, master data management, OLAP, data mining, everything is now big data.” - Rob Klopp
So what has gone wrong?
People need to hear about the successes and failures, and need to see issues similar to their own being resolved through big data projects.
All the vendors have their own nuanced definitions of big data. For some, it is the whole data platform, while for others it is the data which cannot be analysed by traditional means. Because the centre of this world is open source Hadoop, the marketing engine behind it isn’t as well-oiled as those of the big vendors.
Implementing big data can be a complex exercise, involving multiple technologies, not all of which talk to each other effectively. There is no one-size-fits-all solution.
POCs are most effective when developers understand the software. There is always a learning curve when deploying new technology. I remember when Java and JDeveloper first came out – they seemed like such a backward step after Oracle Designer and auto-generating forms.
Delegating responsibility to the data scientists has put too much pressure on, and given too much influence to, people who typically don’t understand the organisation, warts and all. This makes it harder to gain early value from the technology.
How can we expect organisations to launch successfully into big data analytics if they have not yet got their BI right – which is where most mid-size and some large organisations find themselves?
Fifteen years ago, all our investment was in custom build; ten years ago it was in application integration. The first was overtaken by prebuilt applications and the second by SOA. Buyers and developers are all wary of more changes.
The thought of analysing someone’s Facebook activity sends shivers up many people’s spines and has rightly raised concerns about privacy and security.
There is, I believe, an implicit assumption in the prescribed big data approach. It sounds great that you can allow your business users to explore the big data without the need to define requirements, but is this really practical? Can you imagine even a very competent analyst being given a complex and vast dataset and uncovering truths? I am being deliberately simplistic, as there are tools that will help, but I do think this is a key concern.
There is clearly value somewhere in the story – we cannot and should not throw the baby out with the bathwater, so
What are the practical use cases?
- Customer/Product/Market 360° view for retail, insurance and third sector
- Intelligence, security and fraud detection for government, telcos and insurance
- Operations analysis through sensors to support automotive, transport, manufacturing and utilities
- Rapid aggregation and analysis of data to support clinical decisions in healthcare and pharmaceuticals
- Ability to pool structured and unstructured data supports marketing and research companies
If your industry is not listed here, there will still be a use case.
Big Data is a big part of the future of IT
As everything becomes a commodity, including SaaS, BI and technology management, IT needs to find new questions to answer – create new value.
You cannot build a big data solution unless you know your overarching information strategy – what does your data mean, which data will provide the greatest value, where can it be sourced, how can it be visualised, analysed and communicated, and where is it duplicated?
With an understanding of the strategy, move on to the business case, which may not involve big data at all (is predictive analytics big data?).
Perhaps because of the hype, expectations are typically very high in the business. Sell this as part of the overall MI solution, perhaps using some new technologies.
Even if the solution looks like big data, can your existing tools deal with it? When we built a big data solution 10 years ago for O2, we didn’t have Hadoop or NoSQL, so we just found a way to make it work with Oracle database and Siebel Analytics. Existing technologies have leapt forward since then, for instance Exalytics and in-memory solutions, as well as advanced database facilities.
If you reach this point with management buy-in and the solution demands a technology change, then spend time assessing the options and build an end-to-end strategy. Don’t just take the first step, as there will be no deliverable. This is the time to set expectations again.
A number of successful deployments have used a three-platform strategy to allow for different optimisations: an archive platform for the best storage, a discovery platform to unleash the data scientists, and a deployment (or production analytics) platform for the ongoing BI representation of valued information, whether gained from BI or Big Data.
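To make the three-platform idea a little more concrete, here is a minimal sketch in Python of how datasets might be assigned to a platform. It is only an illustration: the `Dataset` flags, the `choose_platform` rule and the example dataset names are all hypothetical, and a real deployment would base the decision on far richer criteria (cost, volume, governance and so on).

```python
from dataclasses import dataclass
from enum import Enum


class Platform(Enum):
    ARCHIVE = "archive"        # optimised for cheap, long-term storage
    DISCOVERY = "discovery"    # sandbox where data scientists explore
    PRODUCTION = "production"  # ongoing BI delivery of proven insight


@dataclass(frozen=True)
class Dataset:
    name: str
    under_investigation: bool = False  # hypothetical flag: being explored now
    proven_value: bool = False         # hypothetical flag: insight validated


def choose_platform(ds: Dataset) -> Platform:
    """Pick a home for a dataset; proven value takes priority over exploration."""
    if ds.proven_value:
        return Platform.PRODUCTION
    if ds.under_investigation:
        return Platform.DISCOVERY
    return Platform.ARCHIVE


if __name__ == "__main__":
    # Hypothetical datasets, purely for illustration.
    examples = [
        Dataset("call_records_2009"),
        Dataset("web_clickstream", under_investigation=True),
        Dataset("churn_scores", under_investigation=True, proven_value=True),
    ]
    for ds in examples:
        print(f"{ds.name} -> {choose_platform(ds).value}")
```

The point of the sketch is the ordering of the checks: anything with proven value is promoted to production analytics, anything still being explored stays in discovery, and everything else defaults to the archive.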
This is when it is vital to consider all your skills and skills gaps. A lot of this can be learned if you have competent skills now, but you will probably need some support. If you outsource work of this nature, make sure you agree what success looks like. Your supplier may be learning at the same time.
Many big data projects fail because they meander and cannot deliver value quickly. Eventually sponsors lose interest and team members are diverted onto other tasks perceived to be more pressing. Create a clear set of deliverables and milestones, even if you don’t know what the deliverables will look like. Work using an Agile approach with frequent checks. Preferably work on a ‘traditional’ project and a new one at the same time: there may be too many dependencies to manage a new technology project efficiently on its own, so rebalancing effort between the two every now and then may be practical.
The FT recently published an article which castigated the IT industry for setting an expectation that big data technology would solve all their customer relationship problems. It noted that people on the shopfloor or production line or working face to face with citizens would be able to determine trends and understand sentiment just as well as highly tuned computers, sometimes better if the parameters are not known by those computers. Don’t expect the data science graduates to know everything!