We need to focus on Data Basics before embarking on Big Data

I got an incredible response to this post on LinkedIn, with more than 3,000 views and 224 likes but, more importantly, 27 comments. You can read the comments on LinkedIn.

Here is the post in its entirety. I’d welcome more comments and discussion here too…

With the proliferation of software-as-a-service applications across most organisations, it is likely that many organisations are suffering from a fragmented data environment. This is a problem because just at the time that most organisations need to homogenise their data strategy to take advantage of Big Data learnings, the opposite is happening: data decentralisation and even chaos.

In many cases, organisations have been focussed on data storage and not data quality. Just managing the demanding growth of data volumes for the last 15 years has been enough of a challenge for CIOs. Rapidly scaling data storage infrastructure – including software and networking as well as hardware – has been overwhelming and all too often the actual quality of the data has not been good. How many companies can genuinely claim their database was sound, that their CRM data was clean and that the insane complexity of spreadsheets was under control, let alone consolidated? The age-old adage “garbage in, garbage out” scales in severity with data volume.

Yet as data storage now decamps to the cloud and the focus moves to Big Data strategies, it seems that data quality is still not a priority. I wonder if the industry – here in Australia as well as globally – is doing enough to enhance the human data skills rather than relying solely on Hadoop et al to do all the work. I’ve written before on the disconnect between data technology and human data skills. There is a lot of talk about “Data Scientists” but is that nothing more than just a fancy title for BI analysts?

Bona fide Data Scientists are like real-life scientists. They have a hypothesis, they test this hypothesis against different sets of data and they validate or disprove it. Then they try to look for further causation and correlation, and then they might come up with some real insight and a discovery. But in a commercial situation, the data scientist might invest a lot of time in developing a hypothesis but then find that the data isn’t available or is too messy to use. So what then? (It is worth reading this New York Times story on “Data Wrangling”.)

Organisations need to work out – strategically and operationally – how to collect data appropriately, what data they need and then what they might need to look for. There are data scrubbing tools, deduping tools and analytical tools, but they can only work with what has been captured: they will struggle if the raw data is not in an appropriate state, and obviously it isn’t possible to scrub or dedupe data that doesn’t exist.

So it is crucial for CIOs to look first at their overall application architecture and work out the data flows and how they integrate, then what insight the organisation might need and, operationally, what data is needed and where it can be sourced. This isn’t difficult but it requires formality and strategy rather than ad hoc evolution. The current trend of SaaS proliferation, with services bought ad hoc on the credit card at the departmental level, is haphazard and is making data increasingly difficult for CIOs to manage. This is not only because the data is decentralised, in different clouds, but because there are now different data models that are often quite difficult to access and often quite complicated to understand.

If organisations want to truly benefit from the Big Data opportunity, there needs to be some sober and disciplined thought about data analysis skills, data quality control and data strategy before the kind of frantic technology acquisition that the media and vendors promote and discuss. Otherwise we are going to get no closer to any kind of data optimisation than we are now – we will just create more data mayhem and the Return on Investment will remain just as elusive.

Picture credit: bigdatapix.tumblr.com


Forget “big data”, we need smart data

While the Technology industry is a very exciting space to work in and has always afforded me incredible opportunities in my working life, it often frustrates me too.  I feel the recent hype around big data is a worrying example of this – we can over-simplify things and therefore understate the value we bring to the marketplace.  But beyond that, too often we give the market the impression that our solutions are a silver bullet when in fact there is far more to a problem than just throwing some technology at it.  I made these points in The Rust Report this week – please do let me know what you think…

“It seems the IT industry has done it again.  Too often in the past the Technology Sector has created a huge bandwagon onto which every marketeer in the industry has jumped without anyone scrutinising the actual value being proposed, or making the case with any depth.  Pretty soon customers see through the empty and shallow proposition and the bubble bursts, leading to a backlash, job losses and diminished credibility.  I’m very concerned history is repeating itself.

Think Y2K, or RFID, or even Cloud.  While all of these propositions were more than valid, the huge bubble in vacuous marketing hype left the customer feeling short changed.  With Big Data I feel the case has not been made correctly.

Anyone who was at last year’s CeBIT Keynote in Sydney would remember Obama CTO Harper Reed’s confronting comment – and I paraphrase – “Big Data is bovine excrement”.  His point was that the “Big Data” challenge was a storage one – back in 2007.  Today that challenge is solved.  The real challenge of data today is how to derive real value from it, not where to put it.  Today the industry is busy selling solutions to the wrong problem.

“When I hear Big Data, I immediately hear marketing” said Harper.  “All these great brands, they’ve really jumped into this marketing world of talking about the problems that are pretty much solved.”

We don’t need Big Data, we need Smart Data. There’s data tracking everything possible from our physical movements, to our buying and trading habits, to our relationships and thoughts.  But while GPS, Social and mobile surfing behaviour data are all religiously monitored, everyday products and even whole businesses are failing.  We’re busy collecting all this data, but we aren’t learning anything from it.  This is what the industry needs to solve.  As many others have pointed out, with all the trading data and predictive models available, we still didn’t see the GFC coming.

When was the last time you heard anyone talk about Radio Frequency IDentification (RFID)?  It was all anyone could talk about in 2006, but after everyone put tags on anything that moved, they derived no value from it because no one figured out where to put all the data they created, let alone how to learn anything from it.  As a result, ROI was minimal.  I’m having deja vu with Big Data.

The first problem is talent.  The news is currently peppered with surveys that identify a spectacular gap between the Big Data skills required and the talent available to meet those needs.  What are Australian Universities and Governments doing to solve these problems?  It remains the case that, used and educated effectively, the best computer we have access to is still the human brain.  But we don’t have enough people who know how to ask the right questions – query the data – let alone how to extract the answers.  This requires a unique blend of mathematics, psychology and business acumen.  I am not sure Australian universities are investing enough in cultivating these skills and businesses don’t know where to find them.

Equally, where are the smart algorithm apps that help smaller companies extract value from the data they have stored in databases like Oracle or Salesforce?  A good example is SalesPredict in the US, which recently raised $1 million in seed funding to help small companies predict which sales leads are more likely to convert.  Advanced Pattern Discovery is another tremendous application of Big Data but it requires very clever modelling software to produce.  This is a huge area of potential innovation Australian entrepreneurs could be investing in – with Government help – but I’m not hearing about them.  I would certainly like to see our local industry investing more in this area, far more than in marketing hype around big storage boxes.

Software doesn’t solve the big data issue and, as the old saying goes, “a fool with a tool is still a fool”!  It is crucial to consider the strategy that drives where to look for answers, which data sets to use and how to get the correct insights.  This process is performed by humans; the technology just facilitates it.  The gap we have is the human gap.  What is the industry in Australia doing to fill that gap?

As the nation furiously debates the future of education and skills development in both the TAFE and University spaces, this is an important new skill set that business desperately needs more talent in.  I would like to hear more discussion about how Government and Industry can invest in this development – much more than I want to hear about the latest “Big Data” solution.”