Big Data: Challenges and opportunities
Interview with Kim Gregers Petersen, Big Data & Analytics expert
Big Data is on everyone’s lips nowadays as a collection of technologies that can change the use of data the world over and in almost all types of businesses. Challenges, say some people. OPPORTUNITIES, says a Danish expert.
Recently there was a job ad at SKAT (the Danish tax authority). SKAT was looking for an assistant director to spearhead a brand-new department dedicated to ‘Business Intelligence and Analysis’. The new department would, among other things, contribute to SKAT’s overall efficiency and further develop data models “where the use of Big Data will be a natural part of activities”.
“I read the advertisement as a sign that not only companies, but also the government, have begun to work seriously with Big Data,” says Kim Gregers Petersen, Big Data & Analytics expert. As a consultant within Big Data solutions, he notices interest building day by day, because companies and organisations like SKAT are constantly seeing new opportunities to analyse their growing piles of data. “This is an area of explosive growth, and it’s all about getting on board right now,” says Kim Gregers Petersen.
By way of introduction, and as a starting point for a discussion of Big Data, Kim Gregers Petersen sums up developments in this area over the last 10 years with four facts:
Fact 1. The world’s data volumes are increasing at a pace that far exceeds our wildest dreams.
Fact 2. The world’s growing data volume is not just a quantitative challenge, since the data originates from new sources, such as video, photography, audio, navigation systems and instant messaging.
Fact 3. The new types of data are often unstructured and therefore require very different handling technologies to those we are accustomed to.
Fact 4. These technologies are still so new that many in the industry find themselves at a crossroads. On the one hand, they recognize that much of this new technology will shape their professional future; on the other, they are not yet familiar with it and doubt that they can gain the expertise they need.
”In rough terms, this is how things look right now,” says Kim Gregers Petersen. ”Of course, the question is: What is to be done?” he adds.
Highly interesting for the business
We will get back to the answer to that question. First, Kim Gregers Petersen explains what he defines as Big Data.
“If we take a hypothetical example, a business has data corresponding to 100%. If you ask the vast majority of companies how much of the data they use in their daily business, they will answer 15-20%. The remaining 80-85% of the data is not used, for various reasons. They just store the data, because they have to, or because they do not know how to use it. The whole point of Big Data is to activate as much as possible of the 80-85% inactive data, so it can contribute to the business,” says Kim Gregers Petersen, giving an example:
“Let’s take a business that sells computers. The sales department keeps good track of which computers they sell to which types of customers, their profit margins on the various computers and the price development in the various product categories, etc.
In the marketing department, they are good at contacting new and existing customers with offers of promotions, seminars, etc. And in customer service they are good at helping angry customers who call in and complain about a particular product. The point is that the data gathered by the various departments is never combined. It might be interesting for marketing and sales to know that customer service has handled 78 complaints about the same computer within a week. Today, that information is lost, because businesses don’t have the systems to coordinate this data.”
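The computer retailer example can be made concrete with a few lines of code. The following is only a minimal sketch in Python, not a description of any real system: the product names, dates, sales figures and the complaint threshold are all hypothetical, and a real company would read these records from its sales and customer service systems rather than from in-memory lists.

```python
from collections import Counter
from datetime import date, timedelta

# Hypothetical sales records (normally held by the sales department).
sales = [
    {"product": "Laptop A", "units_sold": 400},
    {"product": "Laptop B", "units_sold": 350},
]

# Hypothetical complaint log (normally held by customer service):
# 78 complaints about Laptop A spread over one week, 2 about Laptop B,
# and 1 older-system entry about Laptop A that falls outside that week.
week_start = date(2014, 3, 3)
complaints = (
    [{"product": "Laptop A", "received": week_start + timedelta(days=i % 7)}
     for i in range(78)]
    + [{"product": "Laptop B", "received": week_start} for _ in range(2)]
    + [{"product": "Laptop A", "received": date(2014, 3, 20)}]
)


def complaints_in_week(complaints, week_start):
    """Count complaints per product received in the 7 days from week_start."""
    week_end = week_start + timedelta(days=7)
    return Counter(
        c["product"] for c in complaints
        if week_start <= c["received"] < week_end
    )


def flag_products(sales, complaints, week_start, threshold):
    """Combine both data sets and return products at or above the threshold."""
    counts = complaints_in_week(complaints, week_start)
    return [
        {
            "product": s["product"],
            "units_sold": s["units_sold"],
            "complaints": counts[s["product"]],
        }
        for s in sales
        if counts[s["product"]] >= threshold
    ]


# The threshold of 50 complaints per week is an arbitrary illustration.
flagged = flag_products(sales, complaints, week_start, threshold=50)
print(flagged)
```

With this made-up data, only Laptop A is flagged, with 78 complaints in the week. The point is not the code itself but the join: neither data set alone tells marketing and sales that a product they are actively promoting is also generating a wave of complaints.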
Big Data is screaming for manpower
It is first and foremost corporations that should be concerned about these new possibilities and technologies, as they lack the corresponding competences within their organizations.
”This field is screaming for manpower,” says Kim Gregers Petersen. ”If I were 20 again, I would hurry up and run in that direction.
For many years, being a programmer hasn’t been very popular, one reason being that ERP solutions and Exchange solutions have been given an elegant administration layer that makes them relatively easy for ordinary IT people to handle. In other words, it’s become a bit boring to ‘just’ be a programmer.
But with all the new Big Data technologies – most of which come from the open source community – it’s suddenly cool to be a programmer again. We do not see the super-hot interfaces in the new products that we know from mature technologies. Big Data is a bit more hardcore.”
As Kim Gregers Petersen explains, it is not yet possible to take the formal route if you wish to train in the field of Big Data, since this is not offered at Danish colleges and universities. “This is actually the biggest hurdle preventing the expansion of Big Data right now,” says Kim Gregers Petersen. “But I suppose it’s related to the fact that the technology is so new that the educational system has not been able to keep up.”
The Big Data environment
With a generic model of a Big Data environment in front of him, Kim Gregers Petersen outlines the long journey that the data takes, from the first knock on the company’s door, such as Twitter, video or telecommunications data, to its final appearance as e.g. BI reports. During the journey, the name Hadoop pops up. According to Wikipedia’s definition, Hadoop is ’an open-source software framework for the storage and large-scale processing of data in large clusters that run on commodity hardware’. Kim Gregers Petersen describes Hadoop as a key component of many of the largest Big Data environments in the world.
”The great thing about Hadoop is that it acts as an infinite number of buckets into which you can pour both structured and unstructured data. You may wish to analyse some of the data immediately, while other data may not be analysed until after three years, when this is more relevant. Hadoop was created to meet these and many other requirements,” states Kim Gregers Petersen.
”I recommend that you take a closer look at Hadoop and all the technologies that comprise Hadoop. I say this for several reasons, including that never before have large commercial enterprises had so much at stake in an open source environment. For example, Hadoop represents the backbone of the IT systems of Yahoo, Twitter, Netflix and Facebook, and they will do everything to ensure that Hadoop gets better and better.”
To demonstrate the potential of Big Data and Hadoop, Kim Gregers Petersen mentions in passing a case he can barely bring himself to name, because it has received so much media attention: Vestas’ large Hadoop installation, which lets the company run almost real-time simulations for the location of new wind turbines. In another, lesser-known example, Sweden’s Royal Institute of Technology (KTH) is using IBM’s streaming technology STREAMS for traffic monitoring in Stockholm. A variety of data sources, such as vehicles’ GPS signals, alarm messages from traffic control, sensors on the roads and weather data, help direct traffic to flow as smoothly as possible.
So the logic is that any business of a certain size, no matter which industry, could benefit from Big Data?
“Exactly. But this requires creative thinking, and that you know the technologies,” concludes Kim Gregers Petersen.
How to Get Started with Big Data
Big Data is many different things, and probably no two clients have identical problems and issues. Below, however, is an excellent road map for getting started.
- Try to approach Big Data as a shared vision and task, with the business side working together with IT.
- Look at current data and assess how more value can be extracted. Use a tool such as IBM Watson Explorer to do this.
- Do you have performance issues with your SQL databases, and could it make sense to move your data into a scalable NoSQL database?
- Are you currently using BI, and could you make more data available for BI tools with a Big Data environment?
- Do you already employ “data scientist” profiles? Do you have existing employees who are enthusiastic about the area and can be “upgraded”?
- Choose a software package with tools and Hadoop, such as those supplied by IBM, so you can get started faster and show results.
- Identify consultants (for example ProData Consult) who can both advise on the business side and assist in the implementation and programming.