What's the big deal about big data? (eBook available)
Big data seems to be all the buzz, but the truth is that this information has always been around - it was just an untapped resource. With every company wanting to gain a competitive edge, analysing data allows businesses to cater to customers’ unique needs and identify areas of potential profit. Open-source frameworks like Apache Hadoop and Apache Spark provide the accessibility and tooling needed to make the most of big data management.
Plus, with the ushering in of the new artificial intelligence and machine learning capabilities of the Fourth Industrial Revolution, we have more capacity than ever before to turn data into insights that can benefit society at large.
Now is the best time to study data science – especially as the data scientist role was named the most promising job of 2019.
WHAT IS BIG DATA?
Big data is just that - extremely large sets of information that may be analysed to reveal patterns, trends, and associations, especially relating to human behaviour and interactions. It is information that grows exponentially over time and is too large or complex for traditional data-processing software, so it must be handled with specialised tools. The total amount of data in the world is now in the multiple zettabyte range – a zettabyte is a trillion gigabytes.
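To make that scale concrete, the unit arithmetic can be checked in a few lines of Python. This is a minimal sketch that assumes decimal (SI) units, where a gigabyte is 10^9 bytes and a zettabyte is 10^21 bytes; the constant names are purely illustrative.

```python
# Decimal (SI) storage units expressed in bytes (an assumption; binary
# units such as gibibytes would give slightly different figures).
GIGABYTE = 10**9
ZETTABYTE = 10**21

# How many gigabytes fit into one zettabyte?
gigabytes_per_zettabyte = ZETTABYTE // GIGABYTE

print(f"1 ZB = {gigabytes_per_zettabyte:,} GB")
```

The result is a one followed by twelve zeros, which is the "trillion gigabytes" mentioned above.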
Data falls into three main categories – structured, unstructured and semi-structured – each with clear differences and capabilities.
1. Structured data
Structured data has been processed and formatted in a fixed arrangement that allows it to be retrieved easily. This kind of data is easy to work with and easy to extract information from.
2. Unstructured data
This data lacks any predefined format or organisation and can also be called raw data. Many companies have this information but seldom have the resources to make it useful.
3. Semi-structured data
This data has had some formatting done, so parts of it can be retrieved easily, but it still has room for more processing.
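A small Python sketch may help to show how the three categories differ in practice. The sample records, field names and the regular expression below are all invented for illustration; only the standard-library `csv`, `json` and `re` modules are used.

```python
import csv
import io
import json
import re

# Structured: fixed columns, so every field can be read by name.
structured = io.StringIO("customer_id,city,total\n101,Cape Town,250.00\n")
row = next(csv.DictReader(structured))
structured_total = float(row["total"])

# Semi-structured: keys describe the data, but fields can vary per record,
# so lookups need to allow for missing values.
semi_structured = '{"customer_id": 101, "tags": ["loyalty"], "notes": {"channel": "app"}}'
record = json.loads(semi_structured)
channel = record.get("notes", {}).get("channel")

# Unstructured: free text with no predefined fields; any structure has to
# be inferred, here with a simple (and fragile) pattern for rand amounts.
unstructured = "Customer 101 phoned from Cape Town to say the R250 order arrived late."
amounts = re.findall(r"R(\d+)", unstructured)

print(structured_total, channel, amounts)
```

The further down this list you go, the more work it takes to get a usable value out, which is why unstructured data so often sits untapped.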
The characteristics of big data are often called the four Vs.
- Volume: the size of data plays a crucial role in determining its value.
- Variety: the nature of data, both structured and unstructured. Previously, data sources were limited to spreadsheets and databases. With the technology boom, new forms of data became available in the form of emails, photos, videos, monitoring devices, PDFs, audio, etc.
- Velocity: the speed at which data is generated. How fast data is generated and processed to meet demand determines its real potential. The flow of information is massive and continuous.
- Variability: the inconsistency that data can show at times, which hampers the ability to handle and manage it effectively.
HOW IS BIG DATA USED?
Data mining has become a controversial subject worldwide. The idea that ‘Big Brother’ is watching and possibly manipulating your moves is a very real concern. With the explosion of smartphones and internet access, we now provide data effortlessly as part of our everyday lives. Rewards cards keep tabs on the kinds of shopping you do. Google Maps can use your frequently used routes to provide better traffic predictions.
So, for many companies, the focus is on what to look for in the data and then how to use that to the mutual benefit of the client and service provider.
Fortunately, large amounts of raw data can also be used to streamline processes and drive growth. According to Forbes, there are ten major areas in which big data technology can expect major expansion:
- Predictive analytics: software and/or hardware solutions that allow firms to discover, evaluate, optimise and analyse big data sources to improve business performance.
- NoSQL databases: key-value, document and graph databases.
- Search and knowledge discovery: tools and technologies to support self-service extraction of information and new insights from large repositories of unstructured and structured data.
- Stream analytics: software that can filter, aggregate, enrich, and analyse a high throughput of data from multiple disparate live data sources and in any data format.
- In-memory data fabric: provides low-latency access and processing of large quantities of data by distributing data.
- Distributed file stores: a computer network where data is stored on more than one node, often in a replicated fashion, for redundancy and performance.
- Data virtualisation: a technology that delivers information from various data sources, including big data sources such as Hadoop and distributed data stores, in real time and near-real time.
- Data integration: tools for data orchestration across solutions such as Amazon Elastic MapReduce (EMR), Apache Hive, Apache Pig, Apache Spark, MapReduce, Couchbase, Hadoop and MongoDB.
- Data preparation: software that eases the burden of sourcing, shaping, cleansing, and sharing diverse and messy data sets to accelerate data’s usefulness for analytics.
- Data quality: products that conduct data cleansing and enrichment on large, high-velocity data sets, using parallel operations on distributed data stores and databases.
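To give a feel for one item on this list, here is a minimal Python sketch of the "filter" and "aggregate" steps that stream analytics software performs. It is an illustration under simplifying assumptions, not how any particular product works: the event fields (`source`, `timestamp`, `value`), the function name and the fixed 60-second window are all hypothetical, and a real system would process an unbounded live feed rather than a short list.

```python
from collections import defaultdict

def tumbling_window_totals(events, window_seconds):
    """Drop malformed events, then sum values per source per time window."""
    totals = defaultdict(float)
    for event in events:
        # Filter: skip events whose value is not numeric.
        if not isinstance(event.get("value"), (int, float)):
            continue
        # Aggregate: bucket each event by the start time of its window.
        window_start = event["timestamp"] - event["timestamp"] % window_seconds
        totals[(event["source"], window_start)] += event["value"]
    return dict(totals)

# Hypothetical readings; timestamps are seconds since the feed started.
events = [
    {"source": "sensor-a", "timestamp": 0, "value": 2.0},
    {"source": "sensor-a", "timestamp": 30, "value": 3.0},
    {"source": "sensor-b", "timestamp": 45, "value": "n/a"},  # filtered out
    {"source": "sensor-a", "timestamp": 65, "value": 1.5},
]

totals = tumbling_window_totals(events, window_seconds=60)
print(totals)
```

The first two readings from `sensor-a` land in the same one-minute window and are summed, the malformed `sensor-b` reading is discarded, and the last reading starts a new window.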
WHY IS BIG DATA USED?
What is the purpose of big data?
Ultimately, big data comes down to pattern recognition. Training machines to sift through phenomenal amounts of data to spot patterns can result in actionable insights that can change people’s lives – and businesses’ bottom lines. That’s the big deal of big data – it reveals stuff to us that can fundamentally change how we do things.
What are the benefits of big data?
Well, let’s think about a typical day for most of us. You wake up and you read Google News, which serves you up a selection of pertinent headlines from the titles you commonly gravitate to because Google ‘knows’ you and your browsing habits. Then you hop on Facebook and get the same streamlined newsfeed, tailored to your interests. Following that, you check Google Maps for the fastest way to work; information that’s based – you guessed it! – on historical traffic data plus real-time reports. You get to work, and you drill down into what most businesses are focused on: being as customer-centric as possible. How do you do that? Through customer insights harvested from big data. Other examples? Spotify suggesting playlists to you, or Takealot prompting nifty gadgets for you to buy on the sly online, during work hours.
On a more serious note, big data is revolutionising healthcare. Today, we can decode DNA sequences in minutes and spot disease patterns to potentially curb epidemics. From a science perspective, there’s CERN as an example: its Large Hadron Collider generates around 30 petabytes (a petabyte is 1000 terabytes) of data a year for analysis, all focused on the mission to study the tiniest particles in the universe.
Data is also changing how we plan so-called smart cities. Fuelled by the Internet of Things (IoT) and millions of connected devices, data from cities is helping us better plan these spaces – which is a very good thing, given that the UN projects two-thirds of the global population will live in cities by 2050. There’s also the impact of climate change to consider. For example, data on the devastation caused by Cyclones Idai and Kenneth in Mozambique could show us how to build homes to withstand more frequent extreme weather events.
Why is big data the future?
Because it’s too entrenched in every facet of our lives not to be. We will, undoubtedly, get a lot more protective of our personal data – which is a different, but just as important, conversation – as we get better at understanding privacy. But there’s no doubt that data will keep being a big part of everything we do. Now that we have these capabilities, they’re only going to improve.
What’s so important about big data?
Knowledge is power. The trick with big data is to sift through the noise to find the ‘nuggets’. That’s where AI and machine learning are critical. There’s no way a human can get through the more than 2.5 quintillion bytes of data generated daily, but machines can. So, our job is to give machines the right initial instructions to find the insights we can use and act upon.
WHAT BIG DATA JOBS CAN I DO?
It just so happens that Harvard Business Review called data scientist the Sexiest Job of the 21st Century. This was back in 2012, but the hype hasn’t died, and, at the time of writing this, most big data roles earn professionals around R500 000 pa or more. Plus, those in data science are seriously sought after in SA and around the world. Data scientists have the unique ability to marry maths, computer science and trend insights with logic. They analyse data to interpret it in a way that adds real value to a business. This makes them invaluable to almost every industry.
In SA, we have a shortage of these skills, but we’re not alone. Globally, ITWeb suggests there’s a shortfall of about 5 million to 10 million data scientists. LinkedIn saw a 56% increase in data scientist job openings in the US in the last year, and a report from job site Indeed found a 29% increase in demand for data scientists since 2018, and a 344% increase since 2013. Suffice it to say, if you’re savvy with numbers and excellent at strategic insights, you can’t really go wrong with this kind of role right now.
So, what big data jobs should I consider?
Data engineer
A data engineer builds the infrastructure required to collect, store and process data. A data scientist, in contrast, then uses advanced maths and statistics to draw insights from that data. There’s definite overlap between the roles, but engineers are effectively ‘the builders’ in the relationship dynamic. The pay? About R350 583 pa according to Payscale.com.
Data scientist
We know this has been named the world’s sexiest job, but what exactly does it entail? In a nutshell, you’ll find ways to glean meaning from data, using tools and machine learning. This means lots of collating and cleaning of data, because it gets messy! The average salary for this role is R636 592, according to Indeed.co.za.
Data architect
As a data architect, you’ll oversee the design, structure and maintenance of data, to ensure it’s accurate and accessible to an organisation or project. Organising data takes top skills – you need to be exceptional at SQL, XML and related data and programming languages. If you’re meticulous about details and extremely analytical, this could be the right role for you. Plus, you’ll earn R600 000 pa, according to Payscale.com.
Data analyst
Can you quickly sift through data to ‘get to the good stuff’? Do you have a knack for drawing smart insights others might not have thought of? Are you good at distilling what info you need and how to get it most efficiently? Maybe you should be a data analyst! While a data scientist designs new processes for data modelling and production, a data analyst uses information to spot trends to inform business strategies. Average pay is R244 491, according to Payscale.com.
Big data visualiser
When someone reads off a pattern of numbers, can you close your eyes and immediately see a picture of how these figures relate to each other? A data visualiser has the rare ability to take data and present it pictorially in a way the rest of us can understand. There’s not enough information for Payscale to give the average salary for this role, but Twitter is advertising for a data visualisation scientist (at the time of writing) and is willing to pay the right candidate $145 000 to $176 000 pa.
Analytics manager
As an analytics manager, you coordinate analytics responsibilities for your company, which includes spotting risks, analysing the customer experience and measuring performance. Effectively, you find analytics-driven solutions that’ll benefit the bottom line. In return, you’ll earn around R630 255 pa according to Payscale.com.
Are there big data industry bodies that one can join?
- The Institute for Analytics: This is a not-for-profit professional body for analytics and data science professionals in the UK and overseas.
- The Data Management Association (DAMA): A global data management community set on being a world leader in data and information management practices.
- Institute for Information Technology Professionals (IITPSA): SA-based, this group promotes the ‘ICT professional’ and assists with policy formation and professionalism.
WHO ‘DOES’ BIG DATA?
Big data is often interpreted by data scientists who have traditionally been expected to excel at a vast arsenal of skills – like problem-solving, trend-spotting, decision-making, statistics and database analysis, data visualisation, coding, data preparation and machine-learning. It’s a lot, which is why Raconteur believes data scientists will make the move from generalists to specialists over the next few years. However, there are certain traits and skills most people working in big data will need.
Personality traits and technical skills needed for jobs in big data, according to CIO.com:
- Problem solving and critical thinking capabilities
- Strong coding skills, such as proficiency in R and Python, and familiarity with Apache Hadoop
- AI and machine learning skills like supervised and unsupervised machine learning and natural language processing
- Strong maths skills
- Excellent communication
- Brilliant data visualisation abilities
- Ability to recognise and mitigate risks
- Desire to optimise systems and processes
- Good intuition
It’s worth noting many data scientists have qualifications in computer science, physical science, statistics or engineering. These degrees give a great foundation, but they aren’t essential to employers. If you’ve got the skills, more and more employers won’t care how you accrued them, especially in the data and software space. Importantly, you need to be a self-starter. This is a space that’s changing all the time so there’s no room for complacency. You must be committed to continuous learning.
As someone with skills in big data, you’ll probably be in demand with myriad companies around the world; a quick search on LinkedIn at the time of writing reveals Amazon, Accenture, Uber, Huawei, Capitec, Prodigy Finance and McKinsey are hiring data professionals right now.
WHEN IS BIG DATA USED?
Everywhere, all the time. Its acceleration has been exponential and it’s now a ubiquitous part of our lives, which means trends change fast and furiously.
What are the big trends in big data right now?
- Data management is critical, which makes data engineers invaluable. Different types of data require different storage solutions, which creates silos of data to integrate, clean and label for machine learning. That’s where excellent engineers are a must.
- Real-time (streaming) data is on the rise, which is catalysing quick reactions from organisations. It’s all about the ‘ultra-fast processing of incoming data’ with machine learning models assisting with automated decision-making in response to real-time information. What does it mean for your business? Faster reaction times are critical to longevity and success.
- There’s more regulation protecting data privacy. In SA, there’s the Protection of Personal Information Act (POPI) for example. Anyone harvesting data will need to do so responsibly and transparently, with full disclosure to everyone involved.
- Smart data is the new big data. Cloud Computing News talks about the rise of smart data – the data that’s left after the ‘noise’ has been filtered out of big data. New, highly advanced data processing tools – like augmented analytics, which uses natural language processing and machine learning for enhanced data analytics and business intelligence – are providing ever-smarter and more accurate insights. The shift to smart seems to match the move to specialised – data professionals will require increasingly specialised skillsets, with the rise of smart data prompting highly specific roles. The good news? Those who keep learning will have ample opportunity to fill these gaps.
- We’re living in a time of smart everything. Continuing the smart theme, it seems only fitting to mention the ubiquity of connected devices and the Internet of Things (IoT). There are voice assistants like Alexa popping up all over the place, controlling every aspect of the home – and by extension, the city. BusinessTech suggested there would be 50 billion connected devices by 2020. With this proliferation come more concerns around how data can be used responsibly to foster progress without harming privacy rights.
WHICH BOOKS CAN I READ TO FIND OUT MORE ABOUT BIG DATA?
There are infinite sources of information to fuel your big data fire. Here are some essential books that will give you a well-rounded level of knowledge.
This book will build a sturdy foundation for your big data learning. Some concepts might be difficult to understand for the beginner, but a committed student will find this book worth the read.
This collection of essays will allow you to see the human connection to the data story. You will see how impactful big data can be in changing lives for the better from young children to the elderly.
This New York Times Bestseller covers some of the best cases of big data and its profound impact on our lives. It introduces the reader to a world which is driven by numbers and data.
It’s clear most big-data-related roles are excellent career choices for those wishing to stay relevant. With the ubiquity of information and the ushering in of AI, automation and machine learning, there are myriad opportunities for those with a love of data and strategic thinking. Human-machine collaborations are going to be imperative going forwards and in the world of big data, they’re already the norm.