Introduction to Big Data Analysis


INTRODUCTION

Big data is a term for massive volumes of data that may be structured, semi-structured, or unstructured. Data that traditional databases and software technologies cannot handle is classified as big data. The term originated with web companies that had to handle structured data (numerical values, figures, transaction records) alongside loosely structured or unstructured data (email attachments, images, comments on social networking sites).
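The difference between the three kinds of data can be illustrated with a minimal Python sketch. The sample records, field names, and values below are hypothetical and chosen only for illustration; they do not come from any real dataset.

```python
import csv
import io
import json

# Structured: fixed columns, as in a relational table
# (hypothetical transaction records).
structured_csv = "txn_id,amount\n1001,25.50\n1002,13.00\n"
rows = list(csv.DictReader(io.StringIO(structured_csv)))
total = sum(float(r["amount"]) for r in rows)

# Semi-structured: self-describing JSON whose fields may differ
# from one record to the next (hypothetical user record).
semi_structured = json.loads('{"user": "alice", "tags": ["sale", "promo"]}')
tag_count = len(semi_structured["tags"])

# Unstructured: free text with no schema; only generic operations
# such as splitting into words apply without further processing.
unstructured = "Loved the product, will buy again!"
word_count = len(unstructured.split())

print(total, tag_count, word_count)
```

The structured rows can be queried by column name straight away, the JSON record has to be inspected to learn which fields it carries, and the free text yields meaning only after further analysis. That is roughly why unstructured data is the hardest of the three for traditional tools.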

This paper gives a broad overview of some of the most commonly used techniques and technologies, to help the reader better understand the tools of big data analytics. Many analytic techniques could be employed in a big data project. Which ones are used depends on the type of data being analyzed, the technology available, and the research questions being addressed.

The term has been in use since the 1990s, with some giving credit to John Mashey for popularizing it. Big data usually includes data sets with sizes beyond the ability of commonly used software tools to capture, curate, manage, and process within a tolerable elapsed time. Big data philosophy encompasses unstructured, semi-structured, and structured data; however, the main focus is on unstructured data. Big data "size" is a constantly moving target, as of 2012 ranging from a few dozen terabytes to many exabytes of data. Big data requires a set of techniques and technologies with new forms of integration to reveal insights from datasets that are diverse, complex, and of a massive scale.

Similarly, Kaplan and Haenlein define big data as "data sets characterized by huge amounts (volume) of frequently updated data (velocity) in various formats, such as numeric, textual, or images/videos (variety)." Some organizations add a fourth V, veracity, a revision that some industry authorities dispute. The three Vs (volume, variety, and velocity) have been further expanded to other complementary characteristics of big data.

Although the term dates to the 1990s, it did not come into widespread use until approximately 2010, when organizations realized the power, need, and importance of this information. Given the sheer scope of the information involved, "big data" then came into the picture.

Big data and cloud computing are each valuable in their own right. Furthermore, many businesses aim to combine the two to reap greater business benefits. Both technologies aim to increase a company's revenue while reducing investment cost. Cloud computing takes over the management of software that would otherwise run locally, while big data supports business decisions.