The need for new technology arose when data began shifting from its structured form to unstructured forms, and when computing moved from a single machine with a heavy configuration to many inexpensive machines (commodity hardware).
First, which unstructured data are we talking about? The answer is big data. Let's understand a little about big data first.
Big data refers to data sets that are too large or complex for traditional data-processing application software to deal with adequately. Data with many cases (rows) offer greater statistical power, while data with higher complexity (more attributes or columns) may lead to a higher false discovery rate. Big data challenges include capturing data, data storage, data analysis, search, sharing, transfer, visualization, querying, updating, information privacy, and data source. Big data was originally associated with three key concepts: volume, variety, and velocity. Other concepts later attributed to big data are veracity (i.e., how much noise is in the data) and value.
With that concept in place, let's talk about the problems faced while using this data. There are several levels at which the work happens: first, storing and processing the data, which means distributing it across commodity hardware; second, facing and handling failures and crashes, since computer networks are not very reliable; and last but not least, third, writing the heart of the program, the actual application logic.
All the teams in companies like Google faced the above three requirements:
Data storage
Failure handling
Heart (core software application)
Later on it became clear at Google that if every team had to write the same things, the common parts should be shared in a framework, so that developers would only have to concentrate on the main thing, the "heart" of the program (whatever they need and whatever problems they are trying to solve), and could leave the rest of the work to the framework. Many such frameworks are available nowadays, but here we will talk about Hadoop.
Hadoop is open-source data processing software commonly used for storing and processing large amounts of data on clusters of commodity machines. The framework was created by Doug Cutting and Mike Cafarella in 2005.
Some new terms were also introduced around that time, such as the "programming paradigm" (because many machines run in parallel and the data stored on them can be used by exploiting their full power), MapReduce, and the Google File System. These terms can be understood as follows:
Programming paradigm - a new way of writing programs so that the different parts of the work run in parallel across machines.
MapReduce - a programming model capable of processing huge data sets; see the sketch after this list.
Google File System - GFS is a distributed file system created by Google to provide storage for, and access to, large amounts of data using clusters of commodity hardware.
REFERENCES:
What is big data?