The Future Looming Large

DQI Bureau

14 Nov 2012 07:27 IST

"Traditional data generation is happening at a record rate. In 2009-10, the world generated over 1 zettabyte of data; by 2014 it may go up to 8 zettabytes a year. This steady increase is the result of a drastic rise in the number of devices located at the edge of the network, including satellites, 4G phones, supercomputers, etc. All of this data creates great opportunities to 'extract more value-add' in human lifestyles and in any industry or sector."
What is Big Data?
From data warehousing to Business Intelligence (BI), we are now thinking one level higher, because we are experiencing unexpected growth in structured and unstructured data (ie, documents such as Word, Excel, and PowerPoint files, images, videos, PDF and HTML documents, various database schemas, telecom data, satellite data, etc), and the volume is huge.
Seeing all this, one wonders how Amazon, Wal-Mart, Google, Facebook, Yahoo!, YouTube, and other big players manage such massive amounts of information and day-to-day transactions, and do so with a mindset of delivering information quickly.
All this is possible because of big data. Although the term is relatively new, in principle big data is data that exceeds the processing capacity of conventional database systems: the data is too big, moves too fast, or doesn't fit the structures of your database architectures. The most popular choice for a big data software stack is Hadoop.
Big data has three main characteristics: volume (amount of data), velocity (speed of data in and out), and variety (range of data types and sources):
Volume: Volume describes the amount of data generated by organizations or individuals. Big data is usually associated with this characteristic. Enterprises in all industries will need to find ways to handle the ever-increasing volume of data that is being created every day.
Velocity: Velocity describes the frequency at which data is generated, captured, and shared. Recent developments mean that not only consumers but also businesses generate more data in much shorter cycles. Because of this speed, enterprises can capitalize on the data only if it is captured and shared in real time.
Variety: Big data means much more than rows and columns. It means unstructured text, video, and audio that can have an important impact on company decisions, if it is analyzed properly and in time.
Here are a few examples of big data to give an idea of the scale:
Twitter produces over 90 mn tweets per day
Wal-Mart logs 1 mn transactions per hour
Facebook creates over 30 bn pieces of content, ranging from web links, news, and blogs to photos.
The volume of business data worldwide, across all companies, doubles every 1.2 years, according to estimates.
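To put the figures above in perspective, here is a small arithmetic sketch. It only converts the quoted estimates (90 mn tweets per day, 1 mn transactions per hour, doubling every 1.2 years) into per-second rates and growth factors; the function and variable names are illustrative and not taken from any source.

```python
# Illustrative arithmetic only; every input is an estimate quoted in the article.
DOUBLING_YEARS = 1.2  # business data volume is said to double every 1.2 years


def growth_factor(years, doubling_years=DOUBLING_YEARS):
    """Return how many times larger the data volume is after `years`."""
    return 2 ** (years / doubling_years)


# Velocity: convert the quoted daily and hourly figures to per-second rates.
twitter_per_second = 90_000_000 / (24 * 60 * 60)
walmart_per_second = 1_000_000 / (60 * 60)

# Volume: growth implied by the doubling estimate.
annual = growth_factor(1.0)
six_years = growth_factor(6.0)

print(f"Tweets per second: {twitter_per_second:.0f}")
print(f"Wal-Mart transactions per second: {walmart_per_second:.0f}")
print(f"Growth per year: {annual:.2f}x")
print(f"Growth over six years: {six_years:.0f}x")
```

Under these estimates, the data volume grows by a factor of roughly 1.8 each year and about 32 over six years, which is why capacity planned for today's volume is quickly outgrown.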
Why Big Data?
Big data allows corporate and research organizations to do things that were not previously economically feasible, such as:
Analysis
Spotting business trends
Preventing diseases
Combating crime
Centralization of data
Potential of Big Data
The use of big data offers tremendous untapped potential for creating value. Organizations in many industry sectors and business functions can leverage big data to improve their allocation and coordination of human and physical resources, cut waste, increase transparency and accountability and facilitate the discovery of new ideas and insights.
Sectors with the greatest potential for big data:
Healthcare
Public Sector
Retail
Manufacturing
Telecommunications
What is Hadoop?
Hadoop is a software framework that supports data-intensive distributed applications under a free license. It enables applications to work with thousands of computationally independent computers and petabytes of data. Hadoop was derived from Google's MapReduce and Google File System (GFS) papers.
Hadoop is a top-level Apache project, written in the Java programming language, that is built and used by a global community of contributors. Yahoo! has been the largest contributor to the project, and uses Hadoop extensively across its businesses.
Apache Hadoop has two main sub-projects:
1# MapReduce: Map/Reduce is a term commonly thrown about these days. In essence, it is just a way to take a big task and divide it into discrete tasks that can be done in parallel.
2# HDFS: A file system that spans all the nodes in a Hadoop cluster for data storage. It links together the file systems on many local nodes to make them into one big file system. HDFS assumes nodes will fail, so it achieves reliability by replicating data across multiple nodes.
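The replication idea behind HDFS can be illustrated with a toy simulation. This is a minimal sketch in plain Python, not the HDFS API: the node names, the round-robin placement, and the helper functions are all hypothetical, and real HDFS uses a more sophisticated, rack-aware placement policy. The replication factor of 3 matches the HDFS default.

```python
def place_blocks(num_blocks, nodes, replication=3):
    """Assign each block to `replication` distinct nodes, round-robin (toy policy)."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {
        block: [nodes[(block + i) % len(nodes)] for i in range(replication)]
        for block in range(num_blocks)
    }


def readable_blocks(placement, failed):
    """Return the blocks that still have at least one replica on a live node."""
    return [
        block
        for block, holders in placement.items()
        if any(node not in failed for node in holders)
    ]


# Hypothetical five-node cluster storing a file split into ten blocks.
nodes = ["node1", "node2", "node3", "node4", "node5"]
placement = place_blocks(num_blocks=10, nodes=nodes, replication=3)

# Two nodes fail; every block still has a surviving replica.
survivors = readable_blocks(placement, failed={"node2", "node4"})
print(len(survivors), "of", len(placement), "blocks still readable")
```

With three replicas per block, any two simultaneous node failures leave every block readable. Data is lost only if all three nodes holding the same block fail before the system can re-replicate it.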
Big Data is not only about Hadoop
When we talk about big data, most of the time we refer to the Hadoop framework. However, alternative software is also available, eg, LexisNexis HPCC Systems, MarkLogic Server, and the Splunk search engine.
Is Big Data Expensive?
Big data can be achieved at an affordable IT cost. Most large corporates already run enterprise applications, multiple databases, e-commerce portals, ERP, data warehouses, and customer relationship management (CRM) systems, which makes it easier to manage and aggregate the critical data from those applications.
Unfortunately, there are a few challenges with Hadoop:
Real-time analytics and data processing are a challenge.
Hadoop is a batch-oriented system that is not friendly to real-time processing.
Hadoop development requires advanced expertise. Map/Reduce is not an easy programming model.
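To give a sense of why the Map/Reduce model takes some getting used to, here is a minimal word-count sketch in plain Python. It imitates the map, shuffle, and reduce phases in a single process and does not use the Hadoop API; the function names and sample documents are illustrative assumptions.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


# Each document could be mapped on a different machine; here it is a simple loop.
documents = ["big data is big", "data moves fast"]
mapped = [pair for doc in documents for pair in map_phase(doc)]
counts = dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())
print(counts)
```

Even this simple task has to be recast as independent map calls followed by a per-key reduce. In a real cluster the framework handles distribution and the shuffle step, but the developer still has to express every problem in this shape.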