Resource description: The amount of data in our industry and the world is
exploding. Data is being collected and stored at unprecedented
rates. The challenge is not only to store and manage the vast
volume of data (“big data”), but also to analyze and extract
meaningful value from it. There are several approaches to
collecting, storing, processing, and analyzing big data. The main
focus of the paper is on unstructured data analysis. Unstructured
data refers to information that either does not have a pre-defined
data model or does not fit well into relational tables. Unstructured
data is the fastest-growing type of data; examples include
imagery, sensor data, telemetry, video, documents, log files, and email
data files. There are several techniques to address this
problem space of unstructured data analytics. The techniques share the
common characteristics of scale-out, elasticity, and high availability.
MapReduce, in conjunction with the Hadoop Distributed File
System (HDFS) and the HBase database, as part of the Apache
Hadoop project, is a modern approach to analyzing unstructured data.
Hadoop clusters are an effective means of processing massive
volumes of data, and can be improved with the right architectural
approach.
