Abstract: Cloud computing has opened the door to a new era of enterprises built on Cloud-enabled business. More and more novel applications adopt this paradigm every day, which translates into an unprecedented increase in the amount of stored data. This phenomenon, the presence of rapidly expanding high-volume data sets, is commonly known as Big Data. Many of these applications bring new challenges to databases, and the scalability of Cloud-based databases has therefore become a top research issue in Cloud Computing infrastructure. NoSQL databases have emerged as an alternative to the well-known relational databases in order to fit the requirements of Big Data applications. Traditional relational databases, as they are often implemented, are no longer sufficient for Internet-scale distributed systems dealing with Big Data, whereas NoSQL databases have proved to be robust in Big Data applications. The purpose of this project is to scale out the data of a San Diego company that produces software for wireless multimedia, which is currently hosted on a MySQL cluster. To improve the performance of the computation, we propose a solution using Apache HBase, a NoSQL database. This final project presents implementations and comparisons of a number of computation techniques carried out in HBase together with other open-source distributed computing components such as Hadoop HDFS and MapReduce, and reports benchmarks of the developed solution.
Free keyword(s): cloud computing; NoSQL; HBase; Hadoop; MapReduce; CAP; data skew
Type of academic work: Final Degree Project (Proyecto Fin de Carrera)
Notes: Final Degree Project (PFC) carried out at Aalto University (Helsinki, Finland). Summary in Spanish.