New - Hadoop Cluster Hosting
Milwaukee, Wisconsin - June 30, 2009 - Smedia, a web hosting provider and Internet services company, now offers Hadoop Cluster Hosting. The company recently made available its Distribution of Hadoop, based on code available from the Apache Hadoop project, while hosting the Second Annual Hadoop Summit.
Hadoop is a Java-based distributed file system and parallel execution environment that enables its users to process massive amounts of data. Hadoop primarily uses the Hadoop Distributed File System (HDFS) to store large files across many machines, protecting data by replicating it over multiple hosts. DataNodes store the HDFS data and are coordinated by a NameNode. A JobTracker dispatches map/reduce tasks to individual TaskTrackers. Hadoop's "rack-awareness" allows HDFS to understand the proximity of servers in the cluster and so reduce traffic between geographically separated servers.
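The map/reduce model mentioned above can be illustrated with a small, single-process sketch. This is not Hadoop code and does not use any Hadoop API; the function names (map_phase, shuffle, reduce_phase, run_job) are hypothetical and chosen only for illustration. It assumes a simple word count, where each input split would, on a real cluster, be handed to a different TaskTracker.

```python
from collections import defaultdict


def map_phase(split):
    """Emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in split.split()]


def shuffle(pairs):
    """Group intermediate values by key, as happens between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single total."""
    return key, sum(values)


def run_job(splits):
    """Run map, shuffle, and reduce in one process (illustrative only)."""
    intermediate = []
    for split in splits:
        # On a cluster, each split would be processed on a separate node.
        intermediate.extend(map_phase(split))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    splits = ["hadoop stores large files", "hadoop processes large data"]
    print(run_job(splits))
```

Because each split is mapped independently and each key is reduced independently, both phases can be spread over many machines, which is what allows the approach to scale to very large data sets.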
Hadoop can also work with other file systems, including FTP file systems and read-only HTTP(S) file systems.
Justin Erenkrantz, president of the Apache Software Foundation, noted, "It's exciting to see how the Apache Hadoop project has progressed in the last few years. We're looking forward to seeing community growth accelerate with Yahoo!'s continued support and focus on quality, ultimately driving more contributors to the Apache Hadoop project."
Currently at Yahoo!, Hadoop runs on more than 25,000 servers and analyzes tens of billions of Web pages, multiple petabytes of storage and billions of new records per day.