A Quick Intro to HDFS

After a busy and eventful week leading up to Easter Sunday, I’m back to talking about tech, security, and all things related.

Today at work, a client was having issues with the HDFS service running in Cloudera, in particular with the NameNodes. Again, part of the difficulty of supporting Hadoop systems is the relatively large number of services you have to know, at least at a surface level. HDFS, which stands for Hadoop Distributed File System and was originally inspired by the Google File System, is the part of the overarching Hadoop architecture designed to store very large files across the nodes of a cluster and to keep redundant copies of those files. The NameNode tracks the filesystem metadata (which blocks make up which files and where they live), while the DataNodes store and replicate the blocks themselves. In the whole Big Data scheme of things, HDFS is what makes very large storage requirements possible: you can theoretically take commodity hardware, each machine with a large amount of disk, and make every machine a node in the cluster.
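To make that a little more concrete, here is a minimal sketch of writing a file to HDFS through the Java FileSystem API. The NameNode address (namenode.example.com:8020) is a placeholder, since a real Cloudera client would normally pick up fs.defaultFS from core-site.xml.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsHello {
    public static void main(String[] args) throws Exception {
        // Point the client at the cluster's NameNode. This host/port is a
        // placeholder; a real client usually reads it from core-site.xml.
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode.example.com:8020");

        FileSystem fs = FileSystem.get(conf);

        // Write a small file. HDFS splits files into blocks and replicates
        // each block across DataNodes (3 copies by default), which is where
        // the redundancy comes from.
        Path path = new Path("/tmp/hello.txt");
        try (FSDataOutputStream out = fs.create(path)) {
            out.writeUTF("Hello, HDFS!");
        }

        // Ask the NameNode how many replicas the file's blocks have.
        System.out.println("Replication factor: "
                + fs.getFileStatus(path).getReplication());

        fs.close();
    }
}
```

The same round trip works from the command line with hdfs dfs -put and hdfs dfs -ls, which is usually the quickest way to sanity-check that the NameNode is reachable and healthy.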

To get further acquainted with HDFS – as I’ll need to do myself – here are some links to get started:

https://wiki.apache.org/hadoop/HDFS/
https://www-01.ibm.com/software/data/infosphere/hadoop/hdfs/
http://www.tutorialspoint.com/hadoop/hadoop_hdfs_overview.htm
https://hortonworks.com/apache/hdfs/
