What is Hadoop and why should you care?
There are many compelling reasons to be interested in this project. I am personally just beginning my journey toward becoming a “DATA SCIENTIST”, and I’m following my gut instinct that this is something I want to learn and master. It’s the same feeling I had when I decided to learn .NET development, SharePoint, HTML5, and T-SQL: my internal IT radar has locked onto Hadoop, and I’m excited about it.
Hadoop is a project created to deal effectively with the very real and (relatively) new problem of handling massive amounts of data. Popular modern websites generate enormous amounts of data that need to be processed, and as computers have become more powerful, other activities such as scientific experiments also produce enormous amounts of data. Traditional methods were very expensive and at times cumbersome, so some computer scientists came up with Hadoop.
Hadoop can process enormous amounts of data with high availability and redundancy. One of the most elegant parts of this solution is that it is engineered to run on ‘commodity’ hardware. In other words, you can set it up on anywhere from one to several thousand individual servers that on their own might only handle very lightweight tasks but together can move and process mountains of data. The system keeps the data highly available and redundant, so hardware failures are not a problem: the data is replicated across machines, the software keeps track of where every copy lives, and processing just keeps on going.
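To make the replication idea concrete, here is a toy Python simulation (not Hadoop code). It assumes HDFS-style behavior: a file is cut into fixed-size blocks and each block is copied to several machines, so losing one machine doesn’t lose any data. The tiny 4-byte block size, the node names, and the round-robin placement are all hypothetical simplifications; real HDFS defaults to 128 MB blocks and a replication factor of 3, and its NameNode chooses replica locations with rack awareness in mind.

```python
from typing import Dict, List, Set

BLOCK_SIZE = 4    # hypothetical toy block size in bytes (real HDFS defaults to 128 MB)
REPLICATION = 3   # HDFS's default replication factor


def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> List[bytes]:
    """Cut a file into fixed-size blocks; the last block may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks: int, nodes: List[str],
                   replication: int = REPLICATION) -> Dict[int, List[str]]:
    """Hypothetical round-robin placement: each block lands on `replication` distinct nodes."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {
        block: [nodes[(block + r) % len(nodes)] for r in range(replication)]
        for block in range(num_blocks)
    }


def readable_after_failure(placement: Dict[int, List[str]], failed: Set[str]) -> bool:
    """True if every block still has at least one replica on a node that is up."""
    return all(
        any(node not in failed for node in replicas)
        for replicas in placement.values()
    )


if __name__ == "__main__":
    blocks = split_into_blocks(b"hadoop keeps copies")
    nodes = ["node1", "node2", "node3", "node4"]
    placement = place_replicas(len(blocks), nodes)
    print(placement)
    print(readable_after_failure(placement, {"node2"}))  # one dead server: still readable
```

With three copies of every block, any two servers can fail and the file is still fully readable; only when every node holding some block’s replicas is down does data become unavailable.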
Besides this it is just cool.
Another fairly recent IT development, though technically unrelated, is the use of shipping containers converted into mobile data centers. My favorite versions of these are the HP ones. I want one for Christmas. They are a perfect fit for private clouds because you can have an incredible amount of computing power dropped off at a location of your choice. They can even be shipped IT-ready, so you basically plug them in and go.
Don’t get me wrong: the HP Pod Data Center Containers are amazing and can run anything, not just Hadoop, and Hadoop doesn’t require them to run. I am setting up my own environment on a single instance of Ubuntu in a Microsoft Hyper-V virtual machine. But one has to admit that the combination is just too good to ignore!
One of the beauties of Hadoop is its scalability. You can start with one machine and grow to thousands of machines, and it just works. Scalability is also one of the things that makes products like Microsoft SharePoint so successful. And speaking of Microsoft: as of this writing, some very exciting things are going on in the Microsoft Azure space, including its ability to run Hadoop (or just about anything else, for that matter), so Hadoop is likely to show up in a lot of places. That’s one of the reasons you want to know about it if you are in IT: you are highly likely to end up dealing with it.
Hadoop’s approach to big data is to break one enormous, unmanageable task into many small, manageable tasks. This is done via what is called MapReduce, which works on simple key-value pairs that are easy for machines to process. The thing is, the map and reduce logic itself is totally up to you; Hadoop manages the overall mechanics of splitting the input, scheduling tasks across the cluster, grouping intermediate results by key, and recovering from failures. This is so nanotechnology utility fog that I can’t stop thinking about it.
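Here is a minimal sketch of the MapReduce idea using the classic word-count example, simulated in plain Python on a single machine. The function names are hypothetical and this is not Hadoop’s actual API (real Hadoop jobs are usually written against its Java MapReduce API or run through Hadoop Streaming); it only mirrors the three stages: map emits key-value pairs, the shuffle groups values by key, and reduce combines each group.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Mapper: emit a (word, 1) key-value pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle/sort: group all values by key, sorted by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reducer: collapse all the counts for one word into a single total."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    # In this toy version each line plays the role of an input split
    # that Hadoop would hand to its own map task.
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["big data is big", "data is everywhere"]
    print(word_count(docs))  # {'big': 2, 'data': 2, 'everywhere': 1, 'is': 2}
```

Because each mapper only sees its own piece of the input and each reducer only sees one key’s values, the pieces are independent of each other, which is what lets a framework like Hadoop spread them across many machines.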
The names alone of the technologies associated with Hadoop are so freaking cool. (They are all listed on the Apache Hadoop website, along with descriptions and technical notes.)
These names are as cool as PowerShell, which is probably the single toughest name for a technology to date.
In fact, the hadoop.apache.org website has all kinds of amazing technical information and specifications, as you would expect.
You can find a list of Hadoop users and brief descriptions of their implementations here:
Hadoop is here. It has a great name. Its associated technologies have great names. It’s being used by many companies to handle huge amounts of data, and it’s going to be supported in Azure. Any one of those reasons is good enough for me, but all together they make Hadoop a very exciting technology to be aware of. I’m just getting started, but I’m committed!! Look for more blog posts in the future!