What is Hadoop and why should you care?
There are many compelling reasons to be interested in this project. I am personally just beginning my journey towards becoming a “DATA SCIENTIST”. I’m following my gut instinct that this is something I want to learn and master. It’s the same feeling I had when I decided to learn .NET development, or SharePoint, or HTML5, or T-SQL: my internal I.T. radar has locked onto Hadoop and I’m excited about it.
Hadoop is a project created to deal effectively with the very real and (relatively) very new problem of handling massive amounts of data. Modern popular websites all generate enormous amounts of data that need to be processed. As computers have gotten more and more powerful, other activities such as scientific experiments also produce enormous amounts of data. Traditional methods were very expensive and at times cumbersome, so some computer scientists came up with Hadoop.
Hadoop can process enormous amounts of data with high availability and redundancy, and one of the most elegant parts of this solution is that it is engineered to run on ‘commodity’ hardware. In other words, you can set it up on anywhere from one to several thousand individual servers that on their own might only handle very lightweight tasks but together can move and process mountains of data. The system keeps the data highly available and redundant, so hardware failures are not a problem: the data is replicated, and the software keeps track of where everything is and just keeps on going.
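To make the replication idea a little more concrete, here is a toy sketch in Python. It is not how Hadoop actually decides where data goes (the real placement logic is more sophisticated); the function names, the node names, and the simple round-robin scheme are all made up for illustration. It only shows why keeping several copies of each block means a couple of dead servers do not cost you any data.

```python
def place_blocks(block_ids, nodes, replication=3):
    """Assign each block to `replication` distinct nodes, round-robin.

    Hypothetical helper: a simplified stand-in for real replica placement.
    """
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    placement = {}
    for i, block in enumerate(block_ids):
        # Consecutive nodes (wrapping around) hold the copies of this block.
        placement[block] = [nodes[(i + r) % len(nodes)] for r in range(replication)]
    return placement


def readable_blocks(placement, failed_nodes):
    """Return the blocks that still have at least one copy on a live node."""
    return {
        block
        for block, replicas in placement.items()
        if any(node not in failed_nodes for node in replicas)
    }


if __name__ == "__main__":
    placement = place_blocks(["b0", "b1", "b2", "b3"], ["n0", "n1", "n2", "n3"])
    # Two of the four servers die, yet every block is still readable.
    print(sorted(readable_blocks(placement, failed_nodes={"n0", "n1"})))
```

With three copies of every block spread over four servers, losing any two servers still leaves at least one copy of everything.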
Besides this it is just cool.
Another fairly recent I.T. development, though technically not related, has been the use of shipping containers converted into mobile data centers. My favorite versions of these are the HP ones. I want one for Christmas. They are a perfect fit for private clouds because you can get an incredible amount of computing power dropped off at a location of your choice. They can even be shipped IT-ready, so you basically plug them in and go.
(Here is an amazing video about the whole HP Data Container thing)
Don’t get me wrong: the HP Pod Data Center Containers are amazing and run anything, not just Hadoop, and Hadoop doesn’t require HP Pod Data Center Containers to run. I am setting up my own environment on a single instance of Ubuntu in a Microsoft Hyper-V virtual machine… but one has to admit that the combination is just too good to ignore!
One of the beauties of Hadoop is its scalability. You can start with one machine and grow quickly to thousands of machines and it just works. This is also one of the amazing things about products like Microsoft SharePoint, and why those products are so successful. And speaking of Microsoft… as of this writing there are some very exciting things going on in the Microsoft Azure Space, including its ability to run Hadoop (or just about anything else for that matter), so it is going to be everywhere. And that’s one of the reasons you want to know about it if you are in I.T.: you are highly likely to be dealing with it.
The approach that Hadoop takes to big data is to break one enormous, unmanageable task down into many small, manageable tasks. This is done via what is called MapReduce, which uses simple key-value pairs that are easy for machines to process. The thing is, how you implement this is totally up to you; it’s just the overall mechanics that are managed by Hadoop. This is so Nano Technology Utility Fog that I can’t stop thinking about it.
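To give a feel for the key-value idea, here is a minimal word-count sketch in plain Python. It does not use Hadoop or any Hadoop API; the function names are made up for illustration, and everything runs in a single process. The "map" and "reduce" functions are the part you would write yourself, while the grouping step in the middle stands in for the mechanics that a framework like Hadoop would handle across many machines.

```python
from collections import defaultdict


def map_phase(document):
    """Map step: emit a (word, 1) key-value pair for every word."""
    for word in document.lower().split():
        yield (word, 1)


def shuffle(pairs):
    """Group all values by key (the part a framework would do for you)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: collapse all the values for one key into a single total."""
    return (key, sum(values))


def word_count(documents):
    """Run map, shuffle, and reduce over a list of text documents."""
    pairs = []
    for document in documents:
        pairs.extend(map_phase(document))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "big deal"]))
```

Each document can be mapped independently and each key can be reduced independently, which is what makes it possible to spread the work over many small machines.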
The names alone of the technologies associated with Hadoop are so freaking cool. (These are all listed on the Hadoop Apache website along with descriptions and technical notes.)
These names are as cool as PowerShell, which is probably the single toughest name for a technology to date.
In fact, the hadoop.apache.org website has all kinds of amazing technical information and specifications (as you would expect).
You can find a list of users of Hadoop and a brief description of their implementation here:
Hadoop is here. It has a great name. Its associated technologies have great names. It’s being used to handle huge amounts of data by many companies, and it’s going to be supported in Azure. Any one of those reasons is good enough for me, but all together they make it a very exciting technology to be aware of. I’m just getting started but I’m committed!! Look for more blogs in the future!
Enjoy!
Spike Xavier
SharePoint Instructor – Interface Technical Training
Phoenix, AZ