Getting to know H2O

There are many machine learning packages and AI-powered software products on the market, but many users are unaware of how these tools work or of the technology stack behind them. In this blog post, I would like to introduce you to the H2O platform, an open source machine learning platform used by 8,500+ organizations and 75,000+ data scientists around the world. H2O.ai (formerly known as 0xdata) was founded in 2012 by Cliff Click and SriSatish Ambati. It’s a Silicon Valley-based startup that developed the H2O platform, one of the world’s leading open source AI and deep learning platforms.

So, what led to the development of H2O? One of the major problems with R was that it could not handle large datasets well: performance became sluggish as the data grew. This is what H2O aimed to overcome.

H2O is an open source platform for data scientists and developers doing big-data analysis. The software provides data structures and methods suited to big data. It allows users to understand data by studying the whole dataset rather than relying on a sample or subset. It is a fast, scalable package that is easy to implement at any level. In essence, it gives companies a Graphical User Interface (GUI) driven platform for faster data computation.

It offers interfaces for popular programming languages such as R, Python, Java, and Scala, as well as a REST API that exchanges JSON. It also runs in cloud computing environments such as Amazon EC2, Microsoft Azure, and Google Compute Engine.
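Because H2O exposes its functionality through an API, a running instance can be queried with nothing more than an HTTP client. The snippet below is only a minimal sketch, not official client code. It assumes an H2O instance is already running locally on port 54321 (the usual default) and that the `/3/Cloud` status endpoint is available in your H2O version. The helper names `cloud_status_url` and `fetch_cloud_status` are made up for this post.

```python
import json
from urllib.request import urlopen


def cloud_status_url(host="localhost", port=54321):
    """Build the URL of the cluster status endpoint (assumed to be /3/Cloud)."""
    return f"http://{host}:{port}/3/Cloud"


def fetch_cloud_status(host="localhost", port=54321, timeout=5.0):
    """Send a GET request to the cluster and decode the JSON reply into a dict."""
    with urlopen(cloud_status_url(host, port), timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Requires a running H2O instance; otherwise urlopen raises an error.
    status = fetch_cloud_status()
    print(sorted(status.keys()))
```

The language-specific packages are thin wrappers over calls of this kind, which is why the same cluster can be driven from several languages.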

H2O’s statistical algorithms include k-means clustering, Cox proportional hazards, generalized linear models, multi-layer feed-forward neural networks, distributed random forests, gradient boosting machines, naive Bayes, principal component analysis, stochastic gradient descent, and generalized low-rank models. The list goes on and on.

So, if you are using R, you can easily get started with H2O: install the “h2o” R package directly from CRAN. It helps to understand how H2O works. No computation is ever performed in R itself. The data is transferred to the H2O cluster, where all computations are performed in highly optimized Java code, and R only initiates them through REST calls.
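The same thin-client pattern applies to H2O’s Python package, so here is a rough sketch of the workflow in Python. It assumes the `h2o` Python package and a Java runtime are installed. The file path, target column, and the helper names `predictor_columns` and `train_gbm_on_cluster` are hypothetical and only there for illustration. Gradient boosting is just one of the algorithms listed above; any other estimator could take its place.

```python
def predictor_columns(columns, target):
    """Return every column name except the target column."""
    return [name for name in columns if name != target]


def train_gbm_on_cluster(csv_path, target, ntrees=50, seed=42):
    """Train a gradient boosting model inside the H2O cluster.

    The Python process only sends requests; parsing and training
    happen in the cluster's Java code.
    """
    # Imported lazily so the helper above works without h2o installed.
    import h2o
    from h2o.estimators import H2OGradientBoostingEstimator

    h2o.init()  # start a local cluster or connect to a running one

    # The path must be readable by the cluster; the data is parsed there.
    frame = h2o.import_file(csv_path)
    train, valid = frame.split_frame(ratios=[0.8], seed=seed)

    model = H2OGradientBoostingEstimator(ntrees=ntrees, seed=seed)
    model.train(
        x=predictor_columns(frame.columns, target),
        y=target,
        training_frame=train,
        validation_frame=valid,
    )
    # Metrics are computed in the cluster and returned as a summary object.
    return model.model_performance(valid)
```

A call such as `train_gbm_on_cluster("data/train.csv", "label")` (both arguments are placeholders) would return validation metrics while the dataset itself never leaves the cluster.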

So, are you interested in learning more about H2O? Head over to the resources section, where you will find guides and documentation that will help you get started with H2O.

Although the company’s main product is the H2O platform, it also offers a number of tools around that platform. Sparkling Water combines the Apache Spark data processing engine with the H2O platform. Deep Water is an integration that connects H2O with multiple open source deep learning libraries such as TensorFlow, MXNet, and Caffe. Driverless AI is a tool that helps non-technical employees prepare data, calibrate parameters, and determine the optimal algorithms for tackling specific business problems with machine learning.

So, get ready to dive deep into H2O.

The Ocean is full of opportunities!

