Varun Kruthiventi

Thoughts, actions, code ...

Exploring H2O.ai AutoML

May 15, 2018 by Varun Kruthiventi Leave a Comment

In the short time I have spent on Kaggle, I have realized that ensembling (stacking models) is the best way to perform well.

Well, I am not the only one who thinks so!

Stacking is a model ensembling technique that feeds the predictions of multiple base models into a new model (a meta-learner), which learns how to combine them into a final prediction.
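To make the idea concrete, here is a minimal pure-Python sketch of stacking. It is a toy under stated assumptions, not the models used in this post: the two base models (a mean predictor and a one-feature least-squares line), the hold-out split, and the single blend weight used as the meta-model are all illustrative choices. This hold-out variant is sometimes called blending.

```python
from statistics import mean


def fit_mean(xs, ys):
    """Base model 1 (illustrative): always predicts the training mean."""
    m = mean(ys)
    return lambda x: m


def fit_line(xs, ys):
    """Base model 2 (illustrative): one-feature least-squares line."""
    mx, my = mean(xs), mean(ys)
    var = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / var
    return lambda x: my + slope * (x - mx)


def mae(preds, ys):
    """Mean absolute error between predictions and targets."""
    return mean([abs(p - y) for p, y in zip(preds, ys)])


def fit_stack(xs, ys, holdout_xs, holdout_ys):
    """Fit base models on one split, then learn a blend weight on held-out data."""
    base = [fit_mean(xs, ys), fit_line(xs, ys)]
    # Level-1 features: each base model's prediction on data it has not seen.
    level1 = [[model(x) for model in base] for x in holdout_xs]
    # Meta-model: the weight w in [0, 1] that minimizes hold-out MAE.
    candidates = [i / 20 for i in range(21)]
    best_w = min(
        candidates,
        key=lambda w: mae([w * a + (1 - w) * b for a, b in level1], holdout_ys),
    )

    def stacked(x):
        return best_w * base[0](x) + (1 - best_w) * base[1](x)

    return stacked, best_w


# Toy data following y = 2x + 1, split into a training part and a hold-out part.
stacked, weight_on_mean = fit_stack([0, 1, 2, 3], [1, 3, 5, 7], [4, 5], [9, 11])
```

On this toy data the line model is exact, so the meta-model learns to put no weight on the mean predictor. Real stacks usually use out-of-fold predictions from cross-validation instead of a single hold-out split, and a richer meta-learner than one weight.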

I am going to write a separate post on model ensembling 🙂

I have experimented with multiple ensembling techniques and built a model with XGBoost, LightGBM, and Keras for the Zillow Zestimate problem, which performed well.

Hyperparameter tuning for the base models was done using cross-validation with grid search. Tuning the parameters of the combined model is where things get strenuous.
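For readers new to the idea, cross-validation with grid search can be sketched in a few lines of plain Python. This is a toy illustration, not the tuning code used for the models above: the model (a one-feature ridge regression without intercept), the `alpha` grid, and the contiguous fold split are all assumptions made for the example.

```python
def kfold_indices(n, k):
    """Split range(n) into k contiguous folds, yielding (train_idx, valid_idx)."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        valid = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, valid
        start += size


def fit_ridge_slope(xs, ys, alpha):
    """Toy model: slope of a no-intercept ridge regression on one feature."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + alpha)


def cv_mae(xs, ys, alpha, k=3):
    """Average validation MAE over k folds for one hyperparameter value."""
    fold_scores = []
    for train, valid in kfold_indices(len(xs), k):
        slope = fit_ridge_slope([xs[i] for i in train], [ys[i] for i in train], alpha)
        errors = [abs(slope * xs[i] - ys[i]) for i in valid]
        fold_scores.append(sum(errors) / len(errors))
    return sum(fold_scores) / len(fold_scores)


def grid_search(xs, ys, grid, k=3):
    """Score every value in the grid with cross-validation and return the best one."""
    scores = {alpha: cv_mae(xs, ys, alpha, k) for alpha in grid}
    best = min(scores, key=scores.get)
    return best, scores


# Toy data following y = 3x, so no regularization should win.
best_alpha, cv_scores = grid_search([1, 2, 3, 4, 5, 6], [3, 6, 9, 12, 15, 18], [0.0, 1.0, 10.0])
```

The cost grows with the number of grid points times the number of folds, and a stacked model multiplies that again across its base models, which is why tuning the combined model gets tedious so quickly.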

That is when I began to search for a better way to build ensembled models. I found a few frameworks that help with this, such as Auto-sklearn, TPOT, Auto-WEKA, machineJS, and H2O.ai AutoML.

Auto-sklearn and TPOT provide a scikit-learn-style API that can help you get things going quite fast. But H2O.ai AutoML gave better results, for me at least 🙂

H2O.ai is an open-source machine learning platform that provides a good range of machine learning algorithms for building scalable prediction models.

H2O AutoML can help automate the machine learning workflow, including the training of models and the tuning of their hyperparameters. The AutoML process can be controlled by specifying a time limit or by defining a stopping criterion based on a performance metric. AutoML returns a leaderboard of the trained models, including stacked ensembles built from the best of them.

AutoML is available through the Python and R APIs that come with the H2O library.

I decided to give H2O AutoML a try on the Zillow Zestimate problem. I used R to build the model and generate the submission.

Running AutoML for 1800 seconds with MAE as the stopping metric gave me a Public Leaderboard score of 0.06564.
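The run itself was done in R and that code is not reproduced here. As a rough equivalent, the following is a sketch of how a similar run might look with the H2O Python API. The 1800-second budget and the MAE stopping metric come from the run described above; the file name, the `logerror` target column, the `sort_metric` setting, and the seed are assumptions added for illustration.

```python
# Settings from the run described in this post, plus two labeled assumptions.
AUTOML_SETTINGS = {
    "max_runtime_secs": 1800,   # time limit for the whole AutoML run
    "stopping_metric": "MAE",   # metric used for early stopping
    "sort_metric": "MAE",       # assumption: rank the leaderboard by the same metric
    "seed": 42,                 # assumption: fixed seed for repeatability
}


def run_automl(train_csv, target, settings=AUTOML_SETTINGS):
    """Train H2O AutoML on a CSV file and return the leaderboard's top model."""
    # Imported lazily so the settings can be inspected without h2o (and Java) installed.
    import h2o
    from h2o.automl import H2OAutoML

    h2o.init()                          # start or connect to a local H2O cluster
    train = h2o.import_file(train_csv)  # load the training data as an H2OFrame
    aml = H2OAutoML(**settings)
    aml.train(y=target, training_frame=train)  # all other columns are used as predictors
    print(aml.leaderboard.head())       # models ranked by the sort metric
    return aml.leader


# Usage (hypothetical file and column names):
# best_model = run_automl("train_merged.csv", target="logerror")
```

Predictions for a submission can then be generated with `best_model.predict(test_frame)`, where `test_frame` is an H2OFrame with the same predictor columns as the training data.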

That’s a good score considering that I haven’t even dealt with basic data preprocessing 🙂

Filed Under: AutoML, Blog, Kaggle Tagged With: AutoML, Kaggle, ML, R

Copyright © 2022 · Varun Kruthiventi