08:00 | Registration | |
09:00 | Welcome | Joeran Beel (Trinity College Dublin) and Lars Kotthoff (University of Wyoming) |
09:10 | Keynote (Slides) | Marius Lindauer (University of Freiburg) |
Automated Algorithm Selection: Predict which algorithm to use!
Abstract: To achieve state-of-the-art performance, it is often crucial to select a suitable algorithm for a given problem instance. For example, what is the best search algorithm for a given instance of a search problem, or what is the best machine learning algorithm for a given dataset? By trying out many different algorithms on many problem instances, developers learn an intuitive mapping from the characteristics of a given problem instance (e.g., the number of features of a dataset) to a well-performing algorithm (e.g., random forest). The goal of automated algorithm selection is to learn from data how to automatically select a well-performing algorithm given such characteristics. In this talk, I will give an overview of the key ideas behind algorithm selection and of the different machine learning approaches that address this problem.

Biography: Marius Lindauer is a junior research group lead at the University of Freiburg (Germany). His goal is to make the technology behind state-of-the-art research on artificial intelligence (AI) available to everyone. To this end, his research and tools aim at automating the development process of new AI systems. He received his M.Sc. and Ph.D. in computer science from the University of Potsdam (Germany), where he worked in the Potassco group. In 2014, he moved to Freiburg i.Br. (Germany) as a postdoctoral research fellow in the AutoML.org group. In 2013, he was one of the co-founders of the international research network COSEAL (COnfiguration and SElection of ALgorithms) and is now a member of its advisory board. Besides organizing the first open algorithm selection challenge and winning several international AI competitions, he was a member of the team that won the first and second editions of the international challenge on automated machine learning.
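The mapping the abstract describes can be sketched in a few lines: a standard classifier is trained on per-instance meta-features, labelled with whichever algorithm performed best in past experiments. The meta-features, candidate algorithms, and numbers below are illustrative assumptions, not material from the talk.

```python
# A minimal sketch of learned algorithm selection; the meta-features and
# candidate algorithms below are illustrative assumptions.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Meta-features describing problem instances, e.g. (n_samples, n_features,
# class_balance) for datasets. Labels are the algorithm that performed
# best on each instance in past experiments.
meta_features = np.array([
    [1000, 20, 0.5],
    [200, 500, 0.9],
    [50000, 10, 0.6],
])
best_algorithm = np.array(["random_forest", "svm", "logistic_regression"])

# The selector itself is an ordinary ML model: it learns the mapping from
# instance characteristics to a well-performing algorithm.
selector = RandomForestClassifier(n_estimators=100, random_state=0)
selector.fit(meta_features, best_algorithm)

# For a new instance, predict which algorithm to run.
new_instance = np.array([[3000, 50, 0.7]])
print(selector.predict(new_instance))
```

In practice, the labels would come from large-scale benchmark runs rather than three hand-written rows.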
10:10 | Algorithm selection with librec-auto | Masoud Mansoury and Robin Burke |
Due to the complexity of recommendation algorithms, experimentation on recommender systems has become a challenging task. Current recommendation algorithms, while powerful, involve large numbers of hyperparameters. Tuning hyperparameters to find the best recommendation outcome often requires running large numbers of algorithmic experiments, particularly when multiple evaluation metrics are considered. Existing recommender-systems platforms fail to provide a basis for systematic experimentation of this type. In this paper, we describe librec-auto, a wrapper for the well-known LibRec library, which provides an environment that supports automated experimentation.
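As a rough illustration of the kind of automated experimentation the paper targets (explicitly not librec-auto's actual API, which is driven by configuration files rather than scripted like this), a hyperparameter sweep collecting multiple metrics might look as follows; `run_experiment` and the grid values are hypothetical.

```python
# Generic sketch of automated recommender experimentation: sweep a
# hyperparameter grid and collect several evaluation metrics per run.
from itertools import product

grid = {"factors": [10, 50, 100], "reg": [0.01, 0.1]}

def run_experiment(factors, reg):
    # Hypothetical stand-in for training and evaluating a recommender.
    return {"ndcg": 0.1 * factors ** 0.1 / (1 + reg),
            "precision": 0.05 / (1 + reg)}

results = []
for factors, reg in product(grid["factors"], grid["reg"]):
    metrics = run_experiment(factors, reg)
    results.append(((factors, reg), metrics))

# Rank configurations by one of the metrics under consideration.
for config, metrics in sorted(results, key=lambda r: -r[1]["ndcg"]):
    print(config, metrics)
```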
10:30 | Coffee | |
11:00 | Investigating Ad-Hoc Retrieval Method Selection with Features Inspired by IR Axioms | Siddhant Arora and Andrew Yates |
We consider the algorithm selection problem in the context of ad-hoc information retrieval. Given a query and a pair of retrieval methods, we propose a meta-learner that predicts how to combine the methods’ relevance scores into an overall relevance score. These predictions are based on features inspired by IR axioms that quantify properties of the query and its top-ranked documents. We conduct an evaluation on TREC benchmark data and find that the meta-learner often significantly improves over the individual methods in terms of both nDCG@20 and P@30. Finally, we conduct a feature-weight analysis to investigate which features the meta-learner uses to make its decisions.
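A minimal sketch of the idea, a meta-learner that decides per query how to mix two methods' relevance scores, is shown below. The query features and labels are made up for illustration; the actual axiom-inspired features are described in the paper.

```python
# Sketch of per-query score combination: a meta-learner predicts, from
# query features, how to mix the relevance scores of two retrieval
# methods. Feature names and training data are assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Illustrative query features, e.g. (query_length, avg_idf,
# top_docs_overlap); label 1 means method A should dominate the mixture.
X = np.array([[3, 4.2, 0.8], [7, 1.1, 0.2], [2, 5.0, 0.9], [9, 0.7, 0.1]])
y = np.array([1, 0, 1, 0])

meta = LogisticRegression().fit(X, y)

def combined_score(query_feats, score_a, score_b):
    # Use the predicted probability as an interpolation weight.
    w = meta.predict_proba([query_feats])[0, 1]
    return w * score_a + (1 - w) * score_b

print(combined_score([4, 3.5, 0.7], score_a=12.3, score_b=8.9))
```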
11:30 | Augmenting the DonorsChoose.org Corpus for Meta-Learning | Gordian Edenhofer, Andrew Collins, Akiko Aizawa, and Joeran Beel |
The DonorsChoose.org dataset of past donations provides a large and feature-rich corpus of users and items. The dataset matches donors to projects in which they might be interested and hence is intrinsically about recommendations. Due to the availability of detailed item, user, and transaction features, this corpus is a suitable candidate for testing meta-learning approaches. This study aims to provide an augmented corpus on which further recommender-systems studies can test and evaluate meta-learning approaches. In the augmentation, metadata from collaborative and content-based filtering techniques is added to the corpus. It is further extended with aggregated statistics of users and transactions, and an exemplary meta-learning experiment. The performance in the learning subsystem is measured via the recall of recommended items in a Top-N test set. The augmented dataset and the source code are released into the public domain at GitHub:BeelGroup/Augmented-DonorsChoose.org-Dataset.
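For reference, the evaluation measure mentioned above, the recall of recommended items in a Top-N test set, can be computed as in the sketch below; the item IDs are invented for illustration.

```python
# Recall@N: fraction of held-out relevant items that appear among the
# top-n recommendations. IDs below are made up.
def recall_at_n(recommended, relevant, n):
    top_n = set(recommended[:n])
    return len(top_n & set(relevant)) / len(relevant)

recommended = ["p7", "p2", "p9", "p1", "p5"]    # ranked recommendations
relevant = {"p2", "p5", "p8"}                   # projects the donor actually funded
print(recall_at_n(recommended, relevant, n=5))  # 2/3 ≈ 0.67
```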
12:00 | RARD II: The 94 Million Related-Article Recommendation Dataset | Joeran Beel, Barry Smyth and Andrew Collins |
The main contribution of this paper is to introduce and describe a new recommender-systems dataset (RARD II). It is based on data from a recommender system in the digital library and reference management software domain. As such, it complements datasets from other domains such as books, movies, and music. The RARD II dataset encompasses 94m recommendations, delivered in the two years from September 2016 to September 2018, and covers an item space of 24m unique items. RARD II provides a range of rich recommendation data beyond conventional ratings. For example, in addition to the usual rating matrices, RARD II includes the original recommendation logs, which provide a unique insight into many aspects of the algorithms that generated the recommendations. The recommendation logs enable researchers to conduct various analyses of a real-world recommender system, including the evaluation of meta-learning approaches for predicting algorithm performance. In this paper, we summarise the key features of this dataset release, describe how it was generated, and discuss some of its unique features. Compared to its predecessor RARD, RARD II contains 64% more recommendations, 187% more features (algorithms, parameters, and statistics), 50% more clicks, 140% more documents, and one additional service partner (JabRef).
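As a hypothetical example of what such recommendation logs enable, the snippet below computes a per-algorithm click-through rate; the column names and algorithm labels are assumptions, not the actual RARD II schema.

```python
# Hypothetical illustration of analysing recommendation logs; the schema
# below is assumed, not taken from RARD II.
import pandas as pd

log = pd.DataFrame({
    "algorithm": ["cbf", "cbf", "stereotype", "stereotype"],
    "n_candidates": [120, 30, 120, 30],
    "clicked": [1, 0, 0, 1],
})

# Click-through rate per algorithm: the kind of per-algorithm performance
# signal a meta-learner could be trained to predict from request features.
print(log.groupby("algorithm")["clicked"].mean())
```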
12:30 | Lunch | |
13:30 | Hands-on Session with ASlib | Lars Kotthoff |
ASlib is a standard format for representing algorithm selection scenarios and a benchmark library with example problems from many different application domains. I will give an overview of what it is, the example analyses available on its website, and the 2015 and 2017 algorithm selection competitions that were based on it. ASlib is available at http://www.aslib.net/.
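Independently of the hands-on session, the modelling approach that ASlib scenarios support (instance features plus per-algorithm performance data) can be sketched as follows; the features, runtimes, and solver names are made up, not drawn from a real scenario.

```python
# Sketch of the per-algorithm performance-model approach over ASlib-style
# data: one runtime model per algorithm, then pick the predicted fastest.
# All numbers and names below are invented for illustration.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

features = np.array([[10, 0.3], [200, 0.9], [50, 0.5], [500, 0.1]])
runtimes = {  # seconds per algorithm on each instance
    "solver_a": np.array([1.0, 90.0, 5.0, 300.0]),
    "solver_b": np.array([8.0, 12.0, 9.0, 20.0]),
}

# Train one runtime-prediction model per algorithm.
models = {name: RandomForestRegressor(random_state=0).fit(features, y)
          for name, y in runtimes.items()}

def select(instance_features):
    # Pick the algorithm with the lowest predicted runtime.
    preds = {name: m.predict([instance_features])[0]
             for name, m in models.items()}
    return min(preds, key=preds.get)

print(select([300, 0.2]))
```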
14:00 | Hands-On Automated Machine Learning Tools: Auto-Sklearn and Auto-PyTorch (Slides) | Marius Lindauer |
To achieve state-of-the-art performance in machine learning (ML), it is very important to choose the right algorithm and its hyperparameters for a given dataset. Since finding the correct settings requires a lot of time and expert knowledge, we developed AutoML tools that can be used out of the box with minimal machine learning expertise. In this session, I will present two state-of-the-art tools in this field: (i) auto-sklearn (www.automl.org/auto-sklearn/) for classical machine learning and (ii) Auto-PyTorch (www.automl.org/autopytorch/) for deep learning.
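Following auto-sklearn's documented usage, a minimal session looks like the snippet below; it assumes the autosklearn package is installed, and the time budget is an arbitrary example value.

```python
# Minimal auto-sklearn usage, following the project's documented example.
import autosklearn.classification
import sklearn.datasets
import sklearn.metrics
import sklearn.model_selection

X, y = sklearn.datasets.load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = \
    sklearn.model_selection.train_test_split(X, y, random_state=1)

# Search over algorithms and hyperparameters within a fixed time budget.
automl = autosklearn.classification.AutoSklearnClassifier(
    time_left_for_this_task=120)  # seconds; an arbitrary example budget
automl.fit(X_train, y_train)

y_hat = automl.predict(X_test)
print("Accuracy:", sklearn.metrics.accuracy_score(y_test, y_hat))
```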
14:30 | Poster Session | |
A 1-hour poster session in which all previous speakers present their work as a poster. The poster session gives attendees the opportunity to discuss the presenters’ work in more depth. The poster session will start at 14:30 and continue through the coffee break.
15:00 | Coffee & Poster Session Cont’d | |
15:30 | Open Discussion | |
16:30 | Closing Remarks | Joeran Beel (Trinity College Dublin) and Lars Kotthoff (University of Wyoming) |
16:45 | End |