This tutorial walks you through the concepts and usage of the important classes in AutoGL. In it, you will run a quick auto graph learning experiment on the Cora dataset.
Based on the concept of autoML, auto graph learning aims to automatically solve tasks whose data is represented by graphs. Unlike conventional learning frameworks, auto graph learning, like autoML, does not need a human inside the experiment loop. You only need to provide the datasets and tasks to the AutoGL solver, and the framework will automatically find suitable methods and hyperparameters for you.
The diagram below describes the workflow of the AutoGL framework.
To reach the aim of autoML, the auto graph learning framework is organized as follows. A dataset object maintains the graph datasets supplied by users. A solver object is built to specify the target task. Inside the solver, four submodules help complete the auto graph task: auto feature engineer, auto model, hyperparameter optimization, and auto ensemble. Together they automatically preprocess and enhance your data, choose and optimize deep models, and ensemble them in the best way for you.
Suppose you want to conduct auto graph learning on the Cora dataset. First, you can get the Cora dataset with build_dataset_from_name from autogl.datasets:
from autogl.datasets import build_dataset_from_name

cora_dataset = build_dataset_from_name('cora')
After obtaining the dataset, you can build a node classification solver to handle the auto training process:
import torch
from autogl.solver import AutoNodeClassifier

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

solver = AutoNodeClassifier(
    feature_module='deepgl',
    graph_models=['gcn', 'gat'],
    hpo_module='anneal',
    ensemble_module='voting',
    device=device
)
This builds a node classification solver that uses deepgl as its feature engineer and the anneal hyperparameter optimizer to optimize the two given models, ['gcn', 'gat']. The resulting models are then ensembled using the voting ensembler. Please refer to the corresponding tutorials or documentation for the definitions and usage of the available submodules.
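To give a feel for what ensembling by voting means, here is a minimal sketch of soft voting over per-model class probabilities. It is not AutoGL's implementation: the function names soft_vote and argmax_labels, the equal weights, and the toy probabilities are all assumptions made for illustration, and the real voting ensembler may weigh models differently.

```python
from typing import Dict, List

Probabilities = List[List[float]]  # one row per node, one column per class


def soft_vote(per_model: Dict[str, Probabilities],
              weights: Dict[str, float]) -> Probabilities:
    """Weighted average of the class probabilities predicted by each model."""
    total_weight = sum(weights[name] for name in per_model)
    first = next(iter(per_model.values()))
    combined = [[0.0] * len(first[0]) for _ in first]
    for name, rows in per_model.items():
        share = weights[name] / total_weight  # normalized weight of this model
        for i, row in enumerate(rows):
            for c, p in enumerate(row):
                combined[i][c] += share * p
    return combined


def argmax_labels(probabilities: Probabilities) -> List[int]:
    """Pick the most probable class for every node."""
    return [max(range(len(row)), key=row.__getitem__) for row in probabilities]


# Made-up predictions for three nodes and two classes (illustrative only).
toy_probabilities = {
    "gcn": [[0.9, 0.1], [0.4, 0.6], [0.2, 0.8]],
    "gat": [[0.6, 0.4], [0.7, 0.3], [0.1, 0.9]],
}
toy_weights = {"gcn": 1.0, "gat": 1.0}  # equal weights are an assumption

ensemble = soft_vote(toy_probabilities, toy_weights)
labels = argmax_labels(ensemble)
print(labels)  # [0, 0, 1]
```

Note that the second node is labeled 1 by the toy gcn alone but 0 by the ensemble, because the toy gat is more confident in class 0. This is the kind of disagreement an ensemble is meant to resolve.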
Next, fit the solver and check the leaderboard:
solver.fit(cora_dataset, time_limit=3600)
solver.get_leaderboard().show()
time_limit is set to 3600 (seconds) so that the whole auto graph process will not exceed 1 hour. solver.get_leaderboard().show() presents the models maintained by the solver, along with their performance on the validation dataset.
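The interplay between a time budget and a leaderboard can be sketched in a few lines of plain Python. The sketch below is only a conceptual illustration under stated assumptions: search_with_budget and toy_validation_score are hypothetical names, the scoring function is invented, and none of this reflects how the AutoGL solver schedules its work internally.

```python
import time
from typing import Callable, Dict, Iterable, List, Tuple

Config = Dict[str, object]


def search_with_budget(candidates: Iterable[Config],
                       evaluate: Callable[[Config], float],
                       time_limit: float) -> List[Tuple[float, Config]]:
    """Evaluate candidates until the budget (in seconds) runs out.

    Returns a leaderboard sorted by validation score, best first.
    """
    deadline = time.monotonic() + time_limit
    leaderboard: List[Tuple[float, Config]] = []
    for config in candidates:
        if time.monotonic() >= deadline:
            break  # budget exhausted: stop trying new configurations
        leaderboard.append((evaluate(config), config))
    leaderboard.sort(key=lambda entry: entry[0], reverse=True)
    return leaderboard


def toy_validation_score(config: Config) -> float:
    """Invented stand-in for 'train a model and score it on validation data'."""
    return 1.0 / (1.0 + abs(float(config["lr"]) - 0.01))


toy_candidates = [
    {"model": "gcn", "lr": 0.1},
    {"model": "gcn", "lr": 0.01},
    {"model": "gat", "lr": 0.05},
]

board = search_with_budget(toy_candidates, toy_validation_score, time_limit=60)
for score, config in board:
    print(f"{config['model']:>4}  lr={config['lr']:<5}  val={score:.3f}")
```

A budget of this kind bounds when new work may start, not how long a single evaluation takes, so a real system needs additional care if individual trainings are slow.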
Then, you can make predictions and evaluate the results using the evaluation functions provided:
from autogl.module.train import Acc

predicted = solver.predict_proba()
test_labels = cora_dataset.data.y[cora_dataset.data.test_mask].cpu().numpy()
print('Test accuracy: ', Acc.evaluate(predicted, test_labels))
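To make the evaluation step concrete, the following sketch computes accuracy from predicted class probabilities and a boolean test mask using only the standard library. It assumes the usual definition of accuracy (the most probable class is compared with the true label); the helper names and toy data are hypothetical, and the internals of Acc.evaluate are not shown in this tutorial.

```python
from typing import List, Sequence


def select_by_mask(values: Sequence[int], mask: Sequence[bool]) -> List[int]:
    """Keep only the entries whose mask value is True (like y[test_mask])."""
    return [value for value, keep in zip(values, mask) if keep]


def accuracy_from_probabilities(probabilities: Sequence[Sequence[float]],
                                labels: Sequence[int]) -> float:
    """Fraction of rows whose most probable class equals the true label."""
    if len(probabilities) != len(labels):
        raise ValueError("need exactly one probability row per label")
    predicted = [max(range(len(row)), key=row.__getitem__) for row in probabilities]
    correct = sum(int(p == y) for p, y in zip(predicted, labels))
    return correct / len(labels)


# Toy graph with five nodes; only the last three belong to the test split.
all_labels = [0, 1, 1, 0, 1]
test_mask = [False, False, True, True, True]

# Made-up probabilities for the three test nodes (illustrative only).
test_probabilities = [[0.2, 0.8], [0.3, 0.7], [0.1, 0.9]]

test_labels = select_by_mask(all_labels, test_mask)
accuracy = accuracy_from_probabilities(test_probabilities, test_labels)
print(f"Test accuracy: {accuracy:.3f}")
```

The key point is that the labels must be restricted to the same nodes the predictions cover, which is what indexing with test_mask does in the snippet above.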
You don't need to pass cora_dataset again when predicting, since the solver remembers the dataset and reuses it when no dataset is passed at prediction time. However, you can also pass a new dataset when predicting, in which case the new dataset is used instead of the remembered one. Please refer to the AutoGL Solver tutorial or the solver documentation for more details.
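The "remember the dataset at fit time, reuse it at predict time" behavior can be illustrated with a toy class. RememberingSolver below is a hypothetical stand-in, not an AutoGL class, and its parity "model" exists only so the example runs; it is a sketch of the calling pattern under these assumptions.

```python
from typing import List, Optional


class RememberingSolver:
    """Hypothetical toy solver that mimics the dataset-reuse pattern."""

    def __init__(self) -> None:
        self._dataset: Optional[List[int]] = None

    def fit(self, dataset: List[int]) -> "RememberingSolver":
        # Remember the dataset so later calls can omit it.
        self._dataset = dataset
        return self

    def predict(self, dataset: Optional[List[int]] = None) -> List[int]:
        # An explicitly passed dataset takes priority over the remembered one.
        target = dataset if dataset is not None else self._dataset
        if target is None:
            raise ValueError("call fit() first or pass a dataset")
        # Stand-in "model": label every value by its parity.
        return [value % 2 for value in target]


toy_solver = RememberingSolver().fit([1, 2, 3])
remembered = toy_solver.predict()        # uses the dataset given to fit()
overridden = toy_solver.predict([4, 6])  # uses the new dataset instead
print(remembered, overridden)  # [1, 0, 1] [0, 0]
```

In this sketch, passing a new dataset to predict does not replace the remembered one; whether the AutoGL solver behaves the same way in that respect is not stated here.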