Hyper Parameter Optimization

We support black-box hyperparameter optimization over various search spaces.

Search Space

The following types of search space are supported; use a Python dict to define your search space. For a numerical list search space, you can either assign a fixed length to the list, in which case you need not provide cutPara and cutFunc, or let HPO cut the list to a length that depends on other parameters. In the latter case, provide those parameters' names in cutPara and the function that computes the cut length in cutFunc. The templates below show each type, followed by a concrete example.

# numerical search space:
{
    "parameterName": "xxx",
    "type": "DOUBLE" / "INTEGER",
    "minValue": xx,
    "maxValue": xx,
    "scalingType": "LINEAR" / "LOG"
}

# numerical list search space:
{
    "parameterName": "xxx",
    "type": "NUMERICAL_LIST",
    "numericalType": "DOUBLE" / "INTEGER",
    "length": 3,
    "cutPara": ("para_a", "para_b"),
    "cutFunc": lambda x: x[0] - 1,
    "minValue": [xx,xx,xx],
    "maxValue": [xx,xx,xx],
    "scalingType": "LINEAR" / "LOG"
}

# categorical search space:
{
    "parameterName": xxx,
    "type": "CATEGORICAL"
    "feasiblePoints": [a,b,c]
}

# fixed parameter as search space:
{
    "parameterName": xxx,
    "type": "FIXED",
    "value": xxx
}
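
For instance, a complete search space is just a list of such dicts. The sketch below uses made-up parameter names (num_layers, hidden, lr, act) purely for illustration; the hidden list is cut to num_layers - 1 entries via cutPara and cutFunc:

# hypothetical example: a search space over layer count, per-layer hidden sizes,
# learning rate and activation function
search_space = [
    {
        "parameterName": "num_layers",
        "type": "INTEGER",
        "minValue": 2,
        "maxValue": 4,
        "scalingType": "LINEAR"
    },
    {
        "parameterName": "hidden",
        "type": "NUMERICAL_LIST",
        "numericalType": "INTEGER",
        "length": 3,
        # cut the list to num_layers - 1 entries
        "cutPara": ("num_layers",),
        "cutFunc": lambda x: x[0] - 1,
        "minValue": [8, 8, 8],
        "maxValue": [128, 128, 128],
        "scalingType": "LOG"
    },
    {
        "parameterName": "lr",
        "type": "DOUBLE",
        "minValue": 1e-4,
        "maxValue": 1e-1,
        "scalingType": "LOG"
    },
    {
        "parameterName": "act",
        "type": "CATEGORICAL",
        "feasiblePoints": ["relu", "elu", "tanh"]
    }
]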

Which search space types each HPO algorithm supports is listed as follows:

Algorithm       numerical   numerical list   categorical   fixed
Grid            ✗           ✗                ✓             ✓
Random          ✓           ✓                ✓             ✓
Anneal          ✓           ✓                ✓             ✓
Bayes           ✓           ✓                ✓             ✓
TPE             ✓           ✓                ✓             ✓
CMAES           ✓           ✓                ✓             ✓
MOCMAES         ✓           ✓                ✓             ✓
Quasi random    ✓           ✓                ✓             ✓
AutoNE          ✓           ✓                ✓             ✓

Here, TPE is from [1], CMAES is from [2], MOCMAES is from [3], Quasi random is from [4], and AutoNE is from [5].
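
A built-in HPO algorithm is usually selected by name when building a solver. The following is a minimal sketch modeled on the AutoNodeClassifier quick-start usage; the dataset variable and the chosen model names are assumptions to adapt to your own setup:

from autogl.solver import AutoNodeClassifier

# pick one of the algorithms listed above by name, e.g. "anneal" or "random"
solver = AutoNodeClassifier(
    graph_models=["gcn", "gat"],
    hpo_module="anneal",
)
# `dataset` is assumed to be a node classification dataset loaded beforehand
solver.fit(dataset, time_limit=3600)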

[1] Bergstra, James S., et al. “Algorithms for hyper-parameter optimization.” Advances in Neural Information Processing Systems. 2011.
[2] Arnold, Dirk V., and Nikolaus Hansen. “Active covariance matrix adaptation for the (1+1)-CMA-ES.” Proceedings of the 12th Annual Conference on Genetic and Evolutionary Computation. 2010.
[3] Voß, Thomas, Nikolaus Hansen, and Christian Igel. “Improved step size adaptation for the MO-CMA-ES.” Proceedings of the 12th Annual Conference on Genetic and Evolutionary Computation. 2010.
[4] Bratley, Paul, Bennett L. Fox, and Harald Niederreiter. “Programs to generate Niederreiter’s low-discrepancy sequences.” ACM Transactions on Mathematical Software (TOMS) 20.4 (1994): 494-495.
[5] Tu, Ke, et al. “AutoNE: Hyperparameter optimization for massive network embedding.” Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 2019.

Add Your HPOptimizer

If you want to add your own HPOptimizer, the only thing you need to do is implement the optimize function in your HPOptimizer:

# For example, create a random HPO by yourself
import random
from autogl.module.hpo.base import BaseHPOptimizer
class RandomOptimizer(BaseHPOptimizer):
    # Get essential parameters at initialization
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.max_evals = kwargs.get("max_evals", 2)

    # The most important thing you should do is complete the optimize function
    def optimize(self, trainer, dataset, time_limit=None, memory_limit=None):
        # 1. Get the search space from trainer.
        space = trainer.hyper_parameter_space + trainer.model.hyper_parameter_space
        # Optional: use self._encode_para (in BaseOptimizer) to preprocess the space.
        # If you use _encode_para, each NUMERICAL_LIST is spread into separate DOUBLE or
        # INTEGER parameters, the LOG scaling type is converted to LINEAR, and the feasible
        # points of CATEGORICAL parameters are mapped to discrete numbers.
        # You should then use _decode_para to transform the parameter types back.
        current_space = self._encode_para(space)

        # 2. Define your function to get the performance.
        def fn(dset, para):
            current_trainer = trainer.duplicate_from_hyper_parameter(para)
            current_trainer.train(dset)
            loss, self.is_higher_better = current_trainer.get_valid_score(dset)
            # For convenience, negate scores that are higher-is-better so that we always minimize.
            if self.is_higher_better:
                loss = -loss
            return current_trainer, loss

        # 3. Define how to get HP suggestions; it should return a parameter dict. You can use the history of trials to produce new suggestions.
        def get_random(history_trials):
            hps = {}
            for para in current_space:
                # Because we used _encode_para above, we only need to handle DOUBLE, INTEGER and DISCRETE
                if para["type"] == "DOUBLE" or para["type"] == "INTEGER":
                    hp = random.random() * (para["maxValue"] - para["minValue"]) + para["minValue"]
                    if para["type"] == "INTEGER":
                        hp = round(hp)
                    hps[para["parameterName"]] = hp
                elif para["type"] == "DISCRETE":
                    feasible_points = para["feasiblePoints"].split(",")
                    hps[para["parameterName"]] = random.choice(feasible_points)
            return hps

        # 4. Run your algorithm. In each round, get a set of parameters according to the history and evaluate it.
        best_trainer, best_para, best_perf = None, None, None
        self.trials = []
        for i in range(self.max_evals):
            # In this example we do not need the history of trials, so we pass None to history_trials.
            new_hp = get_random(None)
            # Optional: if you used _encode_para, use _decode_para as well.
            # para_for_trainer undoes all transformations done in _encode_para and rounds
            # double parameters to integers where needed; para_for_hpo only rounds double
            # parameters to integers.
            para_for_trainer, para_for_hpo = self._decode_para(new_hp)
            current_trainer, perf = fn(dataset, para_for_trainer)
            self.trials.append((para_for_hpo, perf))
            if best_perf is None or perf < best_perf:
                best_perf = perf
                best_trainer = current_trainer
                best_para = para_for_trainer

        # 5. Return the best trainer and parameter.
        return best_trainer, best_para
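
Once defined, the custom optimizer can be used like the built-in ones. Assuming a trainer and dataset have already been constructed, a minimal usage sketch could look like this:

# hypothetical usage, assuming `trainer` and `dataset` already exist
hpo = RandomOptimizer(max_evals=10)
best_trainer, best_para = hpo.optimize(trainer, dataset, time_limit=3600)
print("best hyperparameters:", best_para)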