AutoGL Feature Engineering

AutoGL provides a set of node-level and subgraph-level feature engineers that you can compose into a feature engineering pipeline. An automatic feature engineering algorithm is also provided.

Quick Start

# 1. Choose a dataset.
from autogl.datasets import build_dataset_from_name
data = build_dataset_from_name('cora')

# 2. Compose a feature engineering pipeline.
from autogl.module.feature import BaseFeatureAtom, AutoFeatureEngineer
from autogl.module.feature.generators import GeEigen
from autogl.module.feature.selectors import SeGBDT
from autogl.module.feature.subgraph import SgNetLSD
# You can compose feature engineering atoms through BaseFeatureAtom.compose
fe = BaseFeatureAtom.compose([
    GeEigen(size=32),
    SeGBDT(fixlen=100),
    SgNetLSD()
])
# or simply through the '&' operator.
fe = fe & AutoFeatureEngineer(fixlen=200, max_epoch=3)

# 3. Fit and transform the data.
fe.fit(data)
data1 = fe.transform(data, inplace=False)
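
The composition pattern used above can be illustrated with a small, self-contained sketch. The classes `Atom` and `Pipeline` below are hypothetical stand-ins written for this illustration; they are not the AutoGL classes, and AutoGL's actual implementation of `compose` and `&` may differ. The sketch only assumes that both forms build a pipeline that applies its atoms in order.

```python
class Atom:
    """Hypothetical stand-in for a feature engineering atom (not the AutoGL class)."""

    def __init__(self, name):
        self.name = name

    def steps(self):
        # A single atom is a pipeline of length one.
        return [self]

    def fit(self, data):
        return self  # nothing to learn in this toy atom

    def transform(self, data):
        # Record that this atom ran; a real atom would modify features.
        return data + [self.name]

    def __and__(self, other):
        # `a & b` builds a pipeline that runs a's steps, then b's steps.
        return Pipeline(self.steps() + other.steps())

    @staticmethod
    def compose(atoms):
        steps = []
        for atom in atoms:
            steps.extend(atom.steps())
        return Pipeline(steps)


class Pipeline(Atom):
    """Hypothetical pipeline that fits and applies atoms in order."""

    def __init__(self, atoms):
        super().__init__("pipeline")
        self._atoms = list(atoms)

    def steps(self):
        return list(self._atoms)

    def fit(self, data):
        # Each atom is fitted on the output of the atoms before it.
        for atom in self._atoms:
            atom.fit(data)
            data = atom.transform(data)
        return self

    def transform(self, data):
        for atom in self._atoms:
            data = atom.transform(data)
        return data


fe = Atom.compose([Atom("eigen"), Atom("gbdt")]) & Atom("netlsd")
result = fe.fit([]).transform([])
print(result)  # ['eigen', 'gbdt', 'netlsd']
```

Flattening nested pipelines into a single list of steps keeps the order of application explicit, regardless of whether atoms were combined with `compose` or with `&`.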

List of FE atom names

Three kinds of feature engineering atoms are currently supported: generators, selectors, and subgraph. You can import atoms from the corresponding module, as shown in the Quick Start. Alternatively, you can refer to atoms by name in configurations or as arguments of the AutoGL solver.

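  1. generators

=====================  ===============================================
Atom                   Description
=====================  ===============================================
graphlet               Concatenate local graphlet numbers as features.
eigen                  Concatenate Eigen features.
pagerank               Concatenate PageRank scores.
PYGLocalDegreeProfile  Concatenate Local Degree Profile features.
PYGNormalizeFeatures   Normalize all node features.
PYGOneHotDegree        Concatenate degree one-hot encoding.
onehot                 Concatenate node id one-hot encoding.
=====================  ===============================================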
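
To make "concatenate degree one-hot encoding" concrete, here is a minimal pure-Python sketch. The helper names `one_hot_degree` and `concatenate` are hypothetical and are not part of AutoGL or PyTorch Geometric. The sketch assumes an undirected graph with each edge listed once, and it clips degrees above `max_degree` into the last slot, which is a choice made for this illustration only.

```python
def one_hot_degree(num_nodes, edges, max_degree):
    """Return one one-hot row per node encoding its degree (hypothetical helper)."""
    degree = [0] * num_nodes
    for u, v in edges:  # undirected edges, each listed once
        degree[u] += 1
        degree[v] += 1
    rows = []
    for d in degree:
        row = [0] * (max_degree + 1)
        row[min(d, max_degree)] = 1  # clipping is an assumption of this sketch
        rows.append(row)
    return rows


def concatenate(features, extra):
    """Append the new feature columns to the existing ones, row by row."""
    return [f + e for f, e in zip(features, extra)]


edges = [(0, 1), (1, 2)]     # path graph 0 - 1 - 2
x = [[0.5], [0.1], [0.9]]    # one original feature per node
x_new = concatenate(x, one_hot_degree(3, edges, max_degree=2))
print(x_new)
```

Each node keeps its original feature and gains `max_degree + 1` new columns, which is the "concatenate" behavior that the generator descriptions refer to.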

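  2. selectors

================  =====================================================================================
Atom              Description
================  =====================================================================================
SeFilterConstant  Delete all constant and one-hot encoding node features.
gbdt              Select the top-k important node features ranked by a Gradient Boosting Decision Tree.
================  =====================================================================================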

  3. subgraph

netlsd is a subgraph feature generation method. Please refer to the corresponding documentation.

A set of subgraph feature extractors implemented in NetworkX is also wrapped; please refer to NetworkX for details. (NxLargeCliqueSize, NxAverageClusteringApproximate, NxDegreeAssortativityCoefficient, NxDegreePearsonCorrelationCoefficient, NxHasBridge, NxGraphCliqueNumber, NxGraphNumberOfCliques, NxTransitivity, NxAverageClustering, NxIsConnected, NxNumberConnectedComponents, NxIsDistanceRegular, NxLocalEfficiency, NxGlobalEfficiency, NxIsEulerian)

The taxonomy of atom types is based on how features are transformed. Generators either concatenate newly generated features to the original ones or overwrite the original ones. Selectors do not generate new features; they select useful features and keep the learned selection method in the atom itself. These two types operate at the node or edge level (modifying the features of each node or edge), while subgraph atoms operate at the graph level (modifying the features of each graph). When developing a new atom, you may want to inherit from one of these types. You can also inherit directly from BaseFeatureAtom.
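
The idea that a selector keeps its learned selection in the atom itself can be sketched as follows. `ConstantFilter` is a hypothetical class written for this illustration, loosely modeled on the description of SeFilterConstant; it is not the AutoGL implementation and does not use AutoGL's base classes.

```python
class ConstantFilter:
    """Hypothetical selector: learns which columns vary, then keeps only those."""

    def __init__(self):
        self.keep = None  # learned column indices, stored in the atom

    def fit(self, x):
        columns = list(zip(*x))
        # A column is kept if it takes more than one distinct value.
        self.keep = [i for i, col in enumerate(columns) if len(set(col)) > 1]
        return self

    def transform(self, x):
        # Apply the selection learned in fit; nothing is re-learned here.
        return [[row[i] for i in self.keep] for row in x]


train = [[1.0, 0.0, 3.0],
         [1.0, 2.0, 3.5]]
selector = ConstantFilter().fit(train)

other = [[9.0, 8.0, 7.0]]
selected = selector.transform(other)
print(selector.keep)  # [1, 2]
print(selected)       # [[8.0, 7.0]]
```

Because the selection is stored at fit time, the same columns are dropped from any data transformed later, even if those columns are not constant there.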

Create Your Own FE

You can create your own feature engineering object by inheriting from one of the feature engineering atom types (generators, selectors, or subgraph) and overriding the methods _fit and _transform.

# For example: create a node one-hot feature.
from autogl.module.feature.generators.base import BaseGenerator
import numpy as np

class GeOnehot(BaseGenerator):
    def __init__(self):
        super(GeOnehot, self).__init__(data_t='np', multigraph=True, subgraph=False)
        # The intermediate data type is 'numpy',
        # the atom can be used for multigraph,
        # but it is not suitable for subgraph feature extraction.

    def _fit(self):
        pass  # nothing to train or memorize

    def _transform(self, data):
        # One identity row per node: a one-hot encoding of the node id.
        fe = np.eye(data.x.shape[0])
        data.x = np.concatenate([data.x, fe], axis=1)
        return data
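
To see what the `_transform` logic above does to the feature matrix, the following standalone sketch applies the same two NumPy calls to a toy object. The function `onehot_transform` and the `SimpleNamespace` container are stand-ins introduced for this illustration; they only assume that the data object exposes its node features as a NumPy array in `x`, and they do not depend on AutoGL.

```python
from types import SimpleNamespace

import numpy as np


def onehot_transform(data):
    """Same logic as the _transform method above, outside of any AutoGL class."""
    fe = np.eye(data.x.shape[0])                   # one identity row per node
    data.x = np.concatenate([data.x, fe], axis=1)  # append as new columns
    return data


# Toy data: 3 nodes with 2 features each (hypothetical values).
data = SimpleNamespace(x=np.array([[0.1, 0.2],
                                   [0.3, 0.4],
                                   [0.5, 0.6]]))
out = onehot_transform(data)
print(out.x.shape)  # (3, 5)
```

The number of added columns equals the number of nodes, so this encoding grows quadratically in memory with graph size and is mainly practical for small graphs.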