Technique of Augmenting Molecular Graph Data by Perturbating Hidden Features

Abstract

Quantitative structure–property relationship models are useful for efficiently searching for molecules with desired properties in drug discovery and materials development. In recent years, many such models based on graph neural networks have been reported to show good prediction performance. Training graph neural networks generally requires many samples, but a training method suited to small datasets can still extract features that enable successful prediction. Herein, we design a method of augmenting graph data. In this method, random perturbations are added with a certain probability to some vertex features during message passing. We verify the proposed method’s effectiveness in regression and classification tasks. The results confirm that the proposed method is effective when the perturbation is added immediately before the readout of the graph neural network, and that the effect of the data augmentation is most evident for small datasets of approximately 1000 samples.
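
The augmentation described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not specify the noise distribution, the selection rule, or the network architecture, so the Gaussian noise, the per-vertex selection with probability `p`, the toy message-passing layer, the sum readout, and all function and parameter names (`perturb_vertex_features`, `message_passing`, `forward`, `p`, `sigma`) are assumptions made for illustration only.

```python
import numpy as np


def perturb_vertex_features(h, p=0.1, sigma=0.1, rng=None):
    """Add noise to a random subset of vertices (assumed scheme).

    h     : (num_vertices, dim) hidden vertex features
    p     : probability that a given vertex is perturbed (assumption)
    sigma : standard deviation of the Gaussian noise (assumption)
    """
    rng = np.random.default_rng() if rng is None else rng
    mask = rng.random(h.shape[0]) < p               # which vertices to perturb
    noise = rng.normal(0.0, sigma, size=h.shape)    # candidate noise for all vertices
    return h + mask[:, None] * noise                # noise applied only where mask is True


def message_passing(h, adj, weight):
    """One toy layer: sum over neighbours and self, linear map, ReLU."""
    agg = (adj + np.eye(adj.shape[0])) @ h
    return np.maximum(agg @ weight, 0.0)


def forward(h, adj, weights, train=False, p=0.1, sigma=0.1, rng=None):
    """Run all layers, perturb just before the readout when training."""
    for w in weights:
        h = message_passing(h, adj, w)
    if train:
        # Perturbation placed immediately before the readout,
        # the position the abstract reports as effective.
        h = perturb_vertex_features(h, p=p, sigma=sigma, rng=rng)
    return h.sum(axis=0)                            # sum readout -> graph-level vector


# Toy usage: a 3-vertex path graph with 4-dimensional features.
rng = np.random.default_rng(0)
adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
h0 = rng.normal(size=(3, 4))
weights = [rng.normal(size=(4, 4)) for _ in range(2)]

clean = forward(h0, adj, weights, train=False)
augmented = forward(h0, adj, weights, train=True, p=0.5, sigma=0.1, rng=rng)
```

Under these assumptions, each training pass can yield a slightly different graph-level vector for the same molecule, while inference (`train=False`) stays deterministic.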

Publication
Molecular Informatics
