At the 9th Current Trends in Theoretical Chemistry conference we presented a lecture entitled "Machine learning models for fast estimation of electronic properties of molecules".
 
Abstract
          AI-based methods are increasingly popular in research because of their capabilities and speed. In chemistry, QSAR-related methods existed well before the terms AI and Machine Learning (ML) were introduced.
          The crucial issue for ML in chemistry is encoding chemical structure information in a numerical format suitable as input for modeling algorithms. A number of so-called molecular descriptors have been designed for this purpose. Descriptors based on molecular fingerprints are currently a popular choice for modeling physical and biological properties of molecules, owing to their speed of computation and broad range of applications.
          In this case study the QM9 data set [1] was used to evaluate the performance of fingerprint-based Random Forest Regressor models in estimating various B3LYP-computed electronic properties of compounds, at a speed on the order of more than 100 compounds per second on one CPU.
          The applicability of fingerprint-based descriptors to qualitative models, the choice of modeling algorithm, and other issues related to model quality will be discussed.

[1] Nandi, S., Vegge, T. & Bhowmik, A. MultiXC-QM9: Large dataset of molecular and reaction energies from multi-level quantum chemical methods. Sci Data 10, 783 (2023). https://doi.org/10.1038/s41597-023-02690-2.
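
As a rough illustration of the encoding step described in the abstract, the sketch below turns a structure string into a fixed-length bit vector that a modeling algorithm could take as input. It is a deliberately simplified toy and not the descriptor used in the lecture: it hashes character n-grams of a SMILES string, whereas real molecular fingerprints hash substructures of the molecular graph and are normally computed with a cheminformatics toolkit. The function names `toy_fingerprint` and `tanimoto`, the bit length, and the example SMILES strings are all assumptions made for this example.

```python
import hashlib


def toy_fingerprint(smiles, n_bits=64, max_len=3):
    """Fold character n-grams of a SMILES string into a fixed-length bit list.

    Toy stand-in for a molecular fingerprint (assumption for illustration):
    it works on the text of the SMILES string, not on the molecular graph.
    """
    bits = [0] * n_bits
    for size in range(1, max_len + 1):
        for start in range(len(smiles) - size + 1):
            fragment = smiles[start:start + size]
            # A stable hash keeps the bit positions identical between runs.
            digest = hashlib.sha1(fragment.encode("utf-8")).digest()
            index = int.from_bytes(digest[:4], "big") % n_bits
            bits[index] = 1
    return bits


def tanimoto(a, b):
    """Tanimoto similarity between two equal-length bit lists."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0


# Hypothetical example inputs: ethanol, 1-propanol and benzene as SMILES.
ethanol = toy_fingerprint("CCO")
propanol = toy_fingerprint("CCCO")
benzene = toy_fingerprint("c1ccccc1")

print(len(ethanol), sum(ethanol))
print(tanimoto(ethanol, propanol), tanimoto(ethanol, benzene))
```

Vectors of this kind, one row per compound, are the sort of numerical input a regressor such as a Random Forest can be trained on. Their fixed length and cheap computation are what make fingerprint-style descriptors attractive when throughput matters.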


The full presentation is available here.