
SGCNN, a machine learning algorithm for predicting the adsorption energies of materials

2021-10-21 05:20:12

The slab graph convolutional neural network (SGCNN) is expected to be installed in Materials Square (MatSQ) as early as the end of October. SGCNN is a machine learning algorithm created by Dr. Sang Soo Han and Dr. Donghun Kim of the Computational Science Research Center at the Korea Institute of Science and Technology (KIST). The algorithm is mainly used to predict the adsorption energy (binding energy) between a catalytic surface and an adsorbate. We interviewed Dr. Donghun Kim about why the SGCNN was developed and why the technology is being transferred to the MatSQ platform.




Q: Hello, Doctor Kim. Would you mind introducing yourself, please?

Hello, I am Donghun Kim, a senior researcher at the KIST Computational Science Research Center. Through my doctorate, my research focused on improving the performance of quantum dot solar cells using density-functional theory (DFT) simulations. At present, I am working on developing new materials by combining big data and artificial intelligence (AI) with simulation methodologies.


Q: It seems that the application of AI is very active in the field of catalytic materials.

That’s correct. As you know, models using AI were originally developed in the field of computer science. AI was first used in the social sciences, and biotechnology pioneered its use in the natural sciences. It was not until 2015 that AI was applied to materials science. Compared with other disciplines, its application began somewhat late, but catalytic materials are being studied with AI more quickly and actively than other fields of applied materials.

Catalysts have a very large space of candidate materials. It is inefficient to run experiments one by one or to evaluate every candidate with quantum mechanics–based calculations, and screening catalytic materials with existing computational methods such as DFT takes a long time. In my opinion, this is why AI and machine learning have been studied more actively in catalyst research than elsewhere in materials science.

Predicting adsorption energy is essential in catalyst research. Being able to calculate adsorption energy quickly is compelling to researchers, many of whom have recognized that AI models could rapidly screen a group of candidate catalytic materials.


Q: What AI programs have been utilized in catalyst research until now?

Efforts have been made to develop AI models for catalyst research since 2015. For example, as far as I know, Prof. Hongliang Xin’s team at Virginia Tech (VT) in the United States was the first to develop a machine learning model for predicting the properties of catalysts. After that, similar types of machine learning models were developed, but their practical use was limited because all of these models required the density of states (DOS) as input.

Note that quantitative DOS data can only be obtained by quantum mechanics–based computational methods. The ultimate purpose of using machine learning models is to avoid costly calculations such as DFT, so, ironically, we would have to bear the high cost of DFT again just to prepare the input for these models.

Over the last two to three years, different attempts have been made to overcome this limitation. I think that the graph convolutional neural network (GCNN) is the best method available for now. The GCNN can encode the complicated structures of catalysts and adsorbates very flexibly by using a graph-based encoding method, and its input does not include expensive quantum mechanics–based computational results. It has overcome the critical limitation of the previous models.
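As a rough illustration of what "graph-based encoding" means here, the following minimal sketch turns atomic positions into a graph (atoms as nodes, nearby pairs as edges) and applies one simplified convolution step. This is not the SGCNN or CGCNN implementation: the function names, the distance cutoff, the toy coordinates, and the plain averaging rule are all assumptions made for illustration, and a real GCNN would use learned weights instead of a fixed average.

```python
# Illustrative sketch only; not the SGCNN or CGCNN implementation.
from math import dist

def build_graph(positions, cutoff):
    """Connect every pair of atoms whose distance is within `cutoff` (angstroms)."""
    neighbors = {i: [] for i in range(len(positions))}
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            if dist(positions[i], positions[j]) <= cutoff:
                neighbors[i].append(j)
                neighbors[j].append(i)
    return neighbors

def convolve(features, neighbors):
    """One simplified convolution step: average each node's features
    with the mean of its neighbors' features (no learned weights)."""
    updated = []
    for i, own in enumerate(features):
        nbrs = neighbors[i]
        if not nbrs:
            updated.append(list(own))  # isolated atom: keep features unchanged
            continue
        mean = [sum(features[j][k] for j in nbrs) / len(nbrs)
                for k in range(len(own))]
        updated.append([(a + b) / 2 for a, b in zip(own, mean)])
    return updated

# Hypothetical toy system: three surface atoms in a row and one
# adsorbate atom sitting above the middle surface atom.
positions = [(0.0, 0.0, 0.0), (2.5, 0.0, 0.0), (5.0, 0.0, 0.0), (2.5, 0.0, 2.0)]
# One feature per node for brevity: the atomic number (Pt = 78, N = 7).
features = [[78.0], [78.0], [78.0], [7.0]]

graph = build_graph(positions, cutoff=3.0)
new_features = convolve(features, graph)
print(graph)
print(new_features)
```

After one step, the adsorbate node's feature already carries information about the surface atom it is bonded to, which is the basic idea behind stacking several convolution layers.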


Q: Do you mean that the GCNN model you mentioned is the SGCNN program installed in MatSQ?

Yes, that’s right. In 2018, Prof. Jeffrey C. Grossman of the Massachusetts Institute of Technology (MIT) reported a program called crystal graph convolutional neural networks (CGCNN) in Physical Review Letters (PRL). While the existing CGCNN is a validated model for predicting bulk properties, our research team modified it into a model dedicated to catalyst research, named it “SGCNN,” and reported it in Chemistry of Materials.

Basically, the SGCNN model has the advantage of using information obtainable directly from the periodic table as input. Early AI-based adsorption energy prediction models were inefficient because preparing the quantitative DOS values they needed as input took a very long time. In contrast, our model needs little time to prepare its input because it uses only information from the periodic table. As a result, the practicality of the model has improved greatly.
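To make "information from the periodic table" concrete, here is a minimal sketch of how elements might be turned into input vectors without any DFT calculation. The choice of features (group, period, Pauling electronegativity), the one-hot layout, and the names `PERIODIC_TABLE`, `atom_features`, and `encode_structure` are assumptions for illustration; the interview does not specify which properties the SGCNN actually uses.

```python
# Illustrative sketch: the feature set is an assumption, not the SGCNN's actual input.
PERIODIC_TABLE = {
    # symbol: (atomic number, group, period, Pauling electronegativity)
    "H": (1, 1, 1, 2.20),
    "N": (7, 15, 2, 3.04),
    "O": (8, 16, 2, 3.44),
    "Pt": (78, 10, 6, 2.28),
}

def one_hot(index, size):
    """Return a list of `size` zeros with a single 1 at position `index`."""
    vec = [0] * size
    vec[index] = 1
    return vec

def atom_features(symbol):
    """Encode an element as one-hot group (18 slots), one-hot period (7 slots),
    and its electronegativity: 26 numbers in total."""
    _, group, period, electronegativity = PERIODIC_TABLE[symbol]
    return one_hot(group - 1, 18) + one_hot(period - 1, 7) + [electronegativity]

def encode_structure(symbols):
    """Look up a feature vector for every atom in a structure."""
    return [atom_features(s) for s in symbols]

# Hypothetical fragment: two platinum surface atoms and one nitrogen adsorbate atom.
vectors = encode_structure(["Pt", "Pt", "N"])
print(len(vectors[0]))
```

Every value here is a table lookup, so preparing the input for a new candidate material costs essentially nothing compared with computing its DOS.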


Q: What does the SGCNN model mainly predict regarding catalysts?

The SGCNN model is a machine learning algorithm for predicting the adsorption energy (binding energy) between a catalyst and an adsorbate. Adsorption energy is one of the properties that most directly affects how a catalyst behaves. It is the most important indicator when we design catalytic materials, so by understanding adsorption energies we can directly or indirectly determine properties of a catalyst such as its activity or selectivity.

Although the quantitative values of adsorption energy are commonly obtained through quantum mechanics–based calculations such as DFT, machine learning models, including SGCNN, are expected to predict the values much faster. Among the many catalytic reactions, our model mainly applies to the nitrogen and oxygen reduction reactions.
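For readers unfamiliar with the quantity being predicted, adsorption energy is commonly defined from three total energies: the slab with the adsorbate attached, the clean slab, and the isolated adsorbate. The sketch below uses that common convention; the function name and the numerical energies are hypothetical, and the interview does not state which reference energies or sign convention the SGCNN training data uses.

```python
# Illustrative sketch of a common adsorption-energy convention; the
# energies below are made-up numbers, not DFT or SGCNN results.
def adsorption_energy(e_slab_with_adsorbate, e_clean_slab, e_adsorbate):
    """E_ads = E(slab + adsorbate) - E(clean slab) - E(isolated adsorbate).
    Under this sign convention, a negative value means adsorption lowers the energy."""
    return e_slab_with_adsorbate - e_clean_slab - e_adsorbate

# Hypothetical total energies in eV.
e_ads = adsorption_energy(-105.5, -100.0, -5.0)
print(e_ads)
```

With DFT, each of the three inputs requires its own calculation; a model such as the SGCNN aims to predict the resulting difference directly from the structure.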


Q: Is there any reason why nitrogen and oxygen reduction reactions are particularly attracting attention among catalyst-related studies?

First of all, the nitrogen reduction reaction is a chemical reaction for producing ammonia (NH3). Ammonia is widely used to make agricultural fertilizers and to transport hydrogen, which is highly regarded as an eco-friendly resource these days. Ammonia is a key material for realizing the hydrogen economy.

Second, the oxygen reduction reaction is an essential reaction that takes place in a hydrogen fuel cell, which generates electricity using hydrogen as a fuel. The hydrogen fuel cell is the most eco-friendly future energy technology because the final product it emits is water (H2O). Studies are currently under way to develop new materials that can outperform platinum (Pt), the most widely used catalytic material for the oxygen reduction reaction. In my opinion, our SGCNN will make it possible to select a group of catalyst candidates that can promote the oxygen reduction reaction more efficiently.


Q: What are you going to focus on to supplement and improve the SGCNN?

What is most important is to secure high-quality data in large quantities. The SGCNN is currently trained on structures and 20,000 adsorption energies for various catalytic surfaces, but that is still far from a satisfactory level. Because the field of catalytic materials encompasses a wide range of research, we will need data on more adsorbates in the future.

For example, our SGCNN data is largely related to nitrogen and oxygen reduction reactions, as I mentioned earlier. Imagine a research team dedicated to the formic acid decomposition reaction; they cannot trust the SGCNN model because its training data does not include the adsorbates they consider important.

We will continue to improve and complement the SGCNN until more catalyst researchers can trust its predictions and we can guarantee their quality. To do so, we need a sufficient amount of consistently generated data. I cannot define exactly how much is sufficient, but if we have more than one million data points to train the SGCNN model, its reliability will be greatly enhanced.

Many researchers still trust the predictions of DFT simulations but are skeptical of those of machine learning models. However, I believe that they will certainly use the predictions of machine learning models someday, once the amount of data used to train the models becomes overwhelming.


Interviewer: Clara Kim



Grossman Research Group:

Read more: Artificial Intelligence to Accelerate the Discovery of N2 Electroreduction Catalysts


Comments - 2021-10-26 01:53:06
That's excellent news. I hope conventional DFT methods can quickly be surpassed by data-driven science for excellent material discovery. Please share some tips or resources for newcomers to data science. How can they learn this kind of DFT-AI development?