
Artificial intelligence learns from experimental data to search for better experimental conditions

2021-11-08 09:01:33

When experimental data are collected on a data platform, AI can find and exploit the information hidden in them.


As big data and artificial intelligence (AI) attract growing attention, data are being collected in a wide range of fields, and chemical reactions are no exception. For example, the Chemical Data-Driven Research Center at the Korea Research Institute of Chemical Technology (KRICT) has been working with Virtual Lab to collect experimental data on catalyst synthesis and catalytic reactions, which will then be built into a data platform. Once experimental data are gathered on such a platform, AI can find and exploit the information hidden in them.

One benefit of AI is that researchers can use existing experimental data to choose the next experimental conditions and obtain better results. In this article, I would like to introduce a case in which AI was used to optimize reaction conditions from experimental data on the conversion of methane into C2 compounds (ethane, ethylene, and acetylene).


Application of AI to the methane conversion experiment


Figure 1: A process of optimizing experimental conditions from experimental data


1) Collection of experimental data

The process of optimizing experimental conditions from experimental data is divided into three steps, as shown in Figure 1 above. The first step is to collect the data. Although this article focuses on experimental data, the data can of course be generated in other ways, such as computer simulation. The collected data must then be preprocessed, which means identifying errors in the data and deciding which variables describe them.

In the methane conversion case, records with missing values in experimental conditions such as pressure were removed from the raw laboratory data. Six variables were chosen to describe the experimental results: temperature, pressure, hydrogen content (hydrogen is essential for controlling methane conversion), flow velocity, reactor length, and reactor diameter. AI-based methods for finding the important experimental variables have been studied, but in this case the variables were selected by the researchers.
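The preprocessing described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the field names, the `c2_yield` result column, and the two sample rows are hypothetical and are not taken from the study.

```python
# Hypothetical field names; the article lists the six variables but not their labels.
FEATURES = ["temperature", "pressure", "hydrogen_content",
            "flow_velocity", "reactor_length", "reactor_diameter"]

def drop_incomplete(records, required=FEATURES):
    """Keep only records in which every required condition has a value."""
    return [r for r in records
            if all(r.get(name) is not None for name in required)]

# Made-up rows purely for illustration (not data from the study).
raw = [
    {"temperature": 1200.0, "pressure": 1.0, "hydrogen_content": 0.5,
     "flow_velocity": 10.0, "reactor_length": 30.0, "reactor_diameter": 1.0,
     "c2_yield": 0.20},
    {"temperature": 1250.0, "pressure": None, "hydrogen_content": 0.4,
     "flow_velocity": 12.0, "reactor_length": 30.0, "reactor_diameter": 1.0,
     "c2_yield": 0.25},
]

clean = drop_incomplete(raw)                          # second row lacks pressure
X = [[r[name] for name in FEATURES] for r in clean]   # inputs: conditions
y = [r["c2_yield"] for r in clean]                    # output: result
```

Here `X` and `y` are the condition and result arrays that a machine learning model would later be trained on.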

In total, 250 data items were collected. The question asked most often is how many data items AI needs, and the answer inevitably depends on the problem. Just as at least two points are needed to draw a linear trend line, the complexity of the patterns in the data determines how many items are necessary. As discussed later, for this methane conversion case, 6 variables and 250 data items were enough to describe the experimental data with errors of less than 5%.


2) Generation of a machine learning model of the experimental data

The second step is to create a model that explains the experimental data. The model is meant to map experimental conditions to results, describing the phenomena that actually occur in the laboratory. A model expressed as easy-to-understand formulas would be ideal, but experimental data are usually difficult to explain through simple relational expressions of the experimental conditions.

This is why machine learning is used. A machine learning model takes the experimental conditions as input variables and the experimental results as output variables. For the methane conversion case, two machine learning methods were employed: one based on artificial neural networks and one based on decision trees. These two methods were chosen because they are widely used in machine learning and perform reasonably well.

Another advantage is that Python programming examples for them are widely available. These examples can be used to create machine learning models and verify their performance. Checking the performance of a machine learning model by running new experiments would add little and just consume time. Therefore, 80% of the data was used to train the machine learning models, and the remaining 20% was used for testing.
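The 80/20 split can be sketched as follows. This is a minimal standard-library illustration; the function name `train_test_split`, the fixed seed, and the use of integer indices as stand-ins for the 250 data items are assumptions, not details from the study.

```python
import random

def train_test_split(items, train_fraction=0.8, seed=0):
    """Shuffle a copy of the items and split it into training and test parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)   # fixed seed for reproducibility
    n_train = int(round(len(shuffled) * train_fraction))
    return shuffled[:n_train], shuffled[n_train:]

data = list(range(250))          # stand-in indices for 250 data items
train, test = train_test_split(data)
print(len(train), len(test))     # 200 items for training, 50 for testing
```

Shuffling before splitting keeps the test set from being biased by the order in which the experiments happened to be recorded.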

The resulting errors are shown in Figure 2 below. The error of each machine learning model was calculated by taking the absolute value of the difference between the target value and the predicted value for each test data item and then averaging those absolute values (the mean absolute error). The errors observed were less than 5%.
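The error measure described above is the mean absolute error, which can be written as a short function. The target and predicted values below are invented for illustration, and treating them as percentages is an assumption.

```python
def mean_absolute_error(targets, predictions):
    """Average of |target - prediction| over all test items."""
    if len(targets) != len(predictions) or not targets:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(t - p) for t, p in zip(targets, predictions)) / len(targets)

# Invented values (assumed to be percentages), not results from the study.
targets = [20.0, 35.0, 50.0]
predictions = [22.0, 33.0, 53.0]
mae = mean_absolute_error(targets, predictions)
```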

Figure 2: Machine learning methods and errors



3) Exploration of optimal experimental conditions by applying machine learning

In the third step, optimal experimental conditions are found by using the machine learning model. Because the model maps experimental conditions to experimental results, as mentioned earlier, feeding imaginary experimental conditions into it yields the corresponding predicted results. To find better experimental conditions, we need to create many imaginary conditions and check their predicted results.

Calculation time varies with the machine learning model, but in this case the result for one experimental condition was predicted within seconds. The process was therefore light enough to run even on a personal computer.

The optimization method used in this case was the artificial bee colony algorithm. It proceeds iteratively, as shown on the right side of Figure 1. The starting point is to create imaginary experimental conditions, which are fed into the machine learning model to obtain imaginary experimental results. The researchers then only need to score these results to identify the imaginary experimental conditions with higher scores.

The scoring method applied in this case is that the more methane is converted into C2 compounds, the higher the score, and the more methane is consumed by side reactions, the lower the score. This scoring method is used because, when the optimization considers only the pathways that convert methane into the desired products, the search ends up at experimental conditions in which side reactions also occur frequently.
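One simple way to express such a score is a reward minus a penalty. The article does not give the actual formula, so the linear form, the `penalty` weight, and the example numbers below are all assumptions made for illustration.

```python
def score(c2_yield, side_product_yield, penalty=1.0):
    """Higher C2 yield raises the score; side-reaction products lower it.

    The linear form and the penalty weight are hypothetical choices.
    """
    return c2_yield - penalty * side_product_yield

# Invented predictions for two imaginary conditions.
score_a = score(c2_yield=0.30, side_product_yield=0.25)  # more C2, many side reactions
score_b = score(c2_yield=0.25, side_product_yield=0.05)  # slightly less C2, few side reactions
best = "B" if score_b > score_a else "A"
```

With these numbers, condition B ranks higher even though A produces more C2, which is the behavior the penalty term is meant to produce.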

With this scoring method, the experimental conditions are ranked, and the next set of imaginary conditions is then generated and evaluated through the machine learning model. The optimal experimental conditions emerge from repeating this search toward higher and higher rankings, just as in an evolutionary algorithm. In addition, when an experiment was actually carried out under the conditions found by AI, the experimental results were confirmed to agree with the prediction within the range of the machine learning errors.
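The iterative search can be sketched with a simplified artificial bee colony loop. This is not the implementation used in the study: the function names, the parameter values, and the two normalized variables are assumptions, and `surrogate_score` is a toy function standing in for "machine learning model plus scoring".

```python
import random

def artificial_bee_colony(objective, bounds, n_sources=10, n_iters=100,
                          limit=20, seed=0):
    """Maximize `objective` inside box `bounds` with a basic bee colony search."""
    rng = random.Random(seed)
    dim = len(bounds)

    def random_source():
        return [rng.uniform(lo, hi) for lo, hi in bounds]

    def neighbour(i):
        # Perturb one coordinate of source i relative to another random source.
        k = rng.choice([j for j in range(n_sources) if j != i])
        d = rng.randrange(dim)
        cand = list(sources[i])
        cand[d] += rng.uniform(-1.0, 1.0) * (sources[i][d] - sources[k][d])
        lo, hi = bounds[d]
        cand[d] = min(max(cand[d], lo), hi)   # stay inside the allowed range
        return cand

    def try_improve(i):
        # Greedy selection: keep the candidate only if it scores higher.
        cand = neighbour(i)
        s = objective(cand)
        if s > scores[i]:
            sources[i], scores[i], trials[i] = cand, s, 0
        else:
            trials[i] += 1

    sources = [random_source() for _ in range(n_sources)]
    scores = [objective(s) for s in sources]
    trials = [0] * n_sources
    best_score = max(scores)
    best = list(sources[scores.index(best_score)])

    for _ in range(n_iters):
        for i in range(n_sources):            # employed bees: one try per source
            try_improve(i)
        low = min(scores)                     # onlooker bees: favor high scores
        weights = [s - low + 1e-12 for s in scores]
        for _ in range(n_sources):
            i = rng.choices(range(n_sources), weights=weights)[0]
            try_improve(i)
        top = max(range(n_sources), key=scores.__getitem__)
        if scores[top] > best_score:          # remember the best conditions so far
            best_score, best = scores[top], list(sources[top])
        for i in range(n_sources):            # scout bees: replace stagnant sources
            if trials[i] > limit:
                sources[i] = random_source()
                scores[i] = objective(sources[i])
                trials[i] = 0
    return best, best_score

def surrogate_score(conditions):
    # Toy stand-in for the trained model and scoring; peak placed at (0.3, 0.7).
    a, b = conditions
    return -((a - 0.3) ** 2 + (b - 0.7) ** 2)

best, best_score = artificial_bee_colony(surrogate_score, [(0.0, 1.0), (0.0, 1.0)])
```

In a real application, `surrogate_score` would call the trained model on the candidate conditions and apply the scoring rule to its prediction, and `bounds` would hold the experimentally feasible range of each of the six variables.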


Figure 3: An artificial bee colony algorithm




This article introduced a method for optimizing the conversion of methane using experimental data. Researchers can apply it in different ways: a similar method works whenever a problem can be framed in terms of experimental conditions and results. AI is expected to help with a variety of problems in which the relationships hidden in experimental data are difficult to find.



Kim, H.W.; Lee, S.W.; Na, G.S.; Han, S.J.; Kim, S.K.; Shin, J.H.; Chang, H.; Kim, Y.T. React. Chem. Eng. 2021, 6, 235.





 Hyun Woo Kim | Korea Research Institute of Chemical Technology