The development of new materials requires the consideration of many composition and process variables. However, most conventional materials and process development has proceeded through highly inefficient trial and error, which wastes both information and time and hinders effective, efficient materials development.
One way to resolve these drawbacks is to develop materials using materials informatics. Indeed, the introduction of informatics into materials research, together with the improved availability of high-performance computing and materials data, has significantly reduced the cost and duration of materials development. Tools and systems capable of generating, processing, and applying vast amounts of data are fundamentally changing how materials are designed.
AI-based discovery of new materials
A research strategy for artificial intelligence (AI)–based materials discovery should start with the right problem setting. In other words, the features (X-value) and outputs (Y-value) must be clarified, and the problem size (i.e., the dimensionality of X and Y), the model size, and the required quantity of data must be clearly defined.
The problem size simply refers to the number of features (X-value) and the number of outputs (Y-value). Suppose that yield strength and tensile strength are predicted from 13 composition variables (e.g., C, Si, Mn, P, S, Cu, Sn, Ni, Cr, Mo, V, Nb, and Ti) and 2 process variables (e.g., heat-treatment temperature and time). The problem then has a 15-dimensional X-value and a 2-dimensional Y-value.
In contrast, the model size simply means the number of parameters in the model, and it should be chosen by considering the problem size and the quantity of data. If a problem has a 15-dimensional X-value and a 2-dimensional Y-value but only about 100 data points, models with far fewer parameters, such as the k-nearest neighbors (KNN) algorithm or the random forest (RF), are more appropriate than an artificial neural network (ANN), which usually has hundreds of thousands of parameters.
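To make the scale concrete, the following sketch fits a k-nearest-neighbors regressor, written in plain NumPy, to a dataset of exactly the size described above (100 samples, 15 inputs, 2 outputs). The data here are random stand-ins for alloy measurements, not real values:

```python
import numpy as np

def knn_predict(X_train, y_train, x_query, k=5):
    """Predict a multi-output target as the mean of the k nearest
    training points (Euclidean distance)."""
    dist = np.linalg.norm(X_train - x_query, axis=1)
    nearest = np.argsort(dist)[:k]
    return y_train[nearest].mean(axis=0)

# Hypothetical dataset: 100 samples, 15 inputs (13 composition +
# 2 process variables), 2 outputs (yield and tensile strength, MPa).
# The numbers are random stand-ins, not real measurements.
rng = np.random.default_rng(0)
X = rng.random((100, 15))
y = 200.0 + 500.0 * rng.random((100, 2))

pred = knn_predict(X, y, X[0], k=5)
print(pred.shape)  # one predicted value per output dimension: (2,)
```

A model like this has essentially no trainable parameters beyond the stored data, which is why it remains usable at the 100-sample scale where an ANN would badly overfit.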
Limitations of AI-based materials discovery
A prerequisite for materials discovery through AI-based models is the availability of big data. Unfortunately, almost no experimental data suitable for AI exists in any field of materials science. Nevertheless, many researchers mistakenly believe that they can easily obtain big data from papers or other references.
Because of the nature of materials science, preparing even a single sample to generate one data point is quite time consuming. It is therefore virtually impossible to secure big data from actual experiments. For this reason, most data-driven materials researchers discover new materials through supervised learning–based AI trained on virtual computational data. However, this approach cannot escape the gap between virtual data and actual experimental data.
Misunderstanding backward models
Currently, AI technologies used in materials research consist primarily of forward predictive models that predict the properties of materials from process and composition variables. However, the AI-based materials discovery we actually want mainly addresses the inverse question: what composition or processing is needed to obtain the desired material properties?
For example, rather than a forward model that predicts the tensile strength (MPa) obtained when alloy composition “A” is heat-treated at a certain temperature, a backward model is preferable: one that predicts the temperature and alloy composition required to obtain an alloy with a tensile strength of 500 MPa or higher.
Some researchers unfamiliar with backward predictive models think of them as nothing more than the forward prediction model run in reverse. In fact, there has even been a case in which a simply reversed forward model was published as a backward predictive model in npj Computational Materials, a well-known international journal.
However, such a reversed model works only when the number of features (X-value) is the same as, or similar to, the number of outputs (Y-value). In general, because the number of features greatly exceeds the number of outputs, the simply reversed predictive model does not work properly.
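A toy numerical example makes the difficulty explicit. Assuming a hypothetical linear forward model (an invented stand-in, not a real property model), two entirely different compositions can map to exactly the same property value, so the Y-to-X mapping that a reversed model would have to learn is not even a function:

```python
import numpy as np

# Hypothetical linear forward model: a property (e.g., strength) as a
# weighted sum of 15 composition/process variables.
w = np.ones(15)
def forward(x):
    return float(w @ x)

# Two entirely different "compositions" give the identical property value,
# so there is no unique X for a given Y that a reversed model could learn.
x_a = np.zeros(15)
x_a[0] = 3.75
x_b = np.full(15, 0.25)
print(forward(x_a), forward(x_b))  # 3.75 3.75
```

With 15 inputs and 1 output, an entire 14-dimensional family of compositions shares each property value, which is precisely why dim(X) ≫ dim(Y) breaks naive reversal.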
Figure 1. The dilemma of the existing backward model
A backward predictive model of a new paradigm: direct optimization
(e.g., metaheuristic optimization, Bayesian optimization, etc.)
In recent years, backward predictive models of a new paradigm that apply metaheuristics have come into use. Among them is direct optimization, which optimizes materials through a metaheuristic algorithm. This method can break free of the stereotype of supervised learning–based materials discovery and sidestep the problem of securing big data.
Direct optimization seeks an optimal solution with only a few experiments, because obtaining a perfect solution is practically impossible when many composition and process variables are considered. In other words, a metaheuristic algorithm makes it possible to generate experimental data gradually while converging toward the optimum.
Figure 2. A heuristic algorithm
Direct optimization can be viewed as one of the most innovative methods in materials science, where actual experimental databases are almost nonexistent. Metaheuristic-based direct optimization algorithms include NSGA-II and NSGA-III, particle swarm optimization (PSO), and more.
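As an illustration of the idea, the following is a minimal particle swarm optimizer in plain NumPy. The objective here is a hypothetical stand-in for an experiment (squared distance from an unknown optimal 15-dimensional recipe), not a real materials property; in practice each objective evaluation would be one synthesis-and-test cycle:

```python
import numpy as np

def pso(objective, dim, n_particles=20, iters=50, seed=0):
    """Minimal particle swarm optimization (minimization) over [0, 1]^dim."""
    rng = np.random.default_rng(seed)
    x = rng.random((n_particles, dim))       # particle positions
    v = np.zeros((n_particles, dim))         # particle velocities
    pbest = x.copy()                         # per-particle best positions
    pbest_f = np.array([objective(p) for p in x])
    gbest = pbest[pbest_f.argmin()].copy()   # swarm-wide best position
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, dim))
        # inertia + pull toward personal best + pull toward global best
        v = 0.7 * v + 1.5 * r1 * (pbest - x) + 1.5 * r2 * (gbest - x)
        x = np.clip(x + v, 0.0, 1.0)
        f = np.array([objective(p) for p in x])
        better = f < pbest_f
        pbest[better], pbest_f[better] = x[better], f[better]
        gbest = pbest[pbest_f.argmin()].copy()
    return gbest, pbest_f.min()

# Stand-in "experiment": squared distance to an unknown optimal
# 15-dimensional recipe. A real run would replace this with a
# measured property from an actual synthesis-and-test cycle.
target = np.full(15, 0.3)
best_x, best_f = pso(lambda x: float(np.sum((x - target) ** 2)), dim=15)
```

Note that each of the 20 × 50 objective calls stands for one experiment; this is exactly why the experiment budget, not the algorithm, is the bottleneck in real materials campaigns.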
Metaheuristic-based direct optimization still requires on the order of 500 pieces of experimental data (depending on the problem size) to be created before the optimal solution is found, and in some fields of materials science even this is prohibitively time consuming.
Bayesian optimization, on the other hand, is attracting attention because it can optimize material composition and process variables with even less data than metaheuristic-based direct optimization.
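The loop below sketches Bayesian optimization from scratch: a Gaussian-process surrogate with an RBF kernel plus an expected-improvement acquisition, run on a hypothetical one-dimensional "experiment" whose true optimum sits at x = 0.6. A library implementation would normally be used; this pure-NumPy version, with invented settings such as the length scale and grid, only illustrates the idea:

```python
import numpy as np
from math import erf, sqrt, pi

def rbf(A, B, length_scale=0.2):
    """Squared-exponential (RBF) kernel matrix between row sets A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / length_scale**2)

def gp_posterior(X, y, Xq, noise=1e-6):
    """Gaussian-process posterior mean and std at query points Xq."""
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks = rbf(X, Xq)
    mu = Ks.T @ np.linalg.solve(K, y)
    var = np.diag(rbf(Xq, Xq) - Ks.T @ np.linalg.solve(K, Ks))
    return mu, np.sqrt(np.clip(var, 1e-12, None))

def expected_improvement(mu, std, best):
    """Expected-improvement acquisition for maximization."""
    z = (mu - best) / std
    Phi = 0.5 * (1.0 + np.vectorize(erf)(z / sqrt(2.0)))  # Gaussian CDF
    phi = np.exp(-0.5 * z**2) / sqrt(2.0 * pi)            # Gaussian PDF
    return (mu - best) * Phi + std * phi

# Hypothetical 1-D "experiment" to maximize; true optimum at x = 0.6.
f = lambda x: -(x - 0.6) ** 2
rng = np.random.default_rng(1)
X = rng.random((3, 1))                 # a few initial measurements
y = f(X[:, 0])
grid = np.linspace(0.0, 1.0, 201)[:, None]
for _ in range(10):                    # ten sequential "experiments"
    mu, std = gp_posterior(X, y, grid)
    x_next = grid[expected_improvement(mu, std, y.max()).argmax()]
    X = np.vstack([X, x_next])
    y = np.append(y, f(x_next[0]))
best_x = X[y.argmax(), 0]              # should land near 0.6
```

The key property is that only 13 evaluations of f are ever made; the surrogate, not the experiment, absorbs the cost of exploring the search space, which is why Bayesian optimization suits data-scarce materials problems.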
High-quality data is essential for successful materials informatics. As noted above, however, it is virtually impossible to secure big data because of the nature of materials science. Rather than training forward predictive models by supervised learning after accumulating further data from experiments and references, it is therefore preferable to develop new materials efficiently with research methodologies, such as metaheuristic-based direct optimization, that generate data step by step while converging on an optimum.
Jin-Woong Lee, Ph.D.