Artificial intelligence (AI) has recently been adopted in the materials field, among many others. The materials field has diverse uses for AI, such as predicting material properties, developing new materials, and detecting errors in processes. However, simply applying AI technology does not guarantee the expected outcomes. For materials specialists to use AI efficiently, a deep understanding of AI-related knowledge is required. Therefore, I would like to introduce AI technologies and the cautions to observe when they are applied to the materials field.
Key Factors of AI
Machine learning is a term frequently associated with AI, and the relation between the two can be explained through their definitions. (Although there are many definitions of AI and machine learning, I quote those in Wikipedia.) According to Wikipedia, “artificial intelligence is a computer program artificially implementing human learning, reasoning ability, perception ability, argumentation ability, and natural language comprehension, or a computer system including the program.” The “computer program” or “computer system” indicates a platform implemented as a model trained and created through machine learning. In addition, “machine learning is the study of computer algorithms that improve automatically through experience. It is seen as a part of artificial intelligence.” Here, “improve through experience” refers to training an algorithm with data. From these definitions, I would divide the roles of AI and machine learning as follows: AI implements a platform that makes decisions, such as determination and classification, using machine learning technology, while machine learning creates a model that learns complex nonlinear relationships or patterns through calculations on data. Thus, I consider the key factors of AI to be the machine learning algorithm and the data.
Figure 1: Key Factors of AI
Machine learning is a family of computer algorithms that has expanded rapidly with the growth of big data and computing resources, affecting most aspects of society and technology. It creates a model by training algorithmic parameters from data. Thus, the efficient use of machine learning requires comparing and interpreting various algorithms according to the data, rather than focusing on a specific algorithm.
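The comparison of algorithms according to the data can be sketched as follows, assuming scikit-learn is available; the dataset and model settings are purely illustrative.

```python
# Sketch: comparing several algorithms on the same data via cross-validation.
# Dataset and hyperparameters are illustrative, not recommendations.
from sklearn.datasets import make_moons
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC

# A small synthetic dataset with a nonlinear class boundary.
X, y = make_moons(n_samples=200, noise=0.2, random_state=0)

models = {
    "logistic_regression": LogisticRegression(),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "svm_rbf": SVC(kernel="rbf"),
}

# 5-fold cross-validation: mean accuracy of each algorithm on this dataset.
scores = {name: cross_val_score(m, X, y, cv=5).mean()
          for name, m in models.items()}
for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```

Running such a comparison on one's own data, rather than committing to a single algorithm up front, is the practice the text recommends.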
Diverse Machine Learning Algorithms
Machine learning algorithms are diverse and are largely classified into supervised learning and unsupervised learning according to whether the data are labeled. (Reinforcement learning, whose result is produced over time, is excluded from this categorization.) A label (class) is the result value (dependent variable) associated with an input value (independent variable). A model created through supervised learning on labeled data calculates a predicted value for the result, so it is called a predictive model. Examples include decision trees, support vector machines (SVM), logistic regression, k-nearest neighbors (kNN), naive Bayes, and neural networks. As unsupervised learning has no labels, it analyzes intrinsic characteristics of the data and groups together data with the same or similar characteristics. Unsupervised learning is also called an explanatory model, because it summarizes current or past data into clusters and analyzes them based on their characteristics. Examples include Density-Based Spatial Clustering of Applications with Noise (DBSCAN), which creates clusters based on the density of entities; k-means, based on the distance between entities; and Hierarchical Agglomerative Clustering (HAC). The neural network, introduced above under supervised learning, has recently also spread to unsupervised learning through network variants. To select an optimal machine learning algorithm for AI, one needs to apply various algorithms to the data and compare them.
Figure 2: Machine Learning Algorithms [1, 2]
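The two clustering approaches mentioned above, density-based and distance-based, can be sketched as follows, assuming scikit-learn is available; the data and parameters are illustrative.

```python
# Sketch: unsupervised clustering with K-means (distance-based) and
# DBSCAN (density-based). No labels are given to either model.
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans, DBSCAN

# Three well-separated groups of unlabeled points.
X, _ = make_blobs(n_samples=150, centers=[[0, 0], [5, 5], [0, 5]],
                  cluster_std=0.5, random_state=0)

# K-means groups entities by distance to cluster centres.
kmeans_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# DBSCAN groups entities by local density; -1 marks noise points.
dbscan_labels = DBSCAN(eps=0.8, min_samples=5).fit_predict(X)

n_kmeans = len(set(kmeans_labels))
n_dbscan = len(set(dbscan_labels) - {-1})
print(n_kmeans, n_dbscan)
```

Both models recover the three groups here, but on data with irregular densities or unknown cluster counts the two approaches can disagree, which is exactly why several algorithms should be compared.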
Characteristics of Neural Network Algorithm
The neural network is an algorithm widely used and developed in diverse fields. It is based on neurology and physics, and the training process of a neural network model resembles biological learning through the neurons of the human brain. In addition, the algorithm has the advantage of overcoming the limitations of traditional linear classifiers. Linear classifiers, such as the single-layer perceptron and the linear support vector machine, create a linear boundary to classify data, so they cannot create a nonlinear boundary. In other words, they cannot solve the exclusive-OR (XOR) problem. This problem can be handled by representing the data through a kernel, by an ensemble that combines various algorithms, by a neural network-like algorithm composed of complex networks, and so on.
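The XOR limitation and its kernel-based remedy can be demonstrated directly, assuming scikit-learn is available; the kernel parameters are illustrative.

```python
# Sketch of the XOR problem: a linear SVM cannot separate the four XOR
# points, while an RBF-kernel SVM can, because the kernel maps the data
# into a space where a separating boundary exists.
import numpy as np
from sklearn.svm import SVC

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])  # exclusive OR of the two inputs

linear = SVC(kernel="linear").fit(X, y)
rbf = SVC(kernel="rbf", gamma=2.0, C=100.0).fit(X, y)

linear_acc = linear.score(X, y)  # cannot reach 1.0: no linear boundary exists
rbf_acc = rbf.score(X, y)        # the kernel representation separates all four
print(linear_acc, rbf_acc)
```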
The neural network is a representative nonlinear classifier; in its network structure, each node can be seen as a classifier that divides labels. Neural networks with multiple hidden layers and hidden nodes thus combine many such classifiers. Moreover, they can overcome the limitations of linear models by representing data through activation functions. However, their performance results are difficult to interpret intuitively, which is why they are collectively called “black box” models. Therefore, neural networks are gaining popularity in fields that prioritize the results themselves over understanding them.
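The idea that each node acts as a classifier can be made concrete with a minimal two-layer network whose weights are set by hand (the weights are illustrative, not trained):

```python
# Sketch: each hidden node is itself a simple linear classifier, and their
# combination yields the nonlinear XOR boundary that no single linear
# classifier can produce. Weights are hand-set for illustration.
import numpy as np

def step(z):
    return (z > 0).astype(int)  # step activation: fires when input is positive

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])

# Hidden node 1 classifies OR(x1, x2); hidden node 2 classifies AND(x1, x2).
h1 = step(X @ np.array([1, 1]) - 0.5)   # OR
h2 = step(X @ np.array([1, 1]) - 1.5)   # AND

# Output node combines them: "OR but not AND", i.e. XOR.
y = step(h1 - h2 - 0.5)
print(y.tolist())  # -> [0, 1, 1, 0]
```

In a real network these weights are learned from data rather than set by hand, and with many hidden nodes the learned roles are no longer readable, which is the origin of the black-box character described above.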
Deep Learning of Neural Network Algorithm
Recently, neural networks have extended their basic network structure and grown into deep learning algorithms such as Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN). A CNN moves filters of various sizes across the data and applies them. This convolution process represents the input through various filters, and pooling extracts characteristic features from the data. The algorithm calculates a linear predicted value by fully connecting all the features created by repeating the convolution process. A CNN is sensitive to the order of neighboring input variables, so it shows good performance on image and text data. An RNN, in contrast, has a recurrent structure in which the hidden nodes have directions and are connected with variable lengths. This algorithm reflects the importance of sequential information, as input variables are trained continuously during the training process; therefore, it shows good performance on time series data.
Figure 3: Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN)
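The two core operations described above, convolution with pooling (CNN) and the recurrent hidden-state update (RNN), can be sketched in plain NumPy; the filter values, weights, and input sequence are illustrative.

```python
# Sketch: a 1-D convolution with max pooling, then a recurrent update.
import numpy as np

x = np.array([0.0, 0.0, 1.0, 1.0, 0.0, 0.0])
w = np.array([1.0, -1.0])  # edge-detecting filter slid along the sequence

# Convolution: apply the filter to each neighbouring pair of inputs,
# so the result depends on the ORDER of the input variables.
conv = np.array([x[i:i + len(w)] @ w for i in range(len(x) - len(w) + 1)])

# Max pooling: keep the strongest filter response in each window of size 2.
pooled = np.array([conv[i:i + 2].max() for i in range(0, len(conv) - 1, 2)])

# Recurrent step: one hidden state carries information across time steps,
# with the same weights (0.5 here) reused at every step.
h = 0.0
for x_t in x:
    h = np.tanh(0.5 * x_t + 0.5 * h)

print(conv.tolist(), pooled.tolist(), round(h, 3))
```

The convolution responds where the sequence changes (the "edges"), and the final hidden state still reflects the pulse seen earlier, which is why these structures suit image/text data and time series data respectively.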
Application Case of Neural Network Algorithm to Material Research
Research applying neural networks is actively conducted in the materials field; some cases with excellent performance are shown below. First is a case in which a CNN is applied to the design of crystalline materials. This study predicts material properties, such as band gap and formation energy, by applying the data structure and values of crystal structures and atomic coordinate information, and checks the performance by comparing with the results of density functional theory (DFT). The study found that the filter's convolution process learns the relations between successive atomic positions and adjacent atoms, and shows good performance. In the second case, the crystal structure is predicted from diffraction pattern images. This study transforms diffraction pattern images, in the form of points containing a material's structural information, into line patterns to improve the model's training ability. Data characteristics can be learned through a filter, but a complex network is applied to also capture the characteristics that may be lost; moreover, the data before filter application are used to enhance the predictive performance for space groups. The study found that connecting the points and transforming point pattern images into shape pattern images improves performance, preventing prediction errors caused by similar characteristics in the data. Furthermore, the approach of using information that may otherwise be lost contributes to performance improvement.
Use of Machine Learning for Material Research
Research on machine learning in the materials field tends to focus on neural network algorithms. Prediction accuracy is important in studies that predict material properties and structures, but the predictive results can be difficult to understand. In the materials field, understanding the relations between input values and predicted values is as important as developing excellent predictive models. Thus, the field needs to understand both algorithms and data by applying diverse machine learning algorithms and comparing their prediction accuracy. For a good understanding of the result data, various approaches will be useful, such as comparing the similarities and differences between data that are predicted well and data that are predicted poorly. Moreover, when a linear algorithm yields a predictive model with good performance, the result can be understood intuitively. Finally, studies using an explanatory model built on data clusters, as described above, are expected to produce meaningful results.
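The intuitive interpretability of linear models mentioned above can be illustrated with a small sketch: when a linear model fits well, its coefficients directly state how each input variable influences the prediction. The relation y = 3·x1 − 2·x2 is an assumed synthetic example.

```python
# Sketch: ordinary least squares on a known linear relation. The learned
# coefficients can be read off directly, unlike a black-box model.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1]  # assumed underlying relation (noise-free)

# Least squares recovers the influence of each input variable exactly here.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(coef.round(2).tolist())  # -> [3.0, -2.0]
```

Reading "+3 per unit of x1, −2 per unit of x2" straight from the fitted model is the kind of input-output understanding the materials field requires; a neural network achieving the same accuracy would not expose its reasoning this way.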
[1] Wikipedia, https://en.wikipedia.org/wiki/Main_Page
[2] R. Navigli, "Word sense disambiguation: A survey," ACM Computing Surveys (CSUR) 41.2 (2009): 1-69.
[3] J. Kim, "A fraud detection technique of business transaction based on hierarchical clusters-deep neural networks," doctoral dissertation, UOS (2019).
[4] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proceedings of the IEEE 86.11 (1998): 2278-2324.
[5] T. Xie and J. C. Grossman, "Crystal graph convolutional neural networks for an accurate and interpretable prediction of material properties," Physical Review Letters 120.14 (2018): 145301.
[6] L. C. O. Tiong et al., "Identification of crystal symmetry from noisy diffraction patterns by a shape analysis and deep learning," arXiv preprint arXiv:2005.12476 (2020).
Jeong Rae Kim
Korea Institute of Science and Technology, Computational Science Research Center, Postdoctoral Researcher