While the automated monitoring system currently in place for detecting and processing earthquakes in Yellowstone works quite well most of the time, its solutions still need to be reviewed and refined by a seismic analyst. Since that review takes time, the larger earthquakes (generally over M1) get most of the attention, and smaller earthquakes, which are harder to locate, are not always processed. The current system can also struggle during earthquake swarms, when many earthquakes occur close together in space and time. Ideally, an automated system would detect earthquakes accurately, including small ones and those that occur close together in time, and process them as expertly as a seismic analyst would. Then only the most important events would need manual review. But what a seismic analyst knows is hard to write down as a concrete set of rules that a computer can follow and that work well in a wide variety of situations. All hope is not lost, however: a set of tools known as machine learning can help with this problem.
Machine learning refers to computer algorithms that try to learn the statistics of a dataset of interest to answer a question about similar data that have not yet been seen. In many cases, the data that an algorithm is given are a set of features that a human thinks are important for describing the dataset. As an example, to describe a person we might rely on data that include features like height, weight, age, and hair color. The machine learning algorithm then uses these features to try to solve a “regression” problem (producing a real-valued answer) or a “classification” problem (deciding the category that an example fits into).
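To make the distinction between the two kinds of problem concrete, here is a minimal sketch in Python. It is only an illustration: the people and their measurements are invented toy values, the function names are hypothetical, and the two methods shown (a nearest-neighbor classifier and a least-squares line fit) are simple stand-ins chosen for brevity, not the algorithms used for earthquake monitoring.

```python
from math import dist

# Toy data, invented purely for illustration.
# Each feature tuple is (height in cm, weight in kg, age in years).
labeled_people = [
    ((110.0, 20.0, 6.0), "child"),
    ((125.0, 27.0, 9.0), "child"),
    ((168.0, 62.0, 34.0), "adult"),
    ((181.0, 80.0, 45.0), "adult"),
]


def nearest_neighbor_label(features, examples):
    """Classification: return the category of the most similar known example."""
    closest = min(examples, key=lambda example: dist(example[0], features))
    return closest[1]


def fit_line(xs, ys):
    """Regression: least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Classification: which category does a person not seen before fit into?
category = nearest_neighbor_label((172.0, 70.0, 29.0), labeled_people)

# Regression: produce a real-valued answer (weight) from a feature (height).
heights = [150.0, 160.0, 170.0, 180.0]  # invented values
weights = [50.0, 58.0, 66.0, 74.0]      # invented values
slope, intercept = fit_line(heights, weights)
predicted_weight = slope * 175.0 + intercept

print("category:", category)
print("predicted weight (kg):", predicted_weight)
```

In both cases the algorithm only ever sees the features it is handed. The classifier picks a category by comparing against labeled examples, while the regression returns a number on a continuous scale.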
Feature selection is an important step, because the algorithm can only perform as well as the data it is given. Unlike humans, most of these algorithms cannot seek out more data or draw on new or past experiences; they can only learn from the data and feedback we present to them. As a result, it is important that the algorithms have large datasets to draw upon. But this reliance on large datasets is also a strength! Machine learning can be a very powerful tool because it allows the statistics of a large amount of data and features to be considered, more than any human would be able to look at and make sense of on their own.