Deep learning networks may prefer the human voice — as we do

The digital revolution is built on a foundation of binaries, invisible 1s and 0s called bits. The notion that computers prefer to “speak” in binary numbers is rarely questioned. According to new research from Columbia Engineering, that could be about to change.

A new U.S. National Science Foundation-funded study by mechanical engineer Hod Lipson and researcher Boyuan Chen suggests that artificial intelligence systems may reach higher levels of performance if they are trained with sound files of human language rather than with numerical data labels.

The researchers discovered that a neural network whose "training labels" consisted of sound files reached higher levels of performance in identifying objects in images than a comparable network trained in the conventional manner, with simple binary label inputs.
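To make the contrast concrete, here is a minimal, hypothetical sketch of the two kinds of training targets the paragraph above describes. The class names, the toy "voice" signal, and all parameters are illustrative assumptions, not the study's actual data: a real experiment would use genuine recordings of spoken words, whereas this toy function just produces a dense numeric vector from the word's characters.

```python
import math

CLASSES = ["cat", "dog", "car"]  # illustrative class names, not from the study

def one_hot(label):
    """Conventional training target: a sparse binary (one-hot) vector."""
    vec = [0.0] * len(CLASSES)
    vec[CLASSES.index(label)] = 1.0
    return vec

def toy_voice_target(label, n=16):
    """Stand-in for a spoken-word audio target: a dense vector derived
    deterministically from the word. (A toy sine pattern, not real speech.)"""
    seed = sum(ord(c) for c in label)
    return [math.sin(seed * 0.1 * (i + 1)) for i in range(n)]

print(one_hot("dog"))              # [0.0, 1.0, 0.0]
print(toy_voice_target("dog")[:4])  # dense, richly structured values
```

The point of the contrast: a one-hot label tells the network only "which slot is correct," while an audio target is a rich, structured signal, and the study's finding is that training against such richer targets can improve image-recognition performance.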

“To understand why this finding is significant,” said Lipson, “it’s useful to understand how neural networks are usually programmed, and why using the sound of the human voice is a radical experiment.”
