This is going to be a unique post, as I get to post it on both my professional and musical blogs.
This summer I decided to experiment with Artificial Intelligence and apply it to musical improvisation. Artificial Intelligence is the buzzword everyone seems to use these days – some with awe, some with curiosity, some with fear of the unknown and some others even with gross ignorance. My goal was to take an interesting problem where AI could be applied, solve the problem (or part of it) and learn about how it works and its limitations.
The Problem
I wanted to select a real-world problem statement – something I can relate to. As it turns out, I am an Indian Classical Flutist and have been performing on the Indian flute – the Bansuri – for a good part of the last four decades. Indian Classical Music is improvisational music – i.e. the music is created by the artist as he/she performs it. Just like Jazz, it has a basic framework, and as long as one sticks to that framework, one can improvise at will. This makes the music quite spontaneous and free-flowing.
Accompanying the lead musician is equally challenging, interesting and creative. The accompanist too follows a certain framework within which he/she improvises the accompaniment (say on a drum, or on other accompanying instruments such as the Harmonium, Sarangi, Violin etc.). A good accompanist also takes melodic as well as rhythmic influences from the lead musician and enriches the overall performance by responding appropriately to these influences.
It all works great when the musicians get together and jam. However, if you are not with a good accompanist, your options suddenly get narrowed. These days, one does have various options but most of them are pre-recorded or repeating. See below a jazz street musician performing outside Pier 39 in San Francisco. Notice that he starts a pre-recorded drum track.
https://www.youtube.com/watch?v=tKBRNcH8SBA
In North Indian classical music, the main drum accompanying a concert is called the Tabla. More on the details of the tabla later in the post, but if one does not have a human accompanist, these days one can use iOS or Android apps. However, these apps are no substitute for a human accompanist either, as they neither improvise nor respond to the lead musician.
So my problem statement was essentially to see if I could create an artificial Tabla drum accompanist for myself.
So What is AI?
As mentioned, a lot of mystique surrounds the term AI. In reality, the basic idea is quite simple. Let’s say a 3-4 year old kid with no background in playing a musical instrument wants to learn one. He would typically go to a teacher, who would start teaching him the basics of the instrument. As the kid practices more, he develops basic competency in the instrument. As he continues to learn, he starts developing further depth in musical expression. After a while, his learning gets deeper and more sophisticated. He learns not only by following his teacher’s lessons, but also by copying ideas from other musicians he likes. Over time, simply copying ideas gives way to taking influences from multiple ideas and developing his own style (more likely than not, by this stage he is not a kid anymore). This learning process continues over his lifetime.
You can see an example of this process in a short time span demonstrated by Ben Zander here – https://www.youtube.com/watch?v=8bJNw91QyyM
What exactly is happening in the kid’s brain as his learning gets more sophisticated? All of us are born with a brain structure and network. An element of this structure is called a Neuron (a brain nerve cell). A human baby is born with about 100 billion of them (roughly the same as the number of stars in the Milky Way galaxy). Their number does not grow much over a human life. What happens as humans learn is that these neurons start making appropriate connections. More learning leads to more connections in the brain, and to more sophisticated thinking and responses in the areas of that learning.
Artificial Neural Networks (ANNs) are a computer representation of pretty much the same concept. Just like the neurons in the brain, one can create artificial neurons in software that operate in bits and bytes. The idea is then to create a network of these artificial neurons. Each link in the network – the connection – can be represented in the form of a weight, for simplicity a number between zero and one, that reflects how strong that connection is.
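To make the idea of a weighted connection concrete, here is a minimal sketch of a single artificial neuron in Python. The function names, the sigmoid squashing function and the example weights are my own illustrative assumptions, not something the post prescribes. The neuron multiplies each input by the weight of its connection, adds everything up, and squashes the total into a value between zero and one.

```python
import math

def sigmoid(x):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def neuron_output(inputs, weights, bias=0.0):
    """Weighted sum of the inputs, passed through a sigmoid activation.

    `bias` is an optional constant offset (an assumption of this sketch).
    """
    total = sum(i * w for i, w in zip(inputs, weights)) + bias
    return sigmoid(total)

# Two incoming connections: one strong (0.9) and one weak (0.1).
weights = [0.9, 0.1]
via_strong = neuron_output([1.0, 0.0], weights)  # signal arrives on the strong link
via_weak = neuron_output([0.0, 1.0], weights)    # same signal on the weak link
print(via_strong, via_weak)
```

The same signal produces a larger response when it arrives over the stronger connection, which is all a weight really expresses.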
The neurons that represent the input to the network are the input neurons, much the same way that our sensory organs are connected to some set of neurons within the brain. Similarly, the output neurons play the role of the nerves that carry instructions from the brain out to the body. Internal connections are represented by the so-called hidden layer of neurons. Depending on the architecture and application of the neural network, there can be more than one hidden layer.
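As a rough illustration of this layered structure, the sketch below passes three input values through one hidden layer of four neurons and on to two output neurons. The layer sizes, the helper names `make_layer` and `forward`, and the random placeholder weights are all assumptions chosen for the example.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def make_layer(n_inputs, n_neurons, rng):
    """One list of weights per neuron, each weight a random value in [0, 1)."""
    return [[rng.random() for _ in range(n_inputs)] for _ in range(n_neurons)]

def forward(layer, inputs):
    """Output of every neuron in the layer for the given inputs."""
    return [sigmoid(sum(w * x for w, x in zip(weights, inputs)))
            for weights in layer]

rng = random.Random(0)                # fixed seed so the run is repeatable
hidden_layer = make_layer(3, 4, rng)  # 3 inputs  -> 4 hidden neurons
output_layer = make_layer(4, 2, rng)  # 4 hidden  -> 2 output neurons

inputs = [0.2, 0.7, 0.1]              # made-up input values
hidden = forward(hidden_layer, inputs)
outputs = forward(output_layer, hidden)
print(outputs)
```

Adding another hidden layer would simply mean another `make_layer` call and one more `forward` step between the input and the output.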
So to continue with our example, let’s say we initialize these connections with random weights between zero and one, representing the unlearnt state of the kid’s brain. Then let’s say that we start feeding in the training data – which is basically a set of inputs paired with their corresponding outputs. Now let’s say we had a mathematical way to optimize these weights so that the input is predictably translated to the output. With each new training example, these weights would adjust ever so slightly, in such a way that everything learnt up to that point still stays relevant. Obviously, no set of weights can fit every example perfectly, so the goal of such a process is to get as close to the middle of the road as possible.
This mathematical way to optimize weights based on training data is called the Backpropagation Algorithm. It works out how much each weight contributed to the error, so that each weight can be nudged in the direction that reduces it. With enough data and enough training, the error between the output predicted by the neural network for a set of inputs and the output actually noted in the training data becomes smaller and smaller.
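Here is a minimal sketch of that training process, assuming a tiny network with one hidden layer, sigmoid neurons and a squared-error measure. The toy data (a logical OR of two inputs), the `train` function and all of its parameters are illustrative assumptions and have nothing to do with real tabla data. Note also that once training starts, the weights in this sketch are free to drift below zero or above one.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train(samples, n_hidden=2, epochs=2000, rate=0.5, seed=0):
    """Train a tiny one-hidden-layer network; return the mean error per epoch."""
    rng = random.Random(seed)
    n_in = len(samples[0][0])
    # Unlearnt state: random weights in [0, 1). The last weight is a bias.
    w_hidden = [[rng.random() for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w_out = [rng.random() for _ in range(n_hidden + 1)]
    errors = []
    for _ in range(epochs):
        total = 0.0
        for inputs, target in samples:
            x = list(inputs) + [1.0]  # constant 1.0 feeds the bias weight
            # Forward pass: inputs -> hidden layer -> single output.
            hidden = [sigmoid(sum(w * v for w, v in zip(ws, x))) for ws in w_hidden]
            h = hidden + [1.0]
            out = sigmoid(sum(w * v for w, v in zip(w_out, h)))
            total += (target - out) ** 2
            # Backward pass: how much did each neuron contribute to the error?
            delta_out = (out - target) * out * (1.0 - out)
            delta_hidden = [delta_out * w_out[j] * hidden[j] * (1.0 - hidden[j])
                            for j in range(n_hidden)]
            # Nudge every weight slightly in the direction that reduces the error.
            for j in range(n_hidden + 1):
                w_out[j] -= rate * delta_out * h[j]
            for j in range(n_hidden):
                for i in range(n_in + 1):
                    w_hidden[j][i] -= rate * delta_hidden[j] * x[i]
        errors.append(total / len(samples))
    return errors

# Toy training data: (inputs, expected output) pairs for a logical OR.
samples = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
errors = train(samples)
print("error after first pass:", errors[0])
print("error after last pass: ", errors[-1])
```

Comparing the first and last printed values shows the error shrinking as the weights settle, which is the behaviour described above in miniature.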
When this error is small enough, the network is deemed trained. At this stage, one can just feed inputs to the network and expect output that is in line with the outputs in the training dataset. In other words, if we feed it enough tabla samples from real concerts of a great artist like, say, Zakir Hussain, the neural network will learn to produce tabla that sounds like Zakir Hussain. (For those who do not know, Zakir Hussain is probably one of the finest tabla players ever to live.)
In fact, depending on which artist’s playing is in the training data, the Artificial Intelligence in the neural network would produce output like that artist. You can even mix the training data to contain concert data from two or more artists to potentially get fascinating results.