Plato, Aristotle and Machine Learning
The School of Athens is one of the best-known frescoes in the world. It is a Renaissance masterpiece depicting the great philosophers of the classical era, painted by Raphael between 1509 and 1511. While there is much to discuss in this fresco, I would like to draw your attention to the two central figures: Plato pointing to the skies and his student, Aristotle, pointing towards the earth.
Why did Raphael depict them like this?
Plato believes that the world we live in, the material world, is only a shadow of the ideal world. For example, an apple we see in this world is an imperfect reflection of the perfect apple that resides in the ideal world. Plato calls these perfect, ideal things Forms, hence his theory of Forms. The theory is not limited to objects: we can also talk about the Form of education, the Form of friendship, and so on. Aristotle, however, goes against his teacher and claims that the material world is real. He believes that the Form resides inside the thing in question, in the material world: if there is no apple, there is no applehood. Both believe in Forms but disagree on whether they exist in another world (Plato) or in this one (Aristotle).
Now you can guess the message Raphael wanted to convey: Plato points to the skies because he believes the Forms reside in another world, while Aristotle points to the earth, saying that Forms can be found right here in the physical world.
This discussion falls into the realm of the problem of universals in metaphysics. Universals are the things that two or more entities have in common (e.g. being a cat or the ideal cat according to Plato) and a universal has instances called particulars. Garfield is a particular cat. He has properties common to all cats, like trying to fit into a box, and other properties that not every cat has such as being lazy, cynical and orange. Philosophers discuss whether universals really exist, and if so, where they reside.
But what has all this got to do with machine learning? Before answering this question, I will try to explain what machine learning is in terms of data, signal, and noise. Let us first clarify these terms:
Data: Values you observe or measure.
Signal: Expected value of the observation or measurement.
Noise: Imperfections causing expected value and observed value to differ.
Based on these definitions we can say that Data = Signal + Noise. Let me try to explain this concept with a concrete example.
How would you plot the Force (N) vs. Acceleration (m/s²) behavior of an object with m = 2 kg? Ideally it should follow Newton’s second law, F = m*a. However, in the world we live in, things are not perfect, so the observed behavior would be something like F = m*a + noise. Below is the code used to generate the plots:
import numpy as np

m = 2  # mass of the object (kg)
a = 10 * np.random.rand(50, 1)  # 50 random acceleration values in [0, 10)
F_ideal = m * a  # ideal force: the signal
F_observed = m * a + np.random.randn(50, 1)  # observed force: signal + Gaussian noise
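The original plots are not reproduced here, but for completeness, here is one way they could be drawn with matplotlib; the variable names and the output filename are my own choices, not the author’s:

```python
import numpy as np
import matplotlib.pyplot as plt

m = 2  # mass of the object (kg)
a = 10 * np.random.rand(50, 1)  # 50 random acceleration values
F_ideal = m * a  # signal
F_observed = m * a + np.random.randn(50, 1)  # signal + noise

plt.scatter(a, F_observed, s=15, label="Observed: F = m*a + noise")
plt.plot([0, 10], [0, 10 * m], color="red", label="Ideal: F = m*a")
plt.xlabel("Acceleration (m/s²)")
plt.ylabel("Force (N)")
plt.legend()
plt.savefig("force_vs_acceleration.png")
```

The scatter of observed points hugs the ideal red line but never sits exactly on it, which is the Data = Signal + Noise picture in miniature.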
Essentially, machine learning algorithms try to learn the signal inside the data. It is important to emphasize that these algorithms are given the data but they don’t know which part of the data is signal and which part of it is noise.
For example, let’s apply linear regression to the data we generated above. You can see that the fit is almost equal to the signal. The machine learning algorithm had no idea about the signal I used to generate this data, yet it was able to find a very close approximation to it.
from sklearn import linear_model

model = linear_model.LinearRegression()
model.fit(a, F_observed)  # a and F_observed from the snippet above
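As a self-contained sketch of the whole experiment (the seeded generator is my addition, for reproducibility), the fitted slope should land very close to the true mass m = 2:

```python
import numpy as np
from sklearn import linear_model

rng = np.random.default_rng(0)  # fixed seed so the result is reproducible
m = 2  # true mass: the signal is F = 2*a
a = 10 * rng.random((50, 1))  # 50 random acceleration values
F_observed = m * a + rng.standard_normal((50, 1))  # noisy force measurements

model = linear_model.LinearRegression()
model.fit(a, F_observed)

print(model.coef_[0][0])  # learned slope: close to the true mass, 2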
Now we are at a point where we can see the correspondence between machine learning and the problem of universals.
Machine Learning: Data = Signal + Noise
Problem of Universals: What we see = Universals + Particular properties
Imagine that your friend asks you to build a Bengal cat classifier. You collect Bengal cat images (the data) and train a convolutional neural network (CNN) for this task. The algorithm looks at the data, separates the signal (the universal Bengal cat) from the noise (particular things, e.g. one cat has a scar, in another image there is a tree in the background, etc.) and hence learns what an ideal Bengal cat should look like. The algorithm stores what it learned as parameters called “weights”. In other words, the weights of the CNN after training correspond to the universal Bengal cat — the signal.
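A full CNN is too long to show here, but the separation of universal from particulars can be illustrated with a toy example of my own (not a CNN, just an analogy): averaging many noisy copies of a fixed “template” image recovers the template, much as training distills the signal shared by all instances:

```python
import numpy as np

rng = np.random.default_rng(42)  # fixed seed for reproducibility
template = rng.random((8, 8))  # the "universal": a hypothetical 8x8 pattern

# 200 "particulars": the template plus instance-specific noise
particulars = template + 0.5 * rng.standard_normal((200, 8, 8))

# crude "learning": averaging cancels the noise, leaving the shared pattern
learned = particulars.mean(axis=0)

print(np.abs(learned - template).mean())  # small: close to the universal
```

Each individual image is a poor copy of the template, yet what is common to all of them survives the averaging, just as particular scars and backgrounds wash out of the CNN’s weights.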
The key takeaway from this post is that machine learning algorithms aim to learn the universals (also called Forms, or the signal) inside the data. Hopefully, this will help you view some concepts in machine learning in a different light and grasp them more readily.