We have already gone through classification and prediction. Now let us see what clustering is. Another popular learning technique, clustering is different from the other two since it is an unsupervised learning technique. What does that mean?
Let us revisit the classification technique. We show the machine an orange and explain the features of the orange to it. Similarly, each different fruit and its features are shown to the machine during the training phase. Once it has learned enough, we use the machine to label a randomly picked fruit.
In clustering, such training does not take place. We present the system with a basket full of different fruits (Apples, Oranges, Bananas, Cherries, and Mangoes) and expect the system to sort them. How would the system go about this task?
Well, some features come into play here as well. The fruits in the basket differ from each other in color, shape, length, or size. The system might pick one of these features at random. Let’s consider the color of the fruit. The system starts sorting the fruits based on their color first. In our basket, apples and cherries get sorted together since they are both red. Similarly, bananas and mangoes get grouped together since they are both yellow. There would be a third group consisting only of oranges.
Then, the system would look at a different feature for the next round, say, the shape of the fruit. It looks at the red group of apples and cherries and checks if all fruits in the group are of the same shape. Clearly, they aren’t. Thus, the sphere-shaped cherries get sorted together while the others (apples) get sorted as a second group. The red fruits group is now split into apples and cherries. Similarly, the yellow group would also be split into two groups, one of bananas and one of mangoes. In our simple example, after round two, we are left with five unique groups (clusters) of fruits. This is how clustering is performed. The items in one cluster would be very similar to one another, while they would differ from the items of another cluster.
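The two rounds of sorting described above can be sketched in a few lines of Python. This is only a toy illustration of the fruit example, not a production clustering algorithm: practical methods such as k-means usually group items by numeric distance rather than by exact feature matches. The feature values assigned to each fruit and the names `BASKET`, `split_by`, and `cluster` are assumptions made up for this sketch.

```python
from collections import defaultdict

# Hypothetical feature values for the fruits in the basket (assumed for illustration).
BASKET = [
    {"name": "apple", "color": "red", "shape": "oblate"},
    {"name": "orange", "color": "orange", "shape": "sphere"},
    {"name": "banana", "color": "yellow", "shape": "curved"},
    {"name": "cherry", "color": "red", "shape": "sphere"},
    {"name": "mango", "color": "yellow", "shape": "oval"},
]


def split_by(groups, feature):
    """Split every existing group into sub-groups sharing one value of `feature`."""
    result = []
    for group in groups:
        buckets = defaultdict(list)
        for item in group:
            buckets[item[feature]].append(item)
        result.extend(buckets.values())
    return result


def cluster(items, features):
    """Start with one big group and refine it one feature (one round) at a time."""
    groups = [list(items)]
    for feature in features:
        groups = split_by(groups, feature)
    return groups


# Round one uses color, round two uses shape, as in the walkthrough.
for group in cluster(BASKET, ["color", "shape"]):
    print([fruit["name"] for fruit in group])
```

Passing only `["color"]` gives the three color groups from round one; adding `"shape"` splits the red and yellow groups further and leaves five single-fruit clusters. Note that the orange and the cherry share a shape in this made-up data but stay apart, because each round only splits groups that already exist.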
As in the previous cases, let us now check how we employ clustering in our own lives.
Let us assume that you, an Indian, are in Dubai looking for a job. When the Arab interviewer asks you where you are from, you introduce yourself as an Indian. You get employed, you greet your employer with a Marhaba, and you earn your Dirhams at the end of every month. When you meet another Indian in Dubai, you are elated. You greet them with a Namaste, get excited about the upcoming India-Pakistan cricket match, and probably make plans to celebrate Dussehra together with the Indian community in Dubai. However, the moment you arrive at the Dubai Indian Dussehra Party, you cease to be Indians and become Marathis, Tamilians, Rajasthanis, Assamese, or people of whichever state you are from. The differences between Indians become more pronounced. The Dussehra of the Delhiite becomes Durga Puja for the Bengalis or Vijayadashami for the Kannadigas. The people from the south of India collectively become idli-devouring, Telugu-speaking Madrasis for the northerners.
Things become even more complicated when you board the flight home for a vacation. Naturally, most passengers on a flight bound for your state, say Kerala, would be people from that state working in Dubai. As you interact with them, more differences start appearing. You start becoming less of a Keralite and more of a Thekkan (a person from the south of Kerala) or a Vadakkan (from the north of Kerala). The Vadakkans, who were lamenting the northerners' bias against South Indians, themselves start looking down on the Thekkans, calling them self-centered and selfish. The Thekkans retaliate by making fun of the north’s sing-song accent.
As more and more features (country, state, and region) come into play, we get more and more divided along those lines. Like regression, clustering is not inherently devious. It is natural that people have their differences and feel an affinity for the group they fit into. Trouble starts when people forget the bigger picture and start placing their group above the others.
The rising resentment of natives toward immigrants, especially in first-world countries, could be termed an example of the same. According to them, the resources of a country are the natural right of the people born in that country. Anyone coming to their land from outside is often considered a parasite or a freeloader. In a world where the place of our birth is just a matter of chance, how futile is this petty mindset! People migrate from their homelands in the hope of a better life. They go to a culture alien to them, work hard, and strive for a decent living. Harassing them by calling them freeloaders while sitting in the comfort of your privilege is disgusting at best.