Machine Unlearning #3 (Clustering)

Machine Unlearning is a series broken up into tiny, one-minute readable pieces to humor our ever-shortening attention span. Sharing the links to every single piece right below:

We have already gone through classification and prediction. Now let us see what clustering is. Clustering, another popular learning technique, differs from the other two because it is an unsupervised learning technique. What does that mean?

Let us revisit the classification technique. We show the machine an orange and explain its features to it. Similarly, each different fruit and its features are shown to the machine during the training phase. Once it has learned enough, we use the machine to label a randomly picked fruit.

In clustering, no such training takes place: the machine is never told which fruit is which. We present the system with a basket full of different fruits (apples, oranges, bananas, cherries, and mangoes) and expect it to sort them into groups on its own. How would the system go about this task?

Well, features come into play here as well. The fruits in the basket differ from each other in color, shape, length, or size. The system might pick one of these features at random. Let's consider the color of the fruit. The system starts by sorting the fruits based on their color. In our basket, apples and cherries get sorted together since they are both red. Similarly, bananas and mangoes get grouped together since they are both yellow. There would be a third group consisting only of oranges.
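If you are curious what round one might look like in code, here is a minimal Python sketch. It swaps in a standard algorithm the post does not name, k-means (Lloyd's algorithm), and it assumes color is encoded as a single hue number in degrees. The hue values, the fruit list, and the function name `kmeans_1d` are all made up for illustration, and we tell the algorithm to look for three groups.

```python
def kmeans_1d(values, k, max_iter=100):
    """Group 1-D numbers into k clusters with plain k-means (requires k >= 2)."""
    lo, hi = min(values), max(values)
    # Deterministic start: k centroids spread evenly between the min and max.
    centroids = [lo + i * (hi - lo) / (k - 1) for i in range(k)]
    labels = [0] * len(values)
    for _ in range(max_iter):
        # Assignment step: each value joins the cluster with the nearest centroid.
        labels = [min(range(k), key=lambda j: abs(v - centroids[j])) for v in values]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            new_centroids.append(sum(members) / len(members) if members else centroids[j])
        if new_centroids == centroids:  # nothing moved, so we have converged
            break
        centroids = new_centroids
    return labels, centroids


# Hypothetical basket: the names are only for us; the algorithm sees just the hues.
fruits = ["apple", "apple", "cherry", "cherry", "orange", "orange", "banana", "mango"]
hues = [4, 8, 2, 10, 30, 32, 55, 50]

labels, centroids = kmeans_1d(hues, k=3)
groups = {}
for fruit, label in zip(fruits, labels):
    groups.setdefault(label, []).append(fruit)
print(groups)
# {0: ['apple', 'apple', 'cherry', 'cherry'], 1: ['orange', 'orange'], 2: ['banana', 'mango']}
```

Unlike the sorting described above, k-means needs the number of groups up front and groups items by how close their values are rather than by exact matches, which is why near-identical reds end up together.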

Then, the system would look at a different feature for the next round, say, the shape of the fruit. It looks at the red group of apples and cherries and checks whether all the fruits in the group have the same shape. Clearly, they don't. Thus, the sphere-shaped cherries get sorted together while the others (apples) form a second group. The red group is now split into apples and cherries. Similarly, the yellow group would also be split into two groups: bananas and mangoes. In our simple example, after round two, we are left with five distinct groups (clusters) of fruits. This, in essence, is what clustering does. The items in one cluster are very similar to one another, while they differ from the items in other clusters.
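The two rounds of sorting described above can be sketched in a few lines of Python. This is a minimal illustration, not a general clustering algorithm: it assumes each fruit is described by exact `color` and `shape` values (the shape labels such as "squat" and "sphere" are made up), and it fixes the feature order to color then shape instead of picking features at random. The helper `split_by` is a hypothetical name.

```python
from collections import defaultdict

# Hypothetical basket: the system sees only features, never the fruit names.
# (Names in the comments are for the reader; shape values are illustrative.)
basket = [
    {"color": "red", "shape": "squat"},      # apple
    {"color": "red", "shape": "squat"},      # apple
    {"color": "red", "shape": "sphere"},     # cherry
    {"color": "red", "shape": "sphere"},     # cherry
    {"color": "yellow", "shape": "long"},    # banana
    {"color": "yellow", "shape": "long"},    # banana
    {"color": "yellow", "shape": "oval"},    # mango
    {"color": "yellow", "shape": "oval"},    # mango
    {"color": "orange", "shape": "sphere"},  # orange
    {"color": "orange", "shape": "sphere"},  # orange
]


def split_by(groups, feature):
    """One round of sorting: split each group into subgroups sharing the same value of `feature`."""
    result = []
    for group in groups:
        buckets = defaultdict(list)
        for item in group:
            buckets[item[feature]].append(item)
        result.extend(buckets.values())
    return result


clusters = [basket]                  # start with one big, unsorted group
for feature in ["color", "shape"]:   # round one: color, round two: shape
    clusters = split_by(clusters, feature)
    print(feature, "->", len(clusters), "groups")
# color -> 3 groups
# shape -> 5 groups
```

Each round only ever splits existing groups, so fruits separated by color are never merged again, which mirrors how the red and yellow groups above are refined rather than reshuffled.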

As in the previous cases, let us now check how we employ clustering in our own lives.

Let us assume that you, an Indian, are in Dubai looking for a job. When the Arab interviewer asks you where you are from, you introduce yourself as an Indian. You get the job, greet your employer with a Marhaba, and earn your Dirhams at the end of every month. When you meet another Indian in Dubai, you are elated. You greet them with a Namaste, get excited about the upcoming India-Pakistan cricket match, and probably make plans to celebrate Dussehra together with the Indian community in Dubai. However, the moment you arrive at the Dubai Indian Dussehra Party, everyone ceases to be Indian and becomes a Marathi, Tamilian, Rajasthani, Assamese, or a native of whichever state they are from. The differences between Indians become more pronounced. The Dussehra of the Delhiite becomes Durga Puja for the Bengalis and Vijayadashami for the Kannadigas. To the northerners, the people from the south of India collectively become idli-devouring, Telugu-speaking Madrasis.

Things become even more complicated when you board the flight home for a vacation. Naturally, most passengers on a flight bound for your state are people from that state working in Dubai. As you interact with them, more differences start appearing. You become less of a Keralite and more of a Thekkan (someone from the south of Kerala) or a Vadakkan (someone from the north of Kerala). The Vadakkans, who were lamenting the Northerners' bias against South Indians, themselves start looking down on the Thekkans, calling them self-centered and selfish. The Thekkans retaliate by making fun of the north's sing-song accent.

As more and more features (country, state, and region) come into play, we get more and more divided along those lines. Like regression, clustering is not inherently devious. It is natural that people have their differences and feel an affinity toward the group they fit in. Trouble starts when people forget the bigger picture and start placing their group above the others.

The rising resentment of natives toward immigrants, especially in first-world countries, is an example of this. According to them, the resources of a country are the natural right of the people born in it. Anyone coming to their land from outside is often considered a parasite or a freeloader. In a world where the place of our birth is just a matter of chance, how futile this petty mindset is! People migrate from their homelands in the hope of a better life. They go to a culture alien to them, work hard, and strive for a decent living. Harassing them by calling them freeloaders while sitting in the comfort of your privilege is disgusting at best.
