
Last Updated: 2/6/2023

Classification

What is Classification?

Machine learning classification is like the game "guess who". In this game, we have a group of people, and we have to figure out who someone is by asking them questions. For example, we might ask if they are a boy or a girl or if they have brown or blonde hair.

Like in the game, in machine learning classification, we have a bunch of data that we use to figure out what category something belongs to. We might have some pictures of animals and want to know if they are cats or dogs. So we give the machine learning program a bunch of examples of cats and dogs, and it will use those examples to figure out how to tell them apart.

When the machine learning program is trying to classify something, it's like it's trying to guess who the person is in the game. It looks at all the different pieces of information it has and uses that to make the best guess it can. If it makes the right guess, it gets a good score! If it doesn't, it gets to try again with more examples and hopefully do better next time.

So machine learning classification is a way for computers to learn how to classify things into different categories based on the data we give them. It's like a game, but instead of trying to guess who someone is, we're trying to guess what something is.

"Predicting a qualitative response for an observation can be referred to as classifying that observation, since it involves assigning the observation to a category, or class." — An Introduction to Statistical Learning, Gareth James, Daniela Witten, Trevor Hastie & Robert Tibshirani.

Classification problems consist of predicting which class a particular individual belongs to. To do this, we must collect different features of individuals whose class we already know. With this information, the algorithms can profile each class and separate it from the others. Once trained, a classification algorithm can identify the class of any new individual or example. Almost anything can be classified with machine learning. Machine learning can classify:

  • Text documents, such as emails, news articles, or social media posts, into different categories based on their content.
  • Images based on the objects or people in them.
  • Videos based on their content or actions.
  • Audio recordings based on the language spoken, the speakers, or the content of the conversation.
  • Data based on its characteristics, such as identifying fraudulent transactions in a dataset of financial transactions.
  • Data from sensors, such as identifying different types of movement or activity based on sensor readings.

And so on.

"Classification is one of the main instances of supervised learning. Given a training set of data containing observations and their associated categorical outputs, the goal of classification is to learn a general rule that correctly maps the observations (also called features or predictive variables) to the target categories (also called labels or classes)." — Python Machine Learning By Example, Yuxi Hayden Liu
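To make the idea of "profiling each class" concrete, here is a minimal sketch of one very simple classifier, a nearest-centroid classifier, written with only the Python standard library. It is just one possible approach, not the method used by any of the books quoted here. The animal measurements and the function names `fit_centroids` and `predict` are made up for illustration.

```python
from collections import defaultdict
from math import dist


def fit_centroids(features, labels):
    """Profile each class by the mean (centroid) of its training features."""
    grouped = defaultdict(list)
    for row, label in zip(features, labels):
        grouped[label].append(row)
    return {
        label: tuple(sum(column) / len(rows) for column in zip(*rows))
        for label, rows in grouped.items()
    }


def predict(centroids, row):
    """Assign the class whose centroid is closest to the new observation."""
    return min(centroids, key=lambda label: dist(centroids[label], row))


# Hypothetical training set: (weight in kg, height in cm) with known classes.
train_features = [(4.0, 25.0), (5.0, 23.0), (3.5, 24.0),
                  (20.0, 55.0), (30.0, 60.0), (25.0, 58.0)]
train_labels = ["cat", "cat", "cat", "dog", "dog", "dog"]

centroids = fit_centroids(train_features, train_labels)
print(predict(centroids, (4.5, 26.0)))   # a small, short animal
print(predict(centroids, (28.0, 57.0)))  # a large, tall animal
```

The "general rule" learned here is simply the average feature values of each class. Real classifiers learn richer rules, but the workflow is the same: fit on labeled observations, then predict the label of a new one.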

One non-programming real-life example of classification is sorting laundry. When you do your laundry, you might have a basket of dirty clothes that need to be sorted. You might sort them by color (whites, darks, and lights) or by fabric type (delicates, towels, and regular clothes). This process of sorting the clothes into different categories is an example of classification.

Examples

  1. Spam detection: Machine learning classification can identify spam emails and filter them out of a person's inbox. If we tried to write a spam classifier by hand, rule by rule, it would take us days, or even months, to cover all the words and patterns found in spam. By the time the program was released to production, spammers would already have changed their emails. Machine learning algorithms can find those patterns in emails within a few seconds or minutes. These days, most email services have AI-trained spam classifiers.

  2. Fraud detection: Machine learning classification can identify fraudulent transactions and prevent them from being processed. These types of problems could also be solved with anomaly detection, another kind of problem that we will learn about later.

  3. Medical diagnosis: Machine learning classification can help doctors diagnose diseases based on symptoms and other patient data. We could train an algorithm to analyze X-rays, CT scans, or MRIs and classify patients as healthy or sick, helping to detect diseases early. With a patient's medical history, we can predict whether they have or could develop a severe illness such as diabetes or heart disease.

  4. Image recognition: Machine learning classification can be used to identify objects in images or videos, such as identifying different types of animals in nature footage. Animals could also be classified based on their features, such as their skin, number of legs, colors, size, the position or shape of their eyes, and whether they have wings, scales, or hair, among many others. With time we will be able to distinguish and name each animal in the photographs, and we should be ready to distinguish them in real life, either in a zoo or in their natural environment.

  5. Sound recognition: Although we already mentioned animals, classifying birds and amphibians is an excellent example of how images are not the only possible input data: audio signals can be used to classify species or detect endangered species in a forest. Currently, sound classification algorithms can classify audio by species and can even detect human intervention, such as vehicles and chainsaws, in protected areas or national parks.

  6. Sentiment analysis: Machine learning classification can analyze social media posts and determine if they are positive, negative, or neutral in sentiment. In addition to tabular data, images, videos, and sounds, some classification problems involve text. These problems include classifying short sentences, paragraphs, posts, comments, or reviews according to the author's sentiment or intent, generally as positive, negative, or neutral, or more precisely as happy, sad, angry, or depressed. Text classification is a valuable tool for analyzing what our clients or users think.

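As a rough illustration of the spam detection example above, here is a minimal sketch of a word-count (Naive Bayes) text classifier using only the Python standard library. This is an assumption about how such a filter could work, not a description of how any real email service does it. The messages, labels, and function names `train` and `classify` are invented for the example.

```python
from collections import Counter
from math import log


def train(messages, labels):
    """Count how often each word appears in each class."""
    word_counts = {label: Counter() for label in set(labels)}
    class_counts = Counter(labels)
    for text, label in zip(messages, labels):
        word_counts[label].update(text.lower().split())
    vocabulary = {w for counts in word_counts.values() for w in counts}
    return word_counts, class_counts, vocabulary


def classify(model, text):
    """Return the class with the highest log-probability for the text."""
    word_counts, class_counts, vocabulary = model
    total_messages = sum(class_counts.values())
    scores = {}
    for label, counts in word_counts.items():
        total_words = sum(counts.values())
        score = log(class_counts[label] / total_messages)  # class prior
        for word in text.lower().split():
            # Add-one smoothing keeps unseen words from zeroing the score.
            score += log((counts[word] + 1) / (total_words + len(vocabulary)))
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical labeled examples; a real filter needs far more data.
messages = ["win a free prize now", "free money click now",
            "claim your free prize", "lunch at noon tomorrow",
            "meeting notes attached", "see you at the meeting"]
labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

model = train(messages, labels)
print(classify(model, "free prize inside"))
print(classify(model, "notes from the meeting"))
```

The same sketch works for sentiment analysis if the labels are changed to, say, "positive" and "negative", because both tasks classify text by the words it contains.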

Remember:

  • Classification problems predict categorical values, also known as classes or categories.
  • A classifier also finds the relationship between the features (independent variables) and the label (dependent variable).
  • Once the algorithm has learned all the patterns between features and labels, it can classify any new sample, even if it hasn't seen it before.
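The points above can be sketched in a few lines of standard-library Python. The example below uses a k-nearest-neighbors classifier, chosen only because it is short to write, and checks it on samples it has never seen. The measurements, the class names `species_a` and `species_b`, and the helper names are hypothetical.

```python
from collections import Counter
from math import dist


def knn_predict(train_rows, train_labels, row, k=3):
    """Vote among the k training samples closest to the new sample."""
    nearest = sorted(zip(train_rows, train_labels),
                     key=lambda pair: dist(pair[0], row))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(true_labels, predicted_labels):
    """Fraction of samples whose predicted class matches the true class."""
    hits = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return hits / len(true_labels)


# Hypothetical features: (petal length in cm, petal width in cm).
train_rows = [(1.4, 0.2), (1.5, 0.3), (1.3, 0.2),
              (4.5, 1.4), (4.7, 1.5), (4.4, 1.3)]
train_labels = ["species_a"] * 3 + ["species_b"] * 3

# Held-out samples the classifier has not seen during training.
test_rows = [(1.6, 0.2), (4.6, 1.4), (1.2, 0.3), (4.2, 1.2)]
test_labels = ["species_a", "species_b", "species_a", "species_b"]

predictions = [knn_predict(train_rows, train_labels, row) for row in test_rows]
print(predictions)
print(accuracy(test_labels, predictions))
```

Measuring accuracy on held-out samples, rather than on the training data, is what tells us whether the classifier has learned a pattern it can apply to new cases.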

Classification in Real Life

  1. [Kaggle] Heart Attack Analysis & Prediction Dataset
  2. [Kaggle] SMS Spam Collection Dataset
  3. [Kaggle] Flower Recognition
  4. [Kaggle] Chest X-Ray Images (Pneumonia)
  5. [Kaggle] Birds' Songs Numeric Dataset or [Kaggle] British Birdsong Dataset
  6. [Kaggle] Twitter US Airline Sentiment

If you want to learn more about Classification you should check out the following resources:

Textbooks

Chapter 3 (Classification): In this chapter, you will learn how to build a classifier from scratch, working with the MNIST dataset and implementing two classifiers: Stochastic Gradient Descent and Random Forest.

Chapter 4 (Classification): In this chapter, we study approaches for predicting qualitative responses, a process that is known as classification. Predicting a qualitative response for an observation can be referred to as classifying that observation, since it involves assigning the observation to a category, or class.


Podcasts

I speak with Maroof Farooq, an AI engineer at Nvidia. We walk through how to build an AI application from scratch using the fictitious example of the Seefood app from the HBO television series, Silicon Valley. [Note: Maroof's views are his own, and not necessarily those of his employer.] We learn about image classification, acquiring a dataset, and how to train the AI. Join us as we build a super-impressive AI that can recognize hot dogs of all shapes and sizes. Learn what it takes to build an AI that takes it to the next level: recognizing foods of all kinds. Maybe even pizza. Join us as we begin our deep dive into the world of AI, starting with the humble hot dog. Today: Shazam for food. Tomorrow: Judgement Day. Buckle up folks. It's going to be a wild ride.

On March 17th, we had the pleasure of hosting Dr. Hawley of Belmont University at the Study Center for a talk titled “Curves and Categories: Machine Learning, AI, and the Nature of Classification.” Dr. Hawley is a Professor of Physics at Belmont, and his research interests include machine learning, neural networks, and the ethics of A.I. He joined us to explore the fascinating and complex nature of classification and what it reveals about intelligence, human and machine. Dr. Hawley explained that machine learning classification techniques are increasingly applied to fields as diverse as biology, astronomy, the humanities, law, medicine, the entertainment industry, criminal justice, library science, aesthetics, robotics, and more, in an effort to automate human decision-making on massive scales. The problematic socio-political ramifications of this enterprise are becoming increasingly evident, and merit a closer examination of the philosophies and methods of classification from their origins in antiquity up to present large-scale A.I. systems. During the talk, Dr. Hawley made extensive use of slides, which you can view while you listen. You can also follow Dr. Hawley on Twitter.
