The 37% Rule

Christian Zuniga, PhD

Imagine you are apartment hunting and would like to find the best apartment for you. Although you can usually find rent prices and pictures online, you probably still want to check out each one in person. You are going to live there, after all. You want to see enough apartments to gather information, but you do not want an extended search; waiting too long is also costly. How can you best select your new apartment? Figure 1 shows the search proceeding sequentially and ending after some number of apartments have been seen. This could also…
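The strategy behind the 37% rule can be sketched in a short simulation. Everything below (function name, the random "scores") is illustrative, not from the article: look at the first ~37% of options without committing, then take the first one that beats everything seen so far.

```python
import random

def pick_apartment(scores, lookahead=0.37):
    """Skip the first ~37% of options, then commit to the first
    option that beats everything seen so far (fallback: the last one)."""
    n = len(scores)
    cutoff = int(n * lookahead)
    best_seen = max(scores[:cutoff], default=float("-inf"))
    for s in scores[cutoff:]:
        if s > best_seen:
            return s
    return scores[-1]

# Estimate how often this strategy lands the single best apartment.
random.seed(0)
trials, n = 10_000, 100
wins = 0
for _ in range(trials):
    scores = random.sample(range(1_000_000), n)  # distinct random quality scores
    if pick_apartment(scores) == max(scores):
        wins += 1
print(f"Found the single best apartment in {wins / trials:.0%} of searches")
```

With 100 candidates the empirical success rate hovers near the theoretical ~37%, which is where the rule gets its name.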


Figure 1. Example EM model used for mixture models [1]

The Expectation-Maximization (EM) algorithm is one of the main algorithms in machine learning for estimating model parameters [2][3][4]. For example, it is used to estimate the mixing coefficients, means, and covariances of mixture models, as shown in Figure 1. Its objective is to maximize the likelihood p(X|θ), where X is a matrix of observed data and θ is a vector of model parameters. This is maximum likelihood estimation, and in practice the log-likelihood ln p(X|θ) is maximized. The model parameters that maximize this function are deemed to be the correct model parameters. …
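As a rough illustration of the idea, here is a minimal EM sketch for a two-component 1-D Gaussian mixture. The initialization and synthetic data are assumptions made for the example, not the article's model:

```python
import numpy as np

def em_gmm_1d(x, n_iter=50):
    """Minimal EM for a 2-component 1-D Gaussian mixture.
    Returns mixing coefficients, means, and variances."""
    # Crude initialization from the data itself.
    pi = np.array([0.5, 0.5])
    mu = np.array([x.min(), x.max()])
    var = np.array([x.var(), x.var()])
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point.
        dens = pi / np.sqrt(2 * np.pi * var) * np.exp(
            -(x[:, None] - mu) ** 2 / (2 * var))
        r = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate parameters from responsibility-weighted data.
        nk = r.sum(axis=0)
        pi = nk / len(x)
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk
    return pi, mu, var

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2, 1, 500), rng.normal(3, 0.5, 500)])
pi, mu, var = em_gmm_1d(x)
print(pi.round(2), mu.round(2), var.round(2))
```

The recovered means land near the true cluster centers of the synthetic mixture, which is exactly the estimation task EM is designed for.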

A quick history and a practical introduction

Data mining is what it says: mining data. Although this frequently involves accessing data from databases, that is only one step. Data mining ultimately seeks to extract non-obvious patterns of potential value from data. In other words, data mining extracts information from data. The amount of data being generated and recorded has exploded in the past decades. The decreasing cost of digital storage and transmission allows gathering more and more data in multiple forms such as images, video, and text. However, the amount of useful information may be much smaller. Finding information in data can be seen as finding diamonds in…


Friday October 9, 2020

Clustering is the task of grouping similar items together with the objective of understanding the relations among them. For example, a company may discover that its customers fall into groups based on common characteristics beyond obvious ones such as age. These groups could then be offered more useful services. Clustering is part of the knowledge discovery process and may reveal hidden patterns and new knowledge¹.

If there were two characteristics or features, the items could be visualized as shown in Figure 1. As the figure shows, the items…
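With two features per item, the grouping itself can be sketched with a minimal Lloyd's k-means loop. The synthetic two-group "customer" data and all names below are illustrative assumptions, not the article's example:

```python
import numpy as np

def kmeans(X, k, n_iter=20, seed=0):
    """Minimal k-means: alternate assigning points to the nearest
    centroid and moving each centroid to the mean of its cluster."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(n_iter):
        d = np.linalg.norm(X[:, None] - centers, axis=2)
        labels = d.argmin(axis=1)
        # Keep a centroid in place if its cluster happens to be empty.
        centers = np.array([X[labels == j].mean(axis=0)
                            if np.any(labels == j) else centers[j]
                            for j in range(k)])
    return labels, centers

# Two synthetic customer groups described by two features each.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal([0, 0], 0.5, (100, 2)),
               rng.normal([4, 4], 0.5, (100, 2))])
labels, centers = kmeans(X, k=2)
print(centers.round(1))
```

The two recovered centroids sit near the centers of the two synthetic groups, which is the visual pattern a two-feature scatter plot would show.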


Viruses are very, very small. Although their existence had been suspected since the 19th century, they were too small to be seen [1]. Viruses are so small that a picture taken with an ordinary light microscope would not detect them. Figure 1 shows an image of a MERS coronavirus taken with an electron microscope [2]. The diameter of the virus is on the order of 100 nm (or 0.0000001 m). For comparison, the width of a human hair is around 80 µm, so a virus is around 800 times smaller.

Figure 1. EM image of a virus
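The size comparison can be checked with one line of arithmetic, using the values quoted above:

```python
hair_width_m = 80e-6       # ~80 µm, width of a human hair
virus_diameter_m = 100e-9  # ~100 nm, diameter of the virus
ratio = hair_width_m / virus_diameter_m
print(ratio)  # hair width divided by virus diameter
```

So a hair is roughly 800 virus diameters wide.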

Why is it difficult for ordinary light to…


Neural networks have in the past decade achieved high, even super-human, accuracy at many computer vision tasks such as image classification, object detection, and image segmentation. Although neural networks and the backpropagation training technique have been available since the 1980s, recent advances in hardware such as GPUs, in architectures and algorithms, and in the availability of large, high-quality labeled data have allowed neural networks to reign supreme on these tasks. For example, in the ImageNet data set, a program has to correctly classify an input image into one of 1000 categories. In 2012, a breakthrough was made when the classification error was…


Augmented Reality (AR) is a system that enhances the real world with a seamless integration of computer-generated perceptual information [1]. Image projections are frequently used in AR to transform an image into the perspective of another image. This projection gives the appearance that an object is in a scene when it is not actually there. Figure 1, for example, shows a clock projected onto the goal of a soccer field. The clock appears to stand in the goal area but is not really there. This is a modified example from the Perception course on Coursera…
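Such planar projections are commonly expressed as a 3x3 homography applied in homogeneous coordinates: append a 1 to each 2-D point, multiply by the matrix, and divide by the third component. A minimal sketch follows; the matrix here is a hypothetical translation-only homography for illustration, not the one from the course example:

```python
import numpy as np

def apply_homography(H, pts):
    """Map 2-D points through a 3x3 homography using homogeneous
    coordinates: lift to (x, y, 1), multiply, divide by the last entry."""
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])
    mapped = pts_h @ H.T
    return mapped[:, :2] / mapped[:, 2:3]

# Hypothetical homography: a pure translation by (50, 20). Any projective
# transform (e.g. warping a clock image onto the goal area) has this 3x3 form,
# generally with nonzero perspective terms in the bottom row.
H = np.array([[1.0, 0.0, 50.0],
              [0.0, 1.0, 20.0],
              [0.0, 0.0, 1.0]])
corners = np.array([[0.0, 0.0], [100.0, 0.0], [100.0, 100.0], [0.0, 100.0]])
print(apply_homography(H, corners))
```

Warping every pixel of the source image through such a matrix is what makes the inserted object follow the scene's perspective.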


Principal component analysis (PCA) is an unsupervised, linear technique for dimensionality reduction first developed by Pearson in 1901 [1][2][3]. It is widely used in many areas of data mining, such as visualization, image processing, and anomaly detection. It is based on the fact that data may have redundancies in its representation. Here, data refers to a collection of similar objects and their features. An object could be a house, and the features its location, number of bedrooms, square footage, and any other characteristic that can be recorded about the house. In PCA, redundancy in the…
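A minimal PCA sketch, via an eigendecomposition of the covariance matrix, shows the redundancy idea directly. The synthetic data below deliberately makes the second feature a near-copy of the first; all names and data are illustrative assumptions:

```python
import numpy as np

def pca(X, n_components):
    """Minimal PCA: center the data, eigendecompose its covariance,
    and project onto the top eigenvectors (principal components)."""
    Xc = X - X.mean(axis=0)
    cov = np.cov(Xc, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)       # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]
    return Xc @ eigvecs[:, order], eigvals[order]

# Redundant representation: feature 2 is almost exactly 2 * feature 1,
# so a single principal component captures nearly all the variance.
rng = np.random.default_rng(0)
f1 = rng.normal(size=200)
X = np.column_stack([f1, 2 * f1 + rng.normal(scale=0.1, size=200)])
Z, variances = pca(X, n_components=1)
print(variances / np.trace(np.cov(X, rowvar=False)))  # fraction of variance kept
```

One component retains essentially all of the variance of the two correlated features, which is the sense in which PCA removes redundancy.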

Christian Zuniga

Christian Zuniga has worked as a lecturer at San Jose State University, teaching data mining. He has worked on OPC modeling and is interested in machine learning.
