Guest Author: Michalina Malysz
“Like you use sentences to tell a person a story; you use algorithms to tell a story to a computer” (Rudder 2013).
In today’s day and age, we have the world at our fingertips. The internet has made many things easier, including dating, allowing us to interact and connect with a plethora of new people, even those who seemed unreachable just fifteen minutes before.
Inside OKCupid: The math of online dating explains the math used to match people on OkCupid, one of the most popular online dating websites. Christian Rudder, one of OkCupid’s founders, shows how an algorithm can link two people and estimate their compatibility from their answers to a series of questions. The more questions two people answer in ways that satisfy each other, the higher their compatibility.
You may be asking yourself how the components of human attraction can be expressed in a way that a computer can understand. The most important ingredient is data. OkCupid collects data by asking users to answer questions, which range from minor subjects, like taste in movies or songs, to major topics, like religion or how many children a person wants.
Many would assume that these questions simply match people who share the same likes, but in practice people often give opposite answers to the same question. Because a disagreement on one question is not necessarily a deal-breaker, the smarter move is to collect more data: to compare each person’s answers against the answers of their ideal partner, and to add another dimension, such as a level of importance. For example, what role does a given question play in the person’s life, and how relevant is it to them? To calculate compatibility, the computer must compare your answer to each question, your ideal partner’s answer to that question, and the importance you gave it against the same information from someone else. This is done by assigning a weighted point value to each level of importance, as shown below:
Level of importance    Point value
A little important     1
Somewhat important     10
Very important         50
You may be asking yourself, ‘How is this computed?’ Let’s say you are person A and the person the computer is trying to match you with is person B. The first question is: how well do person B’s answers satisfy you? The answer is expressed as a fraction. For every question, you assign a number of points according to the level of importance you chose. The denominator is the total number of points you allocated. The numerator is the number of those points that person B earns, where B earns a question’s points when B’s answer is one you said you would accept.
The points are totaled over all the questions you both answered, and the resulting fraction is converted to a percentage. This percentage is your percent satisfaction: how happy you would be with person B, based on how you each answered the questions. Step two works the same way in reverse: the question is how well your answers satisfy person B. That computation gives person B’s percent satisfaction.
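The two satisfaction scores described above can be sketched in Python. This is a minimal illustration, not OkCupid’s actual code: the `Question` record, its field names, the `satisfaction` function, and the sample answers are all hypothetical, and only the three importance levels from the table above are modeled.

```python
from dataclasses import dataclass

# Point values for each level of importance (from the weighted scale above).
IMPORTANCE_POINTS = {
    "a little important": 1,
    "somewhat important": 10,
    "very important": 50,
}


@dataclass
class Question:
    """One answered question (hypothetical structure for illustration)."""
    answer: str            # the person's own answer
    acceptable: set[str]   # answers they would accept from a partner
    importance: str        # a key of IMPORTANCE_POINTS


def satisfaction(mine: dict[str, Question], theirs: dict[str, Question]) -> float:
    """Fraction of my importance points that the other person's answers earn.

    Only questions that both people answered are counted.
    """
    earned = 0    # numerator: points the other person's answers earn
    possible = 0  # denominator: all points I allocated
    for qid in mine.keys() & theirs.keys():
        points = IMPORTANCE_POINTS[mine[qid].importance]
        possible += points
        if theirs[qid].answer in mine[qid].acceptable:
            earned += points
    return earned / possible if possible else 0.0


# Hypothetical answers for persons A and B.
a = {
    "tidy": Question("very", {"very", "average"}, "very important"),
    "kids": Question("two", {"two", "three"}, "somewhat important"),
}
b = {
    "tidy": Question("average", {"very", "average", "messy"}, "a little important"),
    "kids": Question("one", {"one", "two"}, "very important"),
}

print(round(satisfaction(a, b), 3))  # A's satisfaction with B: 50/60 -> 0.833
print(satisfaction(b, a))            # B's satisfaction with A: 51/51 -> 1.0
```

Note that the score is computed from the chooser’s own importance levels and acceptable answers, which is why A’s and B’s satisfaction with each other can differ.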
The overall formula OkCupid uses takes the nth root of the product of the satisfaction scores, where n is the number of scores. With two people, n = 2, so the match percentage is the square root of person A’s percent satisfaction multiplied by person B’s percent satisfaction. This is a mathematical way of expressing how happy you would be with each other based on how you answered the questions. Why multiply and take a square root when you could simply average the two scores? A geometric mean, which is “a type of mean or average which indicates the central tendency or typical value of a set of numbers” (Rudder, 2013), suits this situation because it works well for values with wide ranges and for comparing values that represent very different properties, such as your taste in literature, your plans for the future, and even whether or not you believe in God. It also keeps a lopsided pairing from looking like a good match: if one person is fully satisfied and the other is barely satisfied, the geometric mean stays low, while a simple average would sit near 50%. Best of all, the algorithm can still be useful even when there is very little data.
OkCupid also uses the margin of error, which is “a statistic expressing the amount of random sampling error in a survey’s results” (Rudder, 2013), so that person A can be confident in a match. It always shows the lowest match percentage consistent with that margin of error, which encourages person A and person B to answer more questions and increase the confidence of the match. For example, if persons A and B have answered only two of the same questions, the margin of error for that sample size is 50%, which means the highest possible match percentage is 50%. Figure 1 shows how many questions in common (the size of s) two people must answer to bring the margin of error down to .001, which allows a match of up to 99.9%.
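The match calculation can be sketched in Python as a geometric mean followed by a margin-of-error adjustment. The function names are hypothetical, and the rule that the margin of error equals 1/s and is subtracted from the raw match is an assumption: it is chosen because it reproduces the two-question example above (s = 2 gives a 50% margin), not because it is taken from OkCupid’s code.

```python
import math


def raw_match(sat_a: float, sat_b: float) -> float:
    """Geometric mean of two satisfaction scores (0.0 to 1.0).

    The geometric mean of n values is the nth root of their product;
    with two people n = 2, so this is a square root.
    """
    return math.sqrt(sat_a * sat_b)


def displayed_match(sat_a: float, sat_b: float, s: int) -> float:
    """Raw match minus an ASSUMED margin of error of 1/s.

    s is the number of questions both people answered. The 1/s rule is
    an assumption that reproduces the article's example (s = 2 -> 50%).
    """
    if s <= 0:
        return 0.0  # no shared questions, so no basis for a match
    margin = 1 / s
    return max(0.0, raw_match(sat_a, sat_b) - margin)


# A lopsided pair: one person fully satisfied, the other barely.
print(round((1.0 + 0.01) / 2, 3))        # simple average: 0.505
print(round(raw_match(1.0, 0.01), 3))    # geometric mean: 0.1

# Even a perfect raw match is capped by the margin of error.
print(displayed_match(1.0, 1.0, 2))                # 0.5
print(round(displayed_match(1.0, 1.0, 1000), 3))   # 0.999
```

Under this assumed rule, a .001 margin of error corresponds to s = 1/0.001 = 1000 questions answered in common.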
Now that we know how the computer calculates a match percentage, it is natural to wonder how these percentages affect the odds that person A sends one or more messages to person B. OkCupid was curious about this question too, and in the name of science it tampered with the match percentages shown to some users (Hill, 2014). The displayed match percentage turns out to affect both the likelihood that a message is sent and the chance that a single message turns into a conversation (Figures 2 and 3). For example, if person A was told they were a 30% match with person B (and they really were a 30% match), there was a 14.2% chance that a single message would be sent and about a 10% chance that a single message would turn into a conversation of four or more messages. However, if person A was told they were a 90% match (even though they were only a 30% match), the chance of sending one message was 16.9%, and the chance that the message turned into an exchange of four or more messages was 17%.
I believe that the future of online dating is broad and exciting. However, I have some concerns about the algorithm, because it relies heavily on people’s honesty and self-assessment. If I were to analyze this topic further, I would look into how the length of the first message affects response rates, how it affects the odds that a conversation continues for four or more messages, and whether those later messages are the same length as, longer than, or shorter than the first message. The range of questions yet to be asked about this data, and about matching with people who may be oceans away, is enormous; the data will linger on the Internet for many years to come, and I’m sure it will be analyzed hundreds more times to answer many more questions.
Hill, K. (2014, July 28). OkCupid lied to users about their compatibility as an experiment. Forbes. http://www.forbes.com/sites/kashmirhill/2014/07/28/okcupid-experiment-compatibility-deception/#4cbde4745eb1
Match percentage. (n.d.). OkCupid. Retrieved April 26, 2016, from https://www.okcupid.com/help/match-percentages
Rudder, C. (2013, February 13). Inside OKCupid: The math of online dating [Video file]. https://www.youtube.com/watch?v=m9PiPlRuy6E
Rudder, C. (2014). Dataclysm: Who we are (when we think no one’s looking). New York: Crown Publishers.
Figure 1. Margin of error vs. highest possible match. From “Match Percentage,” https://www.okcupid.com/help/match-percentages. Copyright by OkCupid. Reprinted with permission.
Figure 2. Odds of sending one and/or more messages from 30% match. From “OkCupid Lied To Users About Their Compatibility As An Experiment,” by Kashmir Hill, 2014, http://www.forbes.com/sites/kashmirhill/2014/07/28/okcupid-experiment-compatibility-deception/2/#2f78a64f5eb1. Copyright by Forbes. Reprinted with permission.
Figure 3. Odds of a single message turning into a conversation based on match percent. From “OkCupid Lied To Users About Their Compatibility As An Experiment,” by Kashmir Hill, 2014, http://www.forbes.com/sites/kashmirhill/2014/07/28/okcupid-experiment-compatibility-deception/2/#2f78a64f5eb1. Copyright by Forbes. Reprinted with permission.