
What is the Defenestration Index?

When I applied to universities, I was given a truly incredible prompt by the University of Chicago. They are quite famous for their "out there" essay questions, and this year was no different. While they had a number of options to choose from, the one I chose goes as follows:

In Homer’s Iliad, Helen had a “face that launched a thousand ships.” A millihelen, then, measures the beauty needed to launch one ship. The Sagan unit is used to denote any large quantity (in place of “billions and billions”). A New York Minute measures the period of time between a traffic light turning green and the cab behind you honking. Invent a new unit of measurement. How is it derived? How is it used? What are its equivalents?

That night, I was telling friends about some of my favourite words in English which really don't deserve to exist, and had the idea to create a score to quantify how I felt about the language. As a lover of both unreasonable scoring systems and languages, it was perfect.

The Scoring System

The scoring system is split into four components, all of which are combined into a single number at the end.

Phonemic Humour

Simply put, this is a way of guessing how funny a word will sound to humans. To do this, I reached out to Professor Chris Westbury at the University of Alberta about a paper he wrote called "Telling the world's least funny jokes: On the quantification of humor as entropy". Being a wonderfully kind person and recognizing that I wanted to have some fun, he sent me the paper free of charge, along with the dataset they used and an absolutely wonderful video of a monkey. With this in hand, along with letter frequency data acquired through Rachael Tatman's corpus of frequently used words, I finally had everything I needed to create a metric for the presumed humour of a word. Professor Westbury's analysis of humour uses Shannon entropy based on letter frequency, which, after plugging in our variables and adjusting for word length, gives us the equation:

humour.png

Here, p_i is the frequency of each letter (out of 1), and l is the length of the word.
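As a rough illustration of the description above, here is a minimal Python sketch of a length-adjusted Shannon entropy. The exact formula lives in the equation image, so this is only one plausible reading of it: the function name, the choice of log base 2, and the tiny frequency table are all assumptions made for the example.

```python
import math

# Hypothetical letter frequencies (each out of 1). A real table would cover
# the whole alphabet and come from a corpus; these values are made up.
EXAMPLE_FREQ = {"a": 0.5, "b": 0.25, "c": 0.125}


def length_adjusted_entropy(word, freq):
    """Sum -p_i * log2(p_i) over the letters of `word`, divided by its length l.

    Assumes every letter of `word` appears in `freq`; raises KeyError otherwise.
    """
    if not word:
        raise ValueError("word must not be empty")
    total = sum(-freq[c] * math.log2(freq[c]) for c in word.lower())
    return total / len(word)


print(length_adjusted_entropy("abc", EXAMPLE_FREQ))
```

Dividing by the word length keeps long words from scoring higher simply because they contain more letters.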

Word Utilization

Word utilization is the equation's attempt to quantify how useful a word really is. Logically, as a word's use cases decrease, its reasons to exist should also decrease and its absurdity should rise. I used Google's Ngram Viewer to analyze word use. In the equation, f is the set of all years in which the word is used, together with how many times it appears in each, and y is the number of years in which it has appeared. This averages the word's yearly use over only the years in which it was used at all, which helps even the playing field for newer words. This gives us the equation:

frequency.png
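The averaging step described above can be sketched in a few lines of Python. This is an illustration under assumptions, not the equation from the image: the function name and the sample counts are hypothetical, and how the average is turned into an absurdity contribution is not shown.

```python
def mean_yearly_use(counts_by_year):
    """Average a word's yearly count over only the years in which it appears.

    `counts_by_year` maps a year to the number of times the word was seen.
    Years with a count of zero are ignored, so y is the number of years used.
    """
    used = [n for n in counts_by_year.values() if n > 0]
    if not used:
        return 0.0
    return sum(used) / len(used)


# Hypothetical counts for a word that first appears in 1990.
sample = {1989: 0, 1990: 10, 1991: 0, 1992: 30}
print(mean_yearly_use(sample))
```

Because unused years are excluded from y, a word coined recently is not penalized for the centuries before it existed.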

Word Ambiguity

If it's hard to tell what a word means, that should be reflected in its absurdity, so this component is simply the number of definitions of a word as given by Princeton University's WordNet.

When coming up with this score, I almost stopped at the first three components, but something didn't feel right. There was a component for the frequency of words, but there still didn't seem to be any way to quantify a word's true applicability: how well it meshes with the rest of the language. That is what this metric tries to solve. One of WordNet's features is a collection of "hypernyms" and "hyponyms" for each word in its database. A hypernym is a broader term that the word is a kind of, and a hyponym is a more specific kind of the word. In effect, one can think of a word as a node on a directed graph, where the nodes this word leads to are its hyponyms and the nodes that lead to this word are its hypernyms. If a word has many hyponyms, there are more cases where its meaning is useful, and that should decrease the absurdity of the word. In this equation, we count hyponyms up to two degrees of separation, although another version of the score could go as deep as desired. This gives us the equation:

ambiguity.png
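To make the "two degrees of separation" idea concrete, here is a small Python sketch that collects a word's hyponyms up to a given depth with a breadth-first search. It runs on a hand-written toy graph rather than on WordNet itself, and the graph, the function name, and the `max_depth` parameter are all hypothetical; it only shows the counting, not how the count enters the final equation.

```python
from collections import deque

# Hypothetical toy graph: each word maps to its direct hyponyms.
TOY_HYPONYMS = {
    "animal": ["dog", "cat"],
    "dog": ["puppy", "hound"],
    "hound": ["beagle"],
    "cat": [],
}


def hyponyms_within(word, graph, max_depth=2):
    """Return the set of hyponyms reachable from `word` in at most `max_depth` steps."""
    seen = {word}
    found = set()
    queue = deque([(word, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the requested depth
        for child in graph.get(node, []):
            if child not in seen:
                seen.add(child)
                found.add(child)
                queue.append((child, depth + 1))
    return found


print(sorted(hyponyms_within("animal", TOY_HYPONYMS)))
```

In this toy graph, "beagle" sits three steps below "animal", so it is only counted when `max_depth` is raised to 3.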

Why Is This on My Portfolio?

The Defenestration Index, while not a particularly technical project, is one of the projects I am proudest of. It represents the way my mind works: going down rabbit holes and finding meaning where many others would never think to look. I like to think that wherever life takes me, I'll bring this attitude to the problems I come across, and never forget what I love to do.

The Full Essay

As I mentioned at the start, this was originally a college application essay, and one which I remain very proud of.

Read The Essay