Trie Data Structure

Kevin Kao
smucs
Published in
6 min readApr 3, 2021

--

Why Should You Care?

First of all, you should always care about data structures since they’re the lifeblood of computer science. Without them, we wouldn’t be able to have modern technology and incredible tools such as Structured Query Languages. Besides this, tries are incredibly valuable for efficiency purposes in a large scale of words with common prefixes since we won’t have to reallocated space for the prefix every time, and instead use the prefix only one time and further branch from there.

Pros: Efficient, fast, beautiful, simple to implement for lowercase values. Time complexity is O(M), where m is the length of the key.

Cons: The more kinds of characters (punctuation, numbers, capital letters, etc) makes the trie less efficient. The time complexity for insertion can be O(26*k) = O(k) where k would be the average length of each word.

A brief introduction and history on the trie data structure:

The trie data structure is a type of tree in which each node stores a single char. Each parent connects to their children in such they the connections may create words by linking one character to another. For example, the word “Fontenot” will first start with the parent node “F” and it will have a child of the char “o,” and so on. However, the power of the trie lies in the fact that the English language is very heavily based on roots and can expand in different ways. Now that we have the path of “Fontenot,” we could add the word “Fondren” and we would follow the first three nodes as they are the same and branch off where there is a difference. This is incredibly valuable for efficiency since we do not need to create a whole new word and can just store the path from each character. This tool becomes fully realized with common prefixes such as -pre, -re, -dis, -un, etc. as there are thousands of words with the same prefixes and thus reduces the redundancy of allocating space for the prefixes every time we want to add a word into a library of some sort.

The concept of a trie was discovered by Axel Thue in 1912, and introduced into the world of computer science by Rene de la Briandai in 1959. The term “trie” was officially coined by Edward Fredkin in 1960 (pronounced “tree”) after the middle syllable of “retrieval.”

The English language is very reliant on prefixes and suffixes, and as such the trie takes advantage of the prefix-dependent words we use. For example, we may have the following words in our dictionary:

  1. misuse
  2. misunderstood
  3. mistake
  4. misperceive
  5. misbehave

All of these words (and many more) have the prefix “mis.” We may use this to our advantage to find words quickly and efficiently. Whenever a word validation problem may come up, it’s a great chance to consider whether or not a trie could be to your advantage!

One factor about tries that makes it more valuable than just a binary tree storing characters is that we can implement save states. In the previous example, we know we are looking for words with the prefix “mis.” It would be inefficient and quite a waste to continuously search through the first three letters since that’s something we know should be consistent and we are looking for differences at a greater depth level. We could have a save state at the third character so that we are only making meaningful searches as opposed to going through the root every single time we are looking for a word.

Below is a simple example of how a trie would look.

A sample code is provided as a GitHub repository (Instructions for how to use the repository will be in the ReadMe.md file attached alongside the repository). In this section, we will walk through how the code works.

High-Level Overview:

The code is written such that an insertion call is made to every word. The insertion function takes in two parameters, a struct of a trie node and the word itself. Every word must be inserted with the root as the first parameter of the insertion function. For example, an insertion of the word “hello” would be as follows:

insert(root, “hello”); //Assuming that a struct node is called “root”

The trie is mainly used for boolean testing of whether or not a word exists in the trie. The format for using the search function is as follows:

search(root, “word”)? ____ if true : ____ if false;

Let’s walk through an example together. In the following example, we will insert the word “test” into the trie, then we will make a condition to output to the console whether or not we can find the word in the trie.

struct TrieNode *root = getNode(); //Creates an empty struct

insert(root, “test”);

search(root, “test”)? cout << “Test found!” : cout << “Test not found.”;

Deeper Dive In The Functionality:

The functionality of a trie is based on these aspects:

  1. Struct
  2. getNode function
  3. Insert function //Takes TrieNode pointer and string
  4. Search function //Takes TrieNode pointer and string
  5. Delete function //Takes TrieNode pointer and string

The struct is based on a recursive call to itself for its children, with enough allocated spaced for every letter of the alphabet. An important note about the design of this trie is that it only supports lowercase letters: punctuation and capital letters may be implemented but will take more space. There is the possibility for an abstract use of a trie for pattern recognition of punctuation and symbols. The other part of the struct is a boolean that checks if the word has reached its end.

The getNode function makes a new trie node struct and sets the boolean to be false and we set every possible position in the children to be NULL. The purpose of the getNode function is simply to return a new trie node (the purpose of setting everything to NULL is simply for initialization purposes).

The insert function creates another trie node that is used to crawl through the string we want to insert. We take in a pointer to a TrieNode and the key we want. From here, we loop through the length of the string we want to add into the trie. We create an index for each character by subtracting the ASCII value of the letter we want to insert from the letter ‘a.’ Now, check to see if the letter is present and if not we insert it into the trie. However, if the key is the prefix of the trie node, then we just mark the leaf node for efficiency.

The search function is just a boolean that returns the result of whether or not the word is in our trie. We do the same thing as the insert where we create a crawling node starting at the root but now we are looping through the word we are looking for. We do the same thing, subtracting each letter’s ASCII value from the ASCII value of ‘a.’ If at any point we do not have a match, we know that the word does not exist in our trie and we may return false. If we make it through the entire length of the word successfully then the entire word is found and we may return true (in the example we return that the crawling node is NOT null and that we are at the end of the word, essentially a true).

The delete function is slightly more complicated. First, we just check if the case is that the tree is empty. If so, we just return NULL. We recursively call the remove function to remove the index of the last character. If we are on the last character of the key being processed, then we check if the root isEndOfWord, and if so we have to set it to false and check if it is not a prefix for any other words (if so we would have to keep it). If it is not a prefix to any other word then we may delete the root using the C++ keyword “delete” and set the root to NULL. If the root doesn’t have any children, its only child was deleted, and it is not the end of another word then we would set isEmpty to true and set the root to NULL.

For the sample code mentioned, please look to the following GitHub repository: https://github.com/KevinKaoSMU/TrieExample

To learn more about the trie data structure, refer to the following as they are great resources to learn more:

  1. https://www.geeksforgeeks.org/trie-insert-and-search/
  2. https://medium.com/basecs/trying-to-understand-tries-3ec6bede0014
  3. https://www.toptal.com/java/the-trie-a-neglected-data-structure
  4. https://www.youtube.com/watch?v=zIjfhVPRZCg&ab_channel=HackerRank
  5. https://www.youtube.com/watch?v=AXjmTQ8LEoI&ab_channel=TusharRoy-CodingMadeSimple

Crediting

Huge thank you to Dr. Fontenot of the Southern Methodist University, the Lyle School of Engineering, and the Southern Methodist University.

--

--

Kevin Kao
smucs

Undergraduate Computer Science student at the Southern Methodist University (Class of 2022).