Hans Peter Luhn, the man behind Hash Map



Hans Peter Luhn was a computer scientist and textile engineer who is remembered as the inventor of the hash map (a massively useful data structure) and who laid the foundations of text processing. He worked at a time when computing systems were still being developed and computers were seen mainly as machines for simple lookups on numbers. He also coined the term "business intelligence".

It was Luhn who extended computing systems to complicated text processing and opened up a new path for computing devices. He worked at International Business Machines (IBM) as a researcher.

Hans Peter Luhn's major contributions include:

  • Hash Map
  • Luhn algorithm
  • Key Word in Context index
  • Selective dissemination of information (set of tools)

Luhn is a forgotten hero. His contributions are widely used today but are rarely associated with him. He was awarded over 80 patents, but he did not receive any well-known awards, unlike other scientists of his caliber.

Life of Luhn

Luhn was born in Germany on 1 July 1896. He did a wide range of work in his lifetime, and in his early years he seemed unlikely to become a computer scientist. The major events in his life are as follows:

  • After completing high school (around 1910), he moved to Switzerland to learn the printing trade so that he could join his family's business
  • World War I then broke out, and Luhn served as a communications officer in the German Army
  • After the war, he entered the textile trade, for which he moved to the USA. During this period (1920 to 1940), he invented a thread-counting gauge, the Lunometer, which is still used
  • In 1941, he joined IBM as a researcher
  • Most of his research in computing was done during his time at IBM, while working on the tasks he was assigned

Everything started when IBM's clients James Perry and Malcolm Dyson asked for a way to search for chemical compounds recorded in coded form, and the task was assigned to Hans Peter Luhn. This took place in 1947 and went on to change the computing landscape.

Key Word in Context

Working on the problem he was assigned, he developed an algorithm which he called "Key Word in Context". It could take in a large amount of text (nearly 5000 words) and quickly generate an index of words, which could then be used for searching within the text.
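To make the idea concrete, here is a minimal Python sketch of a keyword-in-context style index. It is not Luhn's original procedure: the function name `kwic_index`, the small `STOP_WORDS` set and the `width` parameter are all assumptions chosen for illustration.

```python
from collections import defaultdict

# Assumed, illustrative list of words too common to be useful as keywords.
STOP_WORDS = {"a", "an", "and", "in", "of", "the", "to"}


def kwic_index(text, width=3):
    """Map each keyword to the (left context, keyword, right context) triples
    in which it occurs. `width` is the number of context words on each side."""
    words = text.split()
    index = defaultdict(list)
    for i, word in enumerate(words):
        key = word.strip(".,;:!?").lower()
        if not key or key in STOP_WORDS:
            continue
        left = " ".join(words[max(0, i - width):i])
        right = " ".join(words[i + 1:i + 1 + width])
        index[key].append((left, word, right))
    # Sort by keyword so the index reads like an alphabetical listing.
    return dict(sorted(index.items()))


sample = "The index of words can be used for searching within the text"
for key, contexts in kwic_index(sample).items():
    for left, word, right in contexts:
        print(f"{left:>25} | {word} | {right}")
```

Looking up a word in the returned dictionary gives every place it occurs together with its neighbouring words, which is what makes such an index useful for searching.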

This was a groundbreaking development at the time (1950), as the field of text processing was still unexplored and computer scientists focused only on numbers rather than text.

By 1960, it was in use at major organizations such as Chemical Abstracts Service, Biological Abstracts and the Institute for Scientific Information.

Luhn's Relation with Hash Map

Luhn's interest was in text processing, that is, handling large amounts of text and answering questions about it. This led him to produce several groundbreaking works in the field for IBM and over 70 patents.

Once he had developed Key Word in Context (KWIC), he proposed the idea of mapping text to buckets to make searching faster. He developed this as a way to quickly search phone numbers.

Consider a phone number as: 010-943-9617

There are over a billion phone numbers, and comparing a number against each one takes a lot of time. His idea was to keep the numbers in small buckets, where each bucket might hold a few thousand numbers. If we can map a number to the bucket it should be in, we only need to search that bucket, which is much faster because it holds significantly less data.

The idea for mapping a phone number to a bucket number was to add each pair of consecutive digits in the phone number and use the results as the bucket number. In the example below, the two-digit sum 15 is reduced to the single digit 6:

010-943-9617 ---> 1 9 7 15 8 ---> 1 9 7 6 8 ---> 19768 (bucket number)

This is how a phone number was assigned a bucket number. Over the years, researchers have improved on this method, and it is known as hashing today.
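The bucket idea can be sketched in a few lines of Python. This is an illustration, not Luhn's exact scheme: the names `bucket_number` and `contains` are hypothetical, and reducing a two-digit sum by adding its digits (15 becomes 6) is an assumption read off the example above. Some accounts of Luhn's method keep only the last digit instead.

```python
from collections import defaultdict


def bucket_number(phone):
    """Add each pair of consecutive digits and join the results."""
    digits = [int(ch) for ch in phone if ch.isdigit()]
    parts = []
    for i in range(0, len(digits) - 1, 2):
        pair_sum = digits[i] + digits[i + 1]
        if pair_sum > 9:
            # Assumption: reduce a two-digit sum by adding its digits,
            # so that 15 becomes 6 as in the worked example.
            pair_sum = pair_sum // 10 + pair_sum % 10
        parts.append(str(pair_sum))
    return "".join(parts)


# Store each number in the bucket it maps to (made-up sample numbers).
buckets = defaultdict(list)
for number in ["010-943-9617", "010-943-9618", "555-010-2000"]:
    buckets[bucket_number(number)].append(number)


def contains(phone):
    """Search only the one bucket the number maps to, not every number."""
    return phone in buckets.get(bucket_number(phone), [])


print(bucket_number("010-943-9617"))  # 19768
print(contains("010-943-9617"))       # True
print(contains("010-943-0000"))       # False
```

A modern hash map works on the same principle, only with a hash function that spreads keys far more evenly across the buckets.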

You must learn about Hash Map in depth.

Luhn Algorithm

Luhn's Algorithm is an algorithm to validate various identification numbers such as credit card numbers, IMEI (mobile device) numbers, Canadian Social Insurance Numbers and others. It has also been widely used by major companies like Taco Bell, McDonald's and others.

It is a checksum algorithm that detects the most common errors: any single-digit error, almost every swap of two adjacent digits (all except 09 and 90), and most twin errors. It misses the twin errors 22 and 55, 33 and 66, and 44 and 77 when one is changed into the other. It is not designed for security; it guards against accidental mistakes, not deliberate tampering, and works well for that purpose.

The algorithm is as follows:

  • The number N transmitted is followed by a single digit C, which is the checksum.
  • Starting from the rightmost digit of N (the digit just before C) and moving left, every second digit is doubled. If a doubled value is greater than 9, then 9 is subtracted from it to make it a single digit.
  • Sum all the resulting digits, including the checksum digit C.
  • If the sum modulo 10 is 0, the data is accepted as correct; otherwise an error is detected.
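The steps above can be sketched in Python as follows. The function names `luhn_is_valid` and `luhn_check_digit` are chosen for this example, and rejecting empty input is a choice of the sketch rather than part of the algorithm.

```python
def luhn_is_valid(number):
    """Return True if `number` (payload N followed by check digit C) passes."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if not digits:
        return False  # sketch-level choice: reject empty input
    total = 0
    # Position 0 is the check digit C; every second digit after it is doubled.
    for position, digit in enumerate(reversed(digits)):
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def luhn_check_digit(payload):
    """Compute the check digit C for a payload N that does not yet include it."""
    digits = [int(ch) for ch in payload if ch.isdigit()]
    total = 0
    # Without C appended, the rightmost payload digit is the first one doubled.
    for position, digit in enumerate(reversed(digits)):
        if position % 2 == 0:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return (10 - total % 10) % 10


print(luhn_check_digit("7992739871"))  # 3
print(luhn_is_valid("79927398713"))    # True
print(luhn_is_valid("79927398710"))    # False
```

Non-digit characters such as spaces or dashes are ignored, so a number can be passed in the grouped form in which it is usually printed.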

This was one of the first algorithms of its kind, and it is truly great. Today, stronger algorithms exist in this field.

Selective dissemination of information (SDI)

This is an idea that is widely used today but was first raised by Luhn during his work at IBM. The idea is to distribute text information to a large number of users. The challenges are deciding which information to distribute and how to do so.

Today it takes the form of a set of tools such as RSS feeds, mailing lists, discussion forums, email notifications and much more, and it is part of almost any online activity. It was during this work that Luhn coined the term "business intelligence".

Luhn was a pioneer of text information processing and a fundamental figure in the basics of handling text information. It is remarkable how he moved from the textile industry to the field of computing and made huge contributions to both.