Tuesday, 26 February 2013

Graphically displaying name frequencies

Well here goes...post number one of my new blog on Auld Genealogy...

Have you ever wondered how often names get reused in your family tree? Wouldn't it be great if we could display this information graphically, so that our readers could get a visual snapshot of the names in our tree...enter the word cloud.

Word Clouds

The basic premise of word clouds (also known as tag clouds) is that greater prominence is given to words that appear more frequently in the source text: words used often are displayed as large text, whilst seldom-used words are displayed as small text.
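For anyone curious about what happens behind the scenes, here is a minimal Python sketch of the idea: count how often each name appears, then scale each count to a font size. The function names, the point sizes and the sample list of people are all my own assumptions for illustration; this is not how any particular word cloud tool is actually implemented.

```python
from collections import Counter


def name_frequencies(full_names):
    """Count how often each word (given name or surname) appears."""
    counts = Counter()
    for full_name in full_names:
        counts.update(full_name.split())
    return counts


def font_sizes(counts, min_pt=10, max_pt=72):
    """Map each count linearly onto a font size between min_pt and max_pt."""
    lowest, highest = min(counts.values()), max(counts.values())
    span = highest - lowest or 1  # avoid dividing by zero when all counts match
    return {
        word: round(min_pt + (count - lowest) / span * (max_pt - min_pt))
        for word, count in counts.items()
    }


# Hypothetical sample data, not taken from the actual tree
people = ["John Auld", "Mary Auld", "John William Auld", "Mary Ann Auld", "John Auld"]
counts = name_frequencies(people)
sizes = font_sizes(counts)

# Print the names from most to least frequent with their scaled sizes
for word, count in counts.most_common():
    print(f"{word}: appears {count} times, drawn at {sizes[word]}pt")
```

The linear scaling is just the simplest choice; a real tool may well use a different curve so that very common words do not swamp everything else.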

To see this in action, there is a great online tool called Wordle that lets you create a word cloud from text that you supply. The tool also gives you full control over colours, fonts and layouts. To try it out, I created a little cloud based on all the Australian-born individuals with the Auld surname. Here is the result:

Wordle of the author's Australian AULD family

The one thing I had to tweak whilst creating the above word cloud was to reduce the frequency of the surname Auld to match that of the most frequent first name. Because the surname appears so often, every word cloud created without this modification showed one massive word surrounded by extremely small, barely readable words. I think the balance of names in the end result gives a good indication of the names found in the family tree.
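If you would rather not do the surname tweak by hand, it can be sketched in a few lines of Python. This is only a rough illustration under my own assumptions: the helper names `cap_surname` and `to_cloud_text` are made up, the counts are invented rather than taken from my tree, and repeating each word by its count is just one simple way of preparing text to paste into a word cloud tool.

```python
from collections import Counter


def cap_surname(counts, surname):
    """Return a copy of counts with the surname reduced to the top first-name count."""
    others = [n for word, n in counts.items() if word != surname]
    if surname not in counts or not others:
        return Counter(counts)  # nothing to cap
    capped = Counter(counts)
    capped[surname] = min(counts[surname], max(others))
    return capped


def to_cloud_text(counts):
    """Repeat each word by its count so the text can be pasted into a word cloud tool."""
    return " ".join(" ".join([word] * n) for word, n in counts.items())


# Hypothetical counts for illustration only
counts = Counter({"Auld": 120, "John": 14, "Mary": 11, "William": 9})
balanced = cap_surname(counts, "Auld")

print(balanced["Auld"])  # the surname now matches the most frequent first name
text = to_cloud_text(balanced)
```

The original counts are left untouched, so you can still compare the capped and uncapped versions of the cloud.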

Further reading and resources


  1. It only took me a week to find you, Jonathan. Congratulations on your entry to the blogosphere - I'll be watching you.

  2. Tagxedo is another option for anyone interested in producing word clouds. It seems to operate in much the same way as Wordle but has the added benefit of letting you select an outer shape (e.g. a map of Australia) for the word cloud.