Data Visualization and Exploration

Introduction and foundation

Introduction

Politics, religion, and death

Many of the data sources we investigate will be tangentially related to one (or more) of these three somewhat personal or sensitive topics.

Our focus is on honest, helpful portrayal of data regardless of where it comes from.

Graphs and Charts

“All graphs are charts, but not all charts are graphs.”

Charts are one of

  • Maps
  • Tables
  • Diagrams
  • Graphs

Common graphs include

  • Scatter
  • Line
  • Box
  • Bar

“There is an art to choosing the best visualization for data.” - @MsDrData (Lisa Neidert)

The need for “Data Visualisation”?

“There is no single statistical tool that is as powerful as a well-chosen graph.” - NSW Department of Health, July 2006

  • Explore
  • Understand
  • Explain

“The subject of graphical methods for data analysis and for data presentation needs a scientific foundation.” - Cleveland and McGill, 1984

“What combination of features produces a good graph? The answer from the literature is fairly clear, though not immediately helpful, and that is it depends.” - NSW Department of Health, July 2006

Explore

This is a neglected, underappreciated, or outright mistrusted step.

It is important for the

  • assurance of data quality
  • understanding of the data itself
  • generation of new ideas

Not to be confused with “fishing” or “snooping”, especially if you are a consultant trying to learn about data you’ve been assigned.

This is a sandbox of ideas. What type of

  • variation occurs within my variables?
  • covariation occurs between my variables?
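The two questions above can be made concrete with simple numerical summaries before any plotting. The following is a minimal Python sketch under stated assumptions: the `heights` and `weights` values are made up for illustration, and the helper names `covariance` and `correlation` are hypothetical. It computes the sample variance (variation within one variable) and the Pearson correlation (a rescaled measure of covariation between two variables).

```python
from math import sqrt
from statistics import mean, variance


def covariance(xs, ys):
    """Sample covariance of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def correlation(xs, ys):
    """Pearson correlation: covariance rescaled to lie between -1 and 1."""
    return covariance(xs, ys) / sqrt(variance(xs) * variance(ys))


# Made-up example data (an assumption, not taken from any course data set).
heights = [150, 160, 170, 180, 190]
weights = [52, 58, 69, 75, 88]

print("variation within heights:", variance(heights))
print("covariation (correlation):", round(correlation(heights, weights), 3))
```

A single number like this hides the shape of the relationship, which is one reason to follow it with a scatter plot.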

Play, but take notes - document your work.

A decorative figure of a freshly raked zen garden.
Figure 1: Garden of ideas.

Understand

This begins with exploration, but must lead to application of more formal statistical tools.

  • Visualizations can generate ideas and help communicate ideas.
  • Formal analysis provides actionable precision.

Explain

Scientific communication could be its own course.

We have to explain things to

  • our peers, collaborators, and clients
  • the public
  • the government

Data visualization perspectives

Resources, examples, guidelines, and practices come from fields of

  • Design
  • Journalism
  • Statistics

This means priorities and preferences expressed may depend greatly on the audience.

Data visualization and infographics

An infographic (i.e., infoviz, information graphic) could contain a collection of data visualizations along with a narrative and superfluous ornamentation.

A data visualization (i.e., dataviz) is a specific representation of some relationship. Any narrative is external to the visualization and ornamentation is often used sparingly.

Data visualization and infographics

Data visualizations are (or should be)

  • standalone
  • objective
  • clear and accessible

Infographics are (typically)

  • narrative
  • subjective

Tools

There are many, almost too many, tools available to us, including

  • R/RStudio
  • Python
  • Proprietary (e.g., MS Excel, SAS, Tableau)

For each, there are packages and plugins for special tasks or alternative implementations of a given task.

Whether developed as a hobby, for research, or as a proprietary product, these tools can change frequently.

  • Check for updates regularly (and accept them).
  • Cite your sources.

Resources

There are a variety of resources available in a variety of formats.

  • Some teach tools.
  • Some teach principles.
  • Some teach both.

We will aim for both.

Activity

Human perception

We perceive information differently based on how it is presented.

A brief pause.

“preattentive pop-out”

1.) Find two partners. Take out one cell phone and set it to Stopwatch mode.

2.) Take a packet of cards, letter side up.

3.) When ready, flip a card, immediately start the timer, and press stop as soon as you find the blue dot.

4.) Open the spreadsheet to the Colors tab.

5.) Generate a personal code using your initials and birth day (of month). Enter data as follows.

User    Card Letter    Time
sl22    A              0.63

6.) Pause until everyone has contributed.
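The personal code and the data row described in the steps above can be sketched in Python. This is only an illustration: the function names `make_user_code` and `make_record` are hypothetical, the participant "Sam Lee" is invented so that the result matches the example code `sl22`, and leaving single-digit days unpadded is an assumption the activity does not specify.

```python
def make_user_code(first_name, last_name, birth_day):
    """Build a code such as 'sl22' from two initials and a day of the month."""
    if not first_name or not last_name:
        raise ValueError("both names are required")
    if not 1 <= birth_day <= 31:
        raise ValueError("birth_day must be a day of the month (1-31)")
    # Assumption: single-digit days are not zero-padded.
    return f"{first_name[0]}{last_name[0]}{birth_day}".lower()


def make_record(user, card_letter, seconds):
    """One row of the activity table: User, Card Letter, Time."""
    return {"User": user, "Card Letter": card_letter.upper(), "Time": float(seconds)}


# Hypothetical participant whose initials and birth day give the code 'sl22'.
user = make_user_code("Sam", "Lee", 22)
print(make_record(user, "A", 0.63))
```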

Visualization

We just played a “game” that required you to perform a visual task. Data visualization implies an expectation of visual ability.

A *Where's Waldo?* gathering - people dressed in red and white stripes.
Figure 3: A “Where’s Waldo?” gathering.

Be aware of your own abilities, the abilities of others, and how they may differ.

Neither mathematics nor statistics is known for its accessibility to blind or low-vision users.

Visualization of our own perception

Figure 4 shows how the time to identify the target varies with changes to the complexity of the presentation.

Figure 4: Time to observe blue dot.
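Before drawing a figure like this one, the recorded times can be summarized per card. The sketch below is a minimal Python example under stated assumptions: the function name `median_time_by_card` is hypothetical, and every row except the first (`sl22`, `A`, `0.63`) is made up for illustration. It groups times by card letter and reports the median, which is less sensitive than the mean to one unusually slow trial.

```python
from collections import defaultdict
from statistics import median


def median_time_by_card(rows):
    """Group (user, card_letter, seconds) rows by card; return median time per card."""
    times = defaultdict(list)
    for _user, card, seconds in rows:
        times[card].append(seconds)
    return {card: median(values) for card, values in sorted(times.items())}


# Illustrative rows; only the first matches the example row in the activity table.
rows = [
    ("sl22", "A", 0.63),
    ("ab07", "A", 0.71),
    ("sl22", "B", 1.40),
    ("ab07", "B", 1.10),
    ("cd15", "B", 1.90),
]

for card, seconds in median_time_by_card(rows).items():
    print(card, round(seconds, 2))
```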

A pause for software

Downloading R and RStudio

In your favorite browser and search engine, search for the letter “R” (without quotes).

  • download the R installer for your computer
  • follow the installation instructions
  • if you already have R, make sure it is up-to-date

Returning to your browser, search for “RStudio”

  • download the RStudio installer for your computer
  • follow the installation instructions
  • if you already have RStudio, make sure it is up-to-date

What happens next …

What happens next depends on whether you have used these or other related programs in the past.

  • If we have time at the end of class, we may break into groups and experiment.
  • Or we may do this in small groups outside of class (possibly via Zoom).
  • Or we may save this until the start of class next week.

A brief history

“Classics” in early data visualization

Now we look at a few “classics” (and their publication dates).

  • William Playfair on economics (from 1786)
  • Snow on cholera (from 1854)
  • Nightingale on battlefield mortality (from 1858)
  • Minard on the invasion of Russia (from 1869)
  • DuBois on the Black American experience (from 1900)
  • Banana export chart (from 2005)

William Playfair’s “balance of trade”

One of the first modern data visualizations.

What do you notice?

John Snow’s “cholera map”

What do you notice?

Florence Nightingale’s “causes of mortality”

What do you notice?

“causes of mortality” (transcribed)

Note

The Areas of the blue, red, & black wedges are each measured from the centre as the common vertex. The blue wedges measured from the centre of the circle represent area for area the deaths from Preventable or Mitigable Zymotic diseases; the red wedges measured from the centre the deaths from wounds; & the black wedges measured from the centre the deaths from all other causes. The black line across the red triangle in Nov. 1854 marks the boundary of the deaths from all other causes during the month. In October 1854 & April 1855, the black area coincides with the red; in January & February 1855, the blue coincides with the black. The entire areas may be compared by following the blue, the red & the black lines enclosing them.
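Because the note says the wedges represent deaths "area for area", the radius of a wedge must grow with the square root of the count, not with the count itself. The following Python sketch illustrates that relationship. The function name `wedge_radius`, the default of twelve equal-angle monthly wedges, and the counts used are assumptions for illustration, not values transcribed from the diagram.

```python
from math import pi, sqrt


def wedge_radius(deaths, n_wedges=12, area_per_death=1.0):
    """Radius of a wedge whose AREA (not radius) is proportional to the count."""
    if deaths < 0:
        raise ValueError("deaths must be non-negative")
    angle = 2 * pi / n_wedges        # assumption: each month gets an equal angle
    area = deaths * area_per_death   # the area encodes the count
    return sqrt(2 * area / angle)    # from area = (angle / 2) * radius**2


# Made-up counts: four times the deaths gives twice the radius.
print(wedge_radius(100), wedge_radius(400))
```

Drawing the radius proportional to the count instead would make a month with four times the deaths look sixteen times larger by area.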

Minard’s “Napoleon’s invasion”

Widely recognized as an all-time best data visualization …

… but one that none of us is ever likely to recreate ourselves.

What do you notice?

Minard’s original text (translated)

Note

Figurative Map of the successive losses in men of the French Army in the Russian campaign 1812-1813

Drawn by M. Minard, Inspector General of Bridges and Roads (retired) Paris, November 20, 1869.

The numbers of men present are represented by the widths of the colored zones at a rate of one millimeter for every ten thousand men; they are further written across the zones. The red designates the men who enter Russia, the black those who leave it. - The information which has served to draw up the map has been extracted from the works of M.M. Thiers, de Ségur, de Fezensac, de Chambray and the unpublished diary of Jacob, the pharmacist of the Army since October 28th.

In order to better judge with the eye the diminution of the army, I have assumed that the troops of Prince Jérôme and of Marshal Davout, who had been detached at Minsk and Mogilev and have rejoined near Orsha and Vitebsk, had always marched with the army.

In short, C’est la Bérézina.
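The scale stated in Minard's note (one millimeter of width for every ten thousand men) is a linear encoding that can be written down directly. The Python sketch below shows the conversion in both directions. The function names are hypothetical, and the army sizes are illustrative inputs, not figures transcribed from the map.

```python
MEN_PER_MM = 10_000  # scale stated in Minard's note


def band_width_mm(men):
    """Width of the colored zone, in millimeters, for a given number of men."""
    if men < 0:
        raise ValueError("men must be non-negative")
    return men / MEN_PER_MM


def men_from_width(width_mm):
    """Invert the scale: number of men represented by a measured width."""
    if width_mm < 0:
        raise ValueError("width_mm must be non-negative")
    return width_mm * MEN_PER_MM


# Illustrative army sizes (assumptions for the example only).
for men in (400_000, 100_000, 10_000):
    print(men, "men ->", band_width_mm(men), "mm")
```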

W.E.B. DuBois’ “Freemen and slaves”

What do you notice?

Note

Chart prepared by Atlanta University students for the Negro Exhibit of the American Section at the Paris Exposition Universelle in 1900 to show the economic and social progress of African Americans since emancipation.

W.E.B. DuBois’ “Conjugal condition”

What do you notice?

Note

Chart prepared by Atlanta University students for the Negro Exhibit of the American Section at the Paris Exposition Universelle in 1900 to show the economic and social progress of African Americans since emancipation.

A glimpse at banana trade

How does this compare to the examples we’ve just reviewed?

What do you notice?

What kinds of graphs have we just encountered?

  • Curve-difference plot (Playfair on economics (from 1786))
  • Dot map (Snow on cholera (from 1854))
  • Rose/Nightingale (Nightingale on battlefield mortality (from 1858))
  • Sankey diagram (Minard on the invasion of Russia (from 1869))
  • Line/area and pyramid plot (DuBois on the Black American experience (from 1900))
  • 3D bar chart (Banana export chart (from 2005))

Modern case studies

A bar chart

A line graph

Vertical axis markings in red were originally shown as being equally spaced across the axis.

A grouped bar chart

Groups have been ordered consistently and time is shown chronologically.

A shaded line chart

  • What is your first reaction?
  • What do you wonder?
  • What could you do to improve the presentation?

Paired line graphs

A connected line graph showing lifetime lung cancer risk for Australian males and females (from NSW Department of Health, July 2006).

Paired line graphs

Prevalence of human immunodeficiency virus (HIV) and hepatitis C virus (HCV) infection by injecting history, clients of needle and syringe programs, NSW 1995 to 1998 (from NSW Department of Health, July 2006).

Going forward, what are the “rules”?

Don’t lie.

  • Proportional ink.
  • Honest axes.
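One way to see why these two points matter is to compare the ratio of two bar lengths as drawn with the ratio of the underlying values. The Python sketch below is a minimal illustration under stated assumptions: the function names `drawn_ratio` and `exaggeration` are hypothetical, the values 50 and 55 are made up, and it treats bar length as the only ink that encodes the value.

```python
def drawn_ratio(a, b, baseline=0.0):
    """Ratio of bar lengths (b to a) as drawn when the axis starts at `baseline`."""
    if a <= baseline or b <= baseline:
        raise ValueError("both values must exceed the baseline")
    return (b - baseline) / (a - baseline)


def exaggeration(a, b, baseline):
    """How many times larger the drawn ratio is than the true ratio b / a."""
    return drawn_ratio(a, b, baseline) / drawn_ratio(a, b, 0.0)


# Made-up values: 55 is 10% larger than 50.
print(drawn_ratio(50, 55))               # axis starts at zero
print(drawn_ratio(50, 55, baseline=45))  # axis truncated at 45
print(round(exaggeration(50, 55, 45), 2))
```

With the axis truncated at 45, the second bar is drawn twice as long as the first even though the value is only 10% larger. A bar chart that starts at zero keeps ink proportional to the data.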

Make it accessible.

Beyond this there are strong conventions - often related to our underlying ability to perceive visual information.

Colors

We will experiment with perception of color and record some pseudonymous data.

  1. Visit https://colorcontroversy.com/.

  2. Click the color you feel best represents the sample.

  3. Open the shared spreadsheet (https://bit.ly/42kMq0O).

  4. Using your assigned rows in the Colors tab, begin selecting colors and recording results. Enter data as suggested.

User    Color selected    Color alternate    Percent in agreement    “Controversy” score
… …
sl22    red               orange             85                      0

Color data

Next week we will use the color data to perform some visualization tasks.

Other considerations

In addition to color, there are a variety of other perceptual issues that affect the effectiveness of graphs.

We will now

  • spend the semester making graphs, making them better, and making them more useful and honest.
  • look at situations where a table beats a graph of any kind as the means of presentation.