Introduction and foundation
Many of the data sources we investigate will be tangentially related to one (or more) of these three somewhat personal or sensitive topics.
Our focus is on honest, helpful portrayal of data regardless of where it comes from.
“All graphs are charts, but not all charts are graphs.”
Charts are one of
Common graphs include
“There is an art to choosing the best visualization for data.” - @MsDrData (Lisa Neidert)
“There is no single statistical tool that is as powerful as a well-chosen graph.” - NSW Department of Health, July 2006
“The subject of graphical methods for data analysis and for data presentation needs a scientific foundation.” - Cleveland and McGill, 1984
“What combination of features produces a good graph? The answer from the literature is fairly clear, though not immediately helpful, and that is it depends.” - NSW Department of Health, July 2006
This is a neglected, underappreciated, or outright mistrusted step.
It is important for the
Not to be confused with “fishing” or “snooping”, especially if you are a consultant trying to learn about data you’ve been assigned.
This begins with exploration, but must lead to application of more formal statistical tools.
Scientific communication could be its own course.
We have to explain things to
Resources, examples, guidelines, and practices come from fields of
This means priorities and preferences expressed may depend greatly on the audience.
An infographic (i.e., infoviz, information graphic) could contain a collection of data visualizations along with a narrative and superfluous ornamentation.
A data visualization (i.e., dataviz) is a specific representation of some relationship. Any narrative is external to the visualization and ornamentation is often used sparingly.
Data visualizations are (or should be)
Infographics are (typically)
There are many, almost too many, tools available to us, including
For each, there are packages and plugins for special tasks or alternative implementations of a given task.
Developed as a hobby, research, or proprietary product, these could change daily.
There are a variety of resources available in a variety of formats.
We will aim for both.
We perceive information differently based on how it is presented.
A brief pause.
1.) Find two partners. Take out one cell phone and set it to Stopwatch mode.
2.) Take a packet of cards, letter side up.
3.) When ready, flip a card, immediately start the timer, and press stop as soon as you find the blue dot.
4.) Open the spreadsheet to the Colors tab.
5.) Generate a personal code using your initials and birth day (of month). Enter data as follows.
User | Card Letter | Time |
---|---|---|
… | … | … |
sl22 | A | 0.63 |
… | … | … |
6.) Pause until everyone has contributed.
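The personal code from step 5 can be built mechanically. The following is a minimal sketch, assuming the code is the lowercase initials followed by the day of the month with no zero-padding, as the sample row `sl22` suggests. The function name `personal_code` and the participant name are hypothetical.

```python
def personal_code(first_name: str, last_name: str, birth_day: int) -> str:
    """Build a code from lowercase initials plus the birth day of the month.

    Assumption: no zero-padding for single-digit days (the text does not say).
    """
    if not 1 <= birth_day <= 31:
        raise ValueError("birth_day must be a day of the month (1-31)")
    initials = (first_name.strip()[0] + last_name.strip()[0]).lower()
    return f"{initials}{birth_day}"


# Hypothetical participant, chosen only to reproduce the sample row's code.
print(personal_code("Sam", "Lee", 22))  # sl22
```

Codes like this are convenient but not truly anonymous: classmates who know your initials and birthday can likely work out which rows are yours.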
We just played a “game” that required you to perform a visual task. Data visualization implies an expectation of visual ability.
Be aware of your own abilities, the abilities of others, and how they may or may not differ.
Neither mathematics nor statistics is known for its accessibility to blind or low-vision users.
Figure 4 shows how the time to identify the target varies with changes to the complexity of the presentation.
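If a figure like this is built from timings such as those collected in the card activity, one simple summary is the mean time per card. The sketch below assumes records in the User | Card Letter | Time layout. Only the `sl22` / `A` / `0.63` row comes from the text; every other row, and the helper name `mean_time_by_card`, is invented for illustration.

```python
from collections import defaultdict
from statistics import mean

# (user, card letter, time) records. Only the first row appears in the text;
# the remaining rows are made up purely for illustration.
rows = [
    ("sl22", "A", 0.63),
    ("sl22", "B", 1.10),
    ("ab07", "A", 0.57),
    ("ab07", "B", 1.30),
]


def mean_time_by_card(records):
    """Return {card letter: mean time}, with cards in alphabetical order."""
    times = defaultdict(list)
    for _user, card, elapsed in records:
        times[card].append(elapsed)
    return {card: mean(values) for card, values in sorted(times.items())}


summary = mean_time_by_card(rows)
for card, avg in summary.items():
    print(f"Card {card}: mean time {avg:.2f}")
```

Grouping by card rather than by user keeps the focus on how the presentation, not the person, changes the time needed to find the target.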
In your favorite browser and search engine, search for the letter “R” (without quotes).
Returning to your browser, search for “RStudio”.
What happens next depends on whether you have used these or other related programs in the past.
Now we look at a few “classics” (and their publication dates).
One of the first modern data visualizations.
What do you notice?
Note
The Areas of the blue, red, & black wedges are each measured from the centre as the common vertex. The blue wedges measured from the centre of the circle represent area for area the deaths from Preventable or Mitigable Zymotic diseases; the red wedges measured from the centre the deaths from wounds; & the black wedges measured from the centre the deaths from all other causes. The black line across the red triangle in Nov. 1854 marks the boundary of the deaths from all other causes during the month. In October 1854 & April 1855, the black area coincides with the red; in January & February 1855, the blue coincides with the black. The entire areas may be compared by following the blue, the red & the black lines enclosing them.
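Because the note says the wedges represent deaths "area for area", the radius of a wedge grows with the square root of the count, not with the count itself. The sketch below works through that arithmetic. It assumes twelve equal-angle monthly wedges per diagram and uses made-up death counts; `wedge_radius` and `area_per_death` are hypothetical names, not anything from the original chart.

```python
import math

# Assumption: twelve monthly wedges share the full circle equally.
SECTOR_ANGLE = 2 * math.pi / 12


def wedge_radius(deaths: float, area_per_death: float = 1.0) -> float:
    """Radius of a circular sector whose area is deaths * area_per_death.

    Sector area is (1/2) * r**2 * angle, so r = sqrt(2 * area / angle).
    """
    area = deaths * area_per_death
    return math.sqrt(2 * area / SECTOR_ANGLE)


# Hypothetical counts: quadrupling the deaths only doubles the radius.
r_100 = wedge_radius(100)
r_400 = wedge_radius(400)
print(f"radius ratio: {r_400 / r_100:.2f}")
```

This is one reason area-based charts are hard to read accurately: a reader who judges by radius will underestimate large differences.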
Widely recognized as an all-time best data visualization …
… but one that few of us are ever likely to recreate ourselves.
What do you notice?
Note
Figurative Map of the successive losses in men of the French Army in the Russian campaign 1812-1813
Drawn by M. Minard, Inspector General of Bridges and Roads (retired) Paris, November 20, 1869.
The numbers of men present are represented by the widths of the colored zones at a rate of one millimeter for every ten thousand men; they are further written across the zones. The red designates the men who enter Russia, the black those who leave it. - The information which has served to draw up the map has been extracted from the works of M.M. Thiers, de Ségur, de Fezensac, de Chambray and the unpublished diary of Jacob, the pharmacist of the Army since October 28th.
In order to better judge with the eye the diminution of the army, I have assumed that the troops of Prince Jérôme and of Marshal Davout, who had been detached at Minsk and Mogilev and have rejoined near Orsha and Vitebsk, had always marched with the army.
In short, C’est la Bérézina.
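The scale stated in Minard's note, one millimeter of band width for every ten thousand men, is a simple linear encoding. A minimal sketch of that conversion follows; the troop counts are round, hypothetical numbers rather than values read from the map, and `band_width_mm` is an illustrative name.

```python
MEN_PER_MM = 10_000  # scale stated in Minard's note


def band_width_mm(men: int) -> float:
    """Width of the flow band, in millimeters, for a given number of men."""
    return men / MEN_PER_MM


# Hypothetical round counts, not values taken from the map.
for men in (400_000, 100_000, 10_000):
    print(f"{men:>7,} men -> {band_width_mm(men):.1f} mm")
```

Unlike an area encoding, width here is directly proportional to the count, so halving the army halves the band.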
What do you notice?
Note
Chart prepared by Atlanta University students for the Negro Exhibit of the American Section at the Paris Exposition Universelle in 1900 to show the economic and social progress of African Americans since emancipation.
How does this compare to the examples we’ve just reviewed?
What do you notice?
Equal spacing between bars minimizes the passage of time.
A connected line graph showing lifetime lung cancer risk for Australian males and females (from NSW Department of Health, July 2006).
Prevalence of human immunodeficiency virus (HIV) and hepatitis C virus (HCV) infection by injecting history, clients of needle and syringe programs, NSW 1995 to 1998 (from NSW Department of Health, July 2006).
Don’t lie.
Make it accessible.
Beyond this there are strong conventions - often related to our underlying ability to perceive visual information.
We will experiment with perception of color and record some pseudonymous data.
Click the color you feel best represents the sample.
Open the shared spreadsheet (https://bit.ly/42kMq0O)
Using your assigned rows in the Colors tab, begin selecting colors and recording results. Enter data as suggested.
User | Color selected | Color alternate | Percent in agreement | “Controversy” score |
---|---|---|---|---|
… | … | … | … | … |
sl22 | red | orange | 85 | 0 |
… | … | … | … | … |
Next week we will use the color data to perform some visualization tasks.
In addition to color, there are a variety of other perceptual issues that affect the effectiveness of graphs.
We will now