Data Visualization and Exploration

Deploying content

Final (“Examples”)

Check out some sample infographics.

Do not be intimidated by the sophisticated artistry.

Also note that there are a lot of pie charts and cleverly emphasized numbers.

Final (Requirements)

You will all submit:

  1. a .qmd file for your complete analysis
  2. rendered raw output (preferably a .pdf printed from an .html)
  3. a one-page infographic containing:
    • a title
    • 2-3 high-quality visualizations
    • a summary sentence or a few phrases to help with interpretation
  4. possibly .csv files, or links to data in the .qmd file
  5. (graduate students only) a required one-page typed reflection (see assignment sheet)

This will be submitted in a single zipped directory to D2L with a directory name of the form last-first-final.

Final (Presentations)

Two-minute presentations will be given during the final period assigned to our class.

Upload a separate 3-5 minute video presentation to D2L in .mp4 or .mov (these can be Zoom recordings or phone recordings).

  • You can give me more detail about anything along the way that you’d like to share (e.g., things that worked easily, or not at all, graphs you abandoned).
  • Show me how you spent your time.
  • You can share the infographic but also the “transcript” from your .qmd file.

Tools

Not strictly “Data Visualization and Exploration”, but some of what goes into it and some of where it goes afterward.

  • GitHub is a collaborative tool.
  • Websites make results visible.

Not everything is a dashboard

You may be interested in sharing “content” so others can view it.

  • interactive (i.e., “Shiny”) content should be deployed to a Shiny server
  • static content (e.g., blogs or websites, graphs or data visualizations, non-interactive “dashboard” pages) can be deployed to any web server; GitHub makes this free and relatively easy

GitHub

GitHub is a website and interface to Git.

Git is a “version control system”. Among many other things, it

  • saves intentional backup versions of projects,
  • allows users to compare differences between versions,
  • allows a user to restore a previous version as a backup.

It is an alternative to "Final_paper_draft_final_final_done (copy).pdf".

This is unlike typical cloud storage (e.g., Dropbox, Google Drive, OneDrive), which

  • saves backup versions of files,
  • allows users to restore specific file backups, but not entire projects.

A big difference is that cloud software runs in the background, whereas Git requires interaction.

Typical experiences

You work on something in your office. Then you:

Dropbox

  • save your work regularly
  • go home.
  • decide to check your work.
  • open your Dropbox folder (or page).
  • see that your work is there.
  • do something new, save it.
  • go back to work the next day.
  • see that the changes from the night before are there.

GitHub

  • “commit” and “push” those changes to GitHub at the end of the day.
  • go home.
  • decide to check your work.
  • “pull” changes from GitHub to your computer.
  • see that your work is there.
  • do something new, then “commit” and “push” changes to GitHub.
  • go back to work the next day.
  • pull changes from the night before and get back to work.

GitHub as a collaborative platform

GitHub allows for collaboration and user-contributed suggestions.

For example, suppose you were interested in an R package I was developing (but I was afraid you might break it). You might

  • create a “fork”, your own copy for tinkering,
  • do something great with it and introduce a new feature,
  • share it back with me and request that I accept your work.

Now you are a contributor. Maybe I invite you to help manage and give you more trust and responsibility.

GitHub as a web server

While built for hosting versioned copies of source code, GitHub now

  • hosts data repositories, such as datasets,
  • hosts websites, like our course page, and
  • simplifies certain tasks like editing, publishing, and collaborating.

It changes often. New, useful features are regularly added, along with their documentation.

Exploring this fully is a class, task, or hobby of its own.

Voluntary activities

Everything that follows (for a while at least) is voluntary, but recommended.

Some of these steps are tedious to figure out on your own, though documentation and support continue to improve.

If you are interested, feel free to work along. If not, feel free to use class time to work on your final project.

GitHub website and server

If you care to, create an account at GitHub.

Think carefully about your username

  • this could be the basis of your website address (i.e., <username>.github.io), but
  • you can change this later - as long as the desired name is still available.

Connecting your computer

The simplest path seems to be described here.

There is quite a bit of effort to get started (hence trying it together with the support of others), but after that, things generally work smoothly.

Quarto

Visit quarto.org (Getting started) and install Quarto CLI (Command Line Interface).

After this we will need to use a terminal program (“Terminal” on a Mac, “Command Prompt” on Windows, or the “Terminal” window in RStudio). For simplicity, let’s try the RStudio terminal window. A few commands are useful.

Command              Use/Notes
pwd                  Print Working Directory (shows where you are on the computer)
cd                   Change Directory (to the “home” directory)
cd ..                Change Directory (to one level “up”)
cd folder/subfolder  Change Directory (to a folder called “folder” and, inside it, a folder called “subfolder”)
ls                   List (list directory contents)
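
The same navigation can be tried from the R console, which may help if the terminal feels unfamiliar. The sketch below uses the base R functions getwd(), setwd(), and list.files() as rough counterparts of pwd, cd, and ls. The folder and file names are made up for illustration and are created inside a temporary directory, so nothing on your computer is changed.

```r
# Rough R-console counterparts of the terminal commands above.
# "terminal-demo", "folder", "subfolder", and "notes.txt" are hypothetical names.
start <- getwd()                          # like pwd: where am I now?

# Build a small play area inside R's temporary directory.
sandbox <- file.path(tempdir(), "terminal-demo")
dir.create(file.path(sandbox, "folder", "subfolder"),
           recursive = TRUE, showWarnings = FALSE)
file.create(file.path(sandbox, "folder", "subfolder", "notes.txt"))

setwd(sandbox)                            # like cd <path>
setwd("folder/subfolder")                 # like cd folder/subfolder
here <- basename(getwd())                 # like pwd (last part of the path only)
contents <- list.files()                  # like ls
setwd("..")                               # like cd ..
up_one <- basename(getwd())

setwd(start)                              # go back to where we began

print(here)
print(contents)
print(up_one)
```

Unlike cd with no arguments, setwd() always needs a path, which is why the sketch saves the starting directory first.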

Publishing with quarto

Visit quarto.org (GitHub Pages).

Key steps:

  • Edit _quarto.yml file
  • render and push to GitHub (now that it has been connected)
  • publish

These steps are done back and forth between your Terminal window and the corresponding GitHub page in your browser.
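
One of the routes described in the Quarto documentation renders the site into a docs folder that GitHub Pages then serves. Below is a minimal sketch of that _quarto.yml edit, written from R so it can be tried in the console. The site title and project folder are placeholders, and the file is written to a temporary directory rather than to a real project; your own project may use a different route.

```r
# Minimal _quarto.yml for a website rendered into docs/.
# Assumed setup for illustration; "My Site" and "my-site" are placeholders.
yml <- c(
  "project:",
  "  type: website",
  "  output-dir: docs",
  "",
  "website:",
  "  title: \"My Site\""
)

project_dir <- file.path(tempdir(), "my-site")   # hypothetical project folder
dir.create(project_dir, showWarnings = FALSE)
config_path <- file.path(project_dir, "_quarto.yml")
writeLines(yml, config_path)

# An empty .nojekyll file tells GitHub Pages to serve the files as they are.
file.create(file.path(project_dir, ".nojekyll"))

# Show what was written.
cat(readLines(config_path), sep = "\n")

# Then, in the Terminal from the project folder (not run here):
#   quarto render
#   git add docs
#   git commit -m "Publish site"
#   git push
# Finally, in the repository settings on GitHub, set Pages to serve
# from the docs folder of the main branch.
```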

Back to DVE

Other charts

You may be familiar with pie charts or donut charts.

par(mar = rep(0, 4))
pie.sales <- c(0.12, 0.3, 0.26, 0.16, 0.04, 0.12)
names(pie.sales) <- c("Blueberry", "Cherry",
    "Apple", "Boston Cream", "Other", "Vanilla Cream")
pie(pie.sales)

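par(mar = rep(0, 4))
pie(sort(pie.sales))

Figure 2

par(mar = rep(0, 4))
# donut() is not a base R function; it is assumed to be a helper defined elsewhere in the course materials
donut(sort(pie.sales))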

If pie charts are bad, donuts are worse.

Let’s have waffles.

Waffle charts

These can be more easily implemented within ggplot using the waffle package.

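# Install waffle only if it is not already installed, then load it
if (!"waffle" %in% rownames(installed.packages())) install.packages("waffle")
library(ggplot2)  # provides ggplot() and aes()
library(waffle)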

And we can make a small dataset to test this out.

pie.sales <- data.frame(
  type = factor(c("Blueberry", "Cherry", "Apple", "Boston Cream", "Other", "Vanilla Cream")), 
  sales = c(0.12, 0.3, 0.26, 0.16, 0.04, 0.12)
  )

In reality these data might be a summary of a larger dataset, but the data.frame() command works nicely here. Unlike data read from a file, where numeric values sometimes arrive as text such as "0.12", these numbers stay numbers.
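
A quick check of column types can confirm this. The sketch below rebuilds the same data frame and, for contrast, a hypothetical vector (named sales_as_text here, not part of the original example) in which the same values are stored as text.

```r
# The small dataset from above: sales are typed in as numbers.
pie.sales <- data.frame(
  type = factor(c("Blueberry", "Cherry", "Apple",
                  "Boston Cream", "Other", "Vanilla Cream")),
  sales = c(0.12, 0.3, 0.26, 0.16, 0.04, 0.12)
)

is.numeric(pie.sales$sales)               # TRUE: numbers stayed numbers
all.equal(sum(pie.sales$sales), 1)        # TRUE: the proportions add to 1

# Hypothetical contrast: the same values stored as text.
sales_as_text <- c("0.12", "0.3", "0.26", "0.16", "0.04", "0.12")
is.numeric(sales_as_text)                 # FALSE: these are character strings

# Convert the text back to numbers before plotting or summing.
sales_fixed <- as.numeric(sales_as_text)
all.equal(sales_fixed, pie.sales$sales)   # TRUE
```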

Waffle charts

This won’t run: ggplot2 has no geom_pie(), so it must be tricked into making pie charts.

ggplot(pie.sales, aes(fill = type, values = sales)) + geom_pie()

Changing geom_pie() to geom_waffle() is close, but not quite enough. Multiply the values by 100 so that each proportion becomes a count of squares out of 100.

ggplot(pie.sales, aes(fill = type, values = 100 * sales)) + geom_waffle()

Figure 3

You might first want to sort the categories by size, for example from largest to smallest, and order the legend to match.
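
One way to do that ordering is sketched below, assuming the ggplot2 and waffle packages are installed and that the installed waffle version provides geom_waffle(). The name pie.sorted is introduced here for illustration. Whether the squares follow row order or factor-level order may depend on the package version, so the sketch sets both.

```r
# The small dataset from earlier.
pie.sales <- data.frame(
  type = factor(c("Blueberry", "Cherry", "Apple",
                  "Boston Cream", "Other", "Vanilla Cream")),
  sales = c(0.12, 0.3, 0.26, 0.16, 0.04, 0.12)
)

# Sort the rows from largest to smallest share.
pie.sorted <- pie.sales[order(pie.sales$sales, decreasing = TRUE), ]

# Rebuild the factor so its levels (and so the legend) follow the row order.
pie.sorted$type <- factor(pie.sorted$type,
                          levels = as.character(pie.sorted$type))
levels(pie.sorted$type)

# Draw the chart only if both packages are available (an assumption of this sketch).
if (requireNamespace("ggplot2", quietly = TRUE) &&
    requireNamespace("waffle", quietly = TRUE)) {
  library(ggplot2)
  library(waffle)
  # round() guards against values such as 28.999999 after multiplying by 100.
  p <- ggplot(pie.sorted, aes(fill = type, values = round(100 * sales))) +
    geom_waffle()
  print(p)
}
```

To order from smallest to largest instead, drop decreasing = TRUE.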

Dressing your waffles

Things can get surprisingly silly after this, but there’s nothing wrong with that.

Bored with boxes? You can use additional special packages to turn those bland rectangles into pie, fruit, or other suitably shaped and colored icons.

  • The main point is that the graph content is correct and honest.
  • Some amount of decoration might make your message more memorable.

More specialty charts

Sometimes a chart type is expected by experts in the field.

This is a complicated mix of a network and a collection of pie charts.

Could you imagine, or design, some other way to show how groups are connected?