Ask the Expert


Tips for Statistical Communication and Visualization Q&A, Slides, and On-Demand Recording

Started 11-16-2021
Modified 12-03-2021

Watch this Ask the Expert session to learn how to improve your data visualization practices for better communication to your end users. We’ll use a case study from the pandemic to illustrate. 

 

Watch the webinar

 

You will learn:

  • Best practices in data visualization that improve communication of statistical ideas.
  • How to read a graph of cumulative frequencies.
  • The pros and cons of using a logarithmic axis on a graph.
  • How catchphrases like “flatten the curve” can hamper statistical communication.
  • How risk can be quantified with simple probability models.

 

The questions from the Q&A segment held at the end of the webinar are listed below and the slides from the webinar are attached.

 

Q&A

Will you talk about the smokestack curves you briefly showed?

They are sometimes called smokestack plots. They are plots of cumulative curves for different countries or for different states. If you read my blog article about them, you will learn how to create them in SAS and how to use them.
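
As a minimal sketch of what a cumulative curve is (written in Python rather than SAS, with made-up daily counts and not the data or code from the blog article), the cumulative count on each day is the running total of the daily counts up to that day:

```python
from itertools import accumulate


def cumulative_counts(daily):
    """Return the running total of a sequence of daily counts."""
    return list(accumulate(daily))


# Hypothetical daily new-case counts for one region (illustrative only).
daily = [2, 3, 0, 5, 8, 4]
print(cumulative_counts(daily))
```

A flat stretch in a cumulative curve means that no new cases were reported on those days; a steep stretch means that many new cases were reported.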

 

On the random events from the geometric distribution slide, how do you do that in SAS? Is there a step-by-step coding example for that particular simulation?

If you go to my blog article on this topic, you will see the SAS code and a discussion about how to create those plots and how to do a simulation from the geometric distribution.
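
For readers who want a quick feel for the idea before reading the article, here is a minimal sketch in Python (not the SAS code from the blog). It assumes independent trials that each have the same probability p of the event, and it counts how many trials occur up to and including the first event. The function name, the value of p, and the seed are all hypothetical choices for illustration.

```python
import random


def simulate_trials_until_event(p, n_sims, seed=12345):
    """Simulate n_sims draws of the number of independent trials
    up to and including the first event, where each trial has
    probability p of the event (a geometric random variable)."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_sims):
        trials = 1
        # rng.random() < p means the event occurred on this trial.
        while rng.random() >= p:
            trials += 1
        results.append(trials)
    return results


p = 0.1  # hypothetical per-trial probability of the event
draws = simulate_trials_until_event(p, n_sims=10_000)
print("sample mean:", sum(draws) / len(draws), "theoretical mean:", 1 / p)
print("earliest event:", min(draws), "latest event:", max(draws))
```

The sample mean should be close to 1/p, but individual draws vary widely: some simulated events occur on the first trial and others only after many trials.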

 

Given the weekend effect, isn't it better to always plot a moving average instead of the raw daily results?

Yes and no. I definitely think that you should overlay some sort of a moving average on the graph so that it smooths out the weekend effect. However, it is also valuable to see the raw daily results. If you're using the average of the last seven days to smooth the data, you are using a “lagging indicator.” The daily data will start to increase or decrease before the smoother does, so I prefer to see both the data and a smoother. The raw data can provide insight into trends three or four days before the smoother does.
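
The lag can be seen in a small sketch. The Python code below (an illustration, not code from the webinar) computes a trailing seven-day mean for a made-up series in which the daily count jumps from 100 to 200 on day 8. The function name and the data are hypothetical.

```python
def trailing_mean(values, window=7):
    """Trailing moving average: the mean of the current value and up to
    window-1 previous values. The first few days use a shorter window."""
    smoothed = []
    for i in range(len(values)):
        start = max(0, i - window + 1)
        chunk = values[start:i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# Hypothetical daily counts: 100 per day for a week, then 200 per day.
daily = [100] * 7 + [200] * 7
for day, (raw, smooth) in enumerate(zip(daily, trailing_mean(daily)), start=1):
    print(day, raw, round(smooth, 1))
```

In this series the raw count doubles on day 8, but the trailing mean does not reach the new level until day 14, which is why it helps to show the raw data along with the smoother.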

 

When case counts are increasing, how do you distinguish whether it is due to increased testing, or increased % of positive tests?

I am showing the raw data, which is based on cases that are reported when someone goes to their doctor or urgent care center and gets a test that comes back positive. So if someone stays at home and never gets tested, that data does not appear in the publicly available data. There are ways to statistically adjust the displays to account for the amount of testing, but I did not do that.
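
As an illustration only (this was not part of the webinar), one simple quantity that separates the two explanations is the percentage of tests that come back positive. The Python sketch below uses made-up numbers and a hypothetical function name; it assumes that both the number of tests and the number of positive results are available for each period.

```python
def percent_positive(positives, tests):
    """Percent of tests that were positive; None when no tests were reported."""
    if tests == 0:
        return None
    return 100.0 * positives / tests


# Hypothetical weeks: cases double relative to the baseline in both
# later weeks, but for different reasons.
weeks = [
    {"label": "baseline", "tests": 1000, "positives": 50},
    {"label": "more testing", "tests": 2000, "positives": 100},
    {"label": "higher positivity", "tests": 1000, "positives": 100},
]
for w in weeks:
    print(w["label"], w["positives"], percent_positive(w["positives"], w["tests"]))
```

If the case count rises while the percent positive stays the same, more testing is a plausible explanation; if the percent positive rises as well, the increase is less likely to be due to testing volume alone.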

 

Would you plot a weekly count to smooth out the data and how would you plot it, bar vs line graph?

In this presentation, I use bar graphs or needle plots for the raw data and overlay a line plot to indicate the smoother. 

 

Can you briefly describe your steps when developing a process to "test" and see the likelihood of a scenario? For example, deciding to test the statements on 10-fold or 100-fold pooled testing.

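Please see my blog article on pool testing, "Pool testing: The math behind combining medical tests," which is listed under Recommended Resources.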
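
As a rough sketch of the kind of calculation involved (in Python, and not necessarily the approach taken in the blog article), consider simple two-stage pooling: k samples are combined and tested once, and only if the pool is positive is each sample retested individually. The sketch assumes that individuals are independent, that each is positive with the same probability p, and that the test is perfectly accurate. The function names and the value of p are hypothetical.

```python
def prob_pool_positive(p, k):
    """Probability that a pool of k independent samples contains
    at least one positive, when each is positive with probability p."""
    return 1.0 - (1.0 - p) ** k


def expected_tests_per_person(p, k):
    """Expected number of tests per person under two-stage pooling:
    one pooled test shared by k people, plus one individual retest
    for everyone whenever the pool is positive."""
    if k == 1:
        return 1.0  # no pooling: one test per person
    return 1.0 / k + prob_pool_positive(p, k)


p = 0.01  # hypothetical prevalence (assumption for illustration)
for k in (10, 100):
    print(k, round(prob_pool_positive(p, k), 3),
          round(expected_tests_per_person(p, k), 3))
```

Under these assumptions with p = 0.01, a 10-sample pool tests positive about 10% of the time, whereas a 100-sample pool tests positive about 63% of the time, so most 100-fold pools would need to be retested.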

 

There is an additional issue that is very difficult to communicate: missing data imputation. People tend to think that one makes up the data. How would you communicate imputation in a manner that it is perceived as a legitimate procedure?

I didn't do any COVID-related blog posts about missing data imputation, but you can read previous articles about missing value imputation. What I showed is based on the data that came from the CDC or local health departments. Missing value imputation was not part of this data visualization. For communicating with the public, showing the data we have (even if incomplete) is better than imputing data, which requires statistical modeling. An exception is if you are trying to communicate the situation in a country that does not have good data collection practices or that is intentionally suppressing the number of COVID cases to make itself look better. For those cases, you would need to explain why the actual number of cases is higher than the reported cases, and how you came up with the estimates.

 

On a scale of 0 to 10, with: 10 being very useful and strict public benefit, 5 being neutral, and 0 being very deceptive and destructive to the public, how do you think that statistics were presented to laypeople in America?

Maybe a 5 or 6? As I mentioned while talking about flattening the curve, people can spin the data to support opinions that they already have. I think that we as statisticians need to present the data as clearly and simply as possible so that the public is less susceptible to attempts to spin the data for political purposes. Clear presentation reduces the likelihood that people will misinterpret the data. I think people saw many graphs, but they didn't always understand the implications of the graphs that they saw.

 

Does the use of the terminology random event imply a no cause and effect scenario or otherwise? How do you account for that which leads to the random event in your graphs?

There is a cause-and-effect relationship, but it is related to the probability of the event. Remember those arrows that I drew on the plot of risks for different scenarios? I mentioned that if you go down the plot, you're reducing your risk because you are choosing trials that have a lower probability that the event will occur. That is a cause-and-effect relationship: you choose a trial with a lower risk and your probability is decreased. But because the events are random, you might get unlucky. An event can occur even for a low-probability trial.

 

You showed pros and cons of using a log axis. For showing COVID-19 cases, should they be used or not?

Both scales are useful. I would use a log axis when speaking to a professional, scientifically literate audience. For communication to the general public, I think a linear scale is often better because it is more understandable to non-scientists or people who haven't studied math since high school. However, the linear scale becomes problematic if you want to look back to see what happened in early 2020. The number of cases back then was tiny compared to the numbers today, so a linear scale makes it difficult to see small numbers and large numbers on the same graph.
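
One reason a log axis appeals to a technical audience is that exponential growth appears as a straight line on it, and the slope of that line determines the doubling time. The Python sketch below (an illustration with synthetic data, not code from the webinar) fits a least-squares line to the logarithm of the counts and converts the slope to a doubling time. The function name is hypothetical, and the sketch assumes at least two positive counts that are growing.

```python
import math


def doubling_time(counts):
    """Estimate the doubling time (in time steps) from the least-squares
    slope of log(count) versus time. Assumes at least two positive counts
    that grow roughly exponentially (positive slope)."""
    n = len(counts)
    y = [math.log(c) for c in counts]
    t_mean = (n - 1) / 2
    y_mean = sum(y) / n
    sxy = sum((t - t_mean) * (yi - y_mean) for t, yi in enumerate(y))
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    slope = sxy / sxx
    return math.log(2) / slope


# Synthetic cumulative counts that double every 3 days (illustrative only).
counts = [10 * 2 ** (day / 3) for day in range(10)]
print(round(doubling_time(counts), 2))
```

On a linear axis the same synthetic counts would curve upward and the early values would be hard to distinguish from zero, which is the trade-off described above.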

 

Do you think blogging rather than appearing on TV stations to explain these graphs is more effective in educating the general public about COVID 19 cases?

I suspect TV has a wider audience than a blog. I wasn’t invited to appear on TV, so I used the medium that I had access to. I think many state and federal officials did a wonderful job explaining the statistics. In the state of North Carolina, we were fortunate to have Dr. Mandy Cohen as the Secretary of Health and Human Services. Her clear and unambiguous press briefings during the pandemic led to her being named Tar Heel of the Year in 2020 by the Raleigh News and Observer.

 

Recommended Resources

How to read a cumulative frequency graph

Estimates of doubling time for exponential growth

What does ‘flatten the curve’ mean? To which curve does it apply?

On reducing the spread of coronavirus

Pool testing: The math behind combining medical tests

SAS Learning Conference

 

Want more tips? Be sure to subscribe to the Ask the Expert board to receive follow-up Q&A, slides and recordings from other SAS Ask the Expert webinars.  

 

Version history
Last update:
‎12-03-2021 03:33 PM