A_Swoosh
Quartz | Level 8

I am looking to compare differences in the volume for an item (single group) before and after an event; use COVID as an example. I have 3 years of monthly volume data, from 2018 to 2020.

 

Hypothesis: Did the volume significantly decline due to COVID?

 

Have: I collect monthly volume on the item over time. I want to compare each month in the COVID period to the avg./baseline period (pre-COVID: 2018-2019).

 

I believe this is a paired t-test, but I was told I could use an independent t-test assuming unequal variances. Again, if I have 1 observation comparing against 24, would that work? Can I do a t-test or classic hypothesis test for this? Would I have to use a quasi-experimental approach for this? I'm a little unclear on which approach is appropriate.

 

Thanks

1 ACCEPTED SOLUTION

Accepted Solutions
PaigeMiller
Diamond | Level 26

@SteveDenham wrote:

You might consider something like creating a confidence interval on the mean of the N pre-covid values and looking to see if the single month values post-covid fall inside that confidence interval.  I don't believe you will be able to do any two group based t tests, as the single values post covid have no estimate of variability.


After I wrote my reply, I thought of something like what you just described (quoted above). I probably thought of that after I wrote my reply because I originally didn't understand the question, and then later the light bulb went on.

 

In any case, another approach is to create a control chart, with month on the horizontal axis, where the limits are determined from the 24 months before COVID, and then each COVID month is plotted against those limits computed from the before-COVID time period. This seems to me like a more relevant and better approach than t-tests of 1 month against 24 months (although perhaps the results will be similar). I also like this because it is graphical, and can easily be presented to most people without much of an explanation. As we all know, graphical approaches are good because "a picture is worth a thousand numbers".

--
Paige Miller


15 REPLIES
PaigeMiller
Diamond | Level 26

If I am understanding you, this is not a paired t-test. It is an unpaired t-test.

 

I don't understand the line quoted below, and pending an explanation, I may change my answer. After I wrote this, the light bulb came on in my head, and I do understand now.

 

"Again, if I have 1 observation comparing against 24, would that work?"

--
Paige Miller
SteveDenham
Jade | Level 19

You might consider something like creating a confidence interval on the mean of the N pre-covid values and looking to see if the single month values post-covid fall inside that confidence interval.  I don't believe you will be able to do any two group based t tests, as the single values post covid have no estimate of variability.
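
A minimal sketch of that confidence-interval idea, under some assumptions of mine: the pre-COVID months are in a data set called have with the volume in a variable named volume, and the COVID-period months are in a hypothetical data set called post_covid with the same variable. The output names (pre_ci, pre_mean, pre_lclm, pre_uclm, flagged, outside_ci) are also just illustrative.

```sas
/* 95% confidence limits for the mean of the pre-COVID months */
proc means data=have n mean std lclm uclm alpha=0.05;
   var volume;
   output out=pre_ci mean=pre_mean lclm=pre_lclm uclm=pre_uclm;
run;

/* Attach the limits to every COVID-period month and flag
   months whose volume falls outside the interval */
data flagged;
   if _n_ = 1 then set pre_ci(keep=pre_mean pre_lclm pre_uclm);
   set post_covid;   /* hypothetical data set of COVID-period months */
   outside_ci = (volume < pre_lclm or volume > pre_uclm);
run;

proc print data=flagged;
run;
```

Here outside_ci is 1 for a month outside the interval and 0 otherwise.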

 

Another option would be to test the mean against a specified null value, of which you would have 36-N.  So create a dataset that has only the pre-covid observations (called have in the code below, with the values by month stored in the variable 'month') and try:

 

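/* m1 is a placeholder: replace it with the numeric volume of the post-COVID month being tested */
proc ttest data=have h0=m1;
   var month;   /* 'month' holds the pre-COVID monthly volumes */
run;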

You would have to run a separate analysis for each month post-covid.  Here I did this with the value for the first month as m1.

 

SteveDenham

 

 

 

 

SteveDenham
Jade | Level 19

@PaigeMiller  - I REALLY like the control chart idea.  It gives a great visual approach that lets you know when something important happens.

 

SteveDenham

A_Swoosh
Quartz | Level 8
This is an interesting idea; how would I go about executing something like this in SAS?
PaigeMiller
Diamond | Level 26

@A_Swoosh wrote:
This is an interesting idea; how would I go about executing something like this in SAS?

PROC SHEWHART

 

If the volume you are talking about is approximately iid normally distributed during the pre-COVID time period, you can use the IRCHART statement in PROC SHEWHART.

 

Seasonality could make such a chart less useful, but you can also adapt SHEWHART charts to data that has seasonality.

--
Paige Miller
A_Swoosh
Quartz | Level 8

Sorry, I've never used PROC SHEWHART, so forgive me if I ask a stupid question or two.

 

1. I have a dataset with:

 

ITEM|VOLUME|PRE|POST|MONTH|MONTH_NUM

 

Jan 2018 being the first month

 

Is it something as simple as:

   proc shewhart data=itemA;
      irchart volume*month_num;
   run;

[Attached image: Capture.JPG, the chart produced by the code above]

And if so, what's the interpretation above?

PaigeMiller
Diamond | Level 26

It's almost that simple, but not quite. 

 

First you need to compute the limits on the non-COVID data, and then plot the entire time period (non-COVID and COVID combined) using the limits computed on just the non-COVID data. This requires two calls to PROC SHEWHART: the first does not create a plot and just computes the limits from the non-COVID data, and the second applies those limits to the entire data set. Here are two examples:

https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.4/qcug/qcug_shewhart_sect095.htm

https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.4/qcug/qcug_shewhart_sect096.htm

--
Paige Miller
A_Swoosh
Quartz | Level 8

Like so...? Again, I haven't done much graphing in SAS so my apologies for my simplistic questions. I appreciate your help.

 

/*subset dataset to itemA*/
data itemA;
       set all;
       if item='A';
run;

/*obtain non-COVID data*/
data itemA_pre;
    set itemA;
    if pre=1;
run;

/*save individual values and moving ranges for non-COVID data*/
proc shewhart data=itemA_pre;
   irchart volume*month_num / outhistory=itemA_info
                         nochart;
run;

/*examine non-COVID chart*/
proc shewhart history=itemA_info;
   irchart volume*month_num;
run;

/*obtain limits for non-COVID data*/
proc shewhart data=itemA_pre;
   irchart volume*month_num / outlimits = itemA_limits
                         nochart;
run;

/*using SAS example to produce chart*/
options nogstyle;
goptions ftext='albany amt';
title 'Individual Measurements and Moving Range Control Charts';
proc shewhart data=itemA limits=itemA_limits;
   irchart volume*month_num / cframe = vigb
                         cconnect = yellow
                         coutfill = red
                         cinfill = vlib;
run;
options gstyle;

 

[Attached image: Capture.JPG, the control chart produced by the code above]

PaigeMiller
Diamond | Level 26

Looks good to me.

 

You have an upwards trend at the end of the non-COVID time period that, after a dip of a few months, continues upwards during COVID.

--
Paige Miller
A_Swoosh
Quartz | Level 8
Right. So, the dip comes right after COVID started (~April 2020). Then you have that peak that falls outside the UCL in roughly December.

How would a layman interpret this chart? Is it showing a statistically significant difference in any month that falls outside the UCL or LCL? What about the secondary chart at the bottom?
PaigeMiller
Diamond | Level 26

I think I already said the "layman's interpretation" ... upward trend, continues during COVID.

 

The whole idea of statistical testing that you mentioned earlier assumes things have a constant mean and constant variance during the first time period (non-COVID), and the mean is not constant here; it has a clear upwards trend. So any hypothesis testing must take this into account. Any other representations of the data seem suspect to me.
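
One possible way to take the trend into account, offered only as a rough sketch and not as the method recommended in this thread: fit a straight-line trend in month_num plus a shift term for the COVID period. This assumes itemA has a numeric month_num (1 for Jan 2018 onward) and that post is a 0/1 indicator for the COVID months; both of those are assumptions about the data set described earlier.

```sas
/* Linear trend plus a level shift for the COVID period.
   Assumes post = 1 for COVID months and 0 otherwise. */
proc reg data=itemA;
   /* DW requests the Durbin-Watson statistic, a quick check
      for serial correlation in the monthly residuals */
   model volume = month_num post / dw;
run;
quit;
```

The coefficient on post estimates the shift in level during COVID after allowing for the trend. If the residuals are serially correlated, which is common with monthly data, the ordinary p-values from this fit should be read with caution.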

 

The bottom chart indicates the absolute value of the change from month t-1 to month t. So by looking at this chart, we see no change in the change from month to month over this time period.
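
For anyone who wants to see the numbers behind the bottom chart, here is a small sketch that computes the same kind of moving range by hand. It assumes the itemA data set and the volume and month_num variables used earlier in the thread; itemA_sorted, itemA_mr and moving_range are names made up for this illustration.

```sas
/* Put the months in time order first */
proc sort data=itemA out=itemA_sorted;
   by month_num;
run;

data itemA_mr;
   set itemA_sorted;
   /* DIF returns this row's volume minus the previous row's volume
      (missing on the first row); ABS makes it the moving range */
   moving_range = abs(dif(volume));
run;

proc print data=itemA_mr;
   var month_num volume moving_range;
run;
```

The first month has no previous value, so its moving_range is missing.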

--
Paige Miller
A_Swoosh
Quartz | Level 8
I thought that hypothesis testing for this type of data would not be appropriate? Wouldn't I also be able to do a simple OLS regression with month dummies too?
PaigeMiller
Diamond | Level 26

@A_Swoosh wrote:
I thought that hypothesis testing for this type of data would not be appropriate? Wouldn't I also be able to do a simple OLS regression with month dummies too?

I don't understand any of this. Please explain further.

--
Paige Miller
