Statistical Procedures

A_Swoosh · Posted 06-01-2021 06:22 PM

I am looking to compare the volume for an item (single group) before and after an event, using COVID as an example. I have 3 years of monthly volume data, from 2018 to 2020.

Hypothesis: Did the volume significantly decline due to COVID?

Have: I collect monthly volume on the item over time. I want to compare each month in the COVID period to the avg./baseline period (pre-COVID: 2018-2019).

I believe this is a paired t-test, but I was told I could use an independent t-test assuming unequal variances. Again, if I have 1 observation comparing against 24, would that work? Can I do a t-test or classic hypothesis test for this? Would I have to use a quasi-experimental approach? I'm a little unclear on which approach is appropriate.

Thanks

PaigeMiller · Posted 06-02-2021 07:09 AM

If I am understanding you, this is not a paired t-test. It is an unpaired t-test.

@A_Swoosh wrote:
Again, if I have 1 observation comparing against 24, would that work?

~~I don't understand this line, and pending an explanation, I may change my answer.~~ After I wrote this, the lightbulb came on in my head, and I do understand now.

--
Paige Miller

SteveDenham · Posted 06-02-2021 07:51 AM

You might consider something like creating a confidence interval on the mean of the N pre-covid values and looking to see if the single month values post-covid fall inside that confidence interval. I don't believe you will be able to do any two group based t tests, as the single values post covid have no estimate of variability.
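A minimal sketch of the confidence-interval idea, assuming a data set named have that holds only the pre-COVID observations, with the monthly volumes stored in a variable named month (both names are assumptions for illustration):

```sas
/* Sketch: 95% confidence limits for the mean of the pre-COVID months. */
/* Assumes HAVE contains only pre-COVID rows and MONTH holds volumes.  */
proc means data=have n mean stddev lclm uclm alpha=0.05;
   var month;
run;
```

Each post-COVID monthly value can then be compared against the printed LCLM and UCLM. Note that these are limits for the mean, which are narrower than limits for a single new month, so individual months may fall outside them more often than the nominal 5%.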

Another option would be to test the mean against a specified null value, of which you would have 36-N. So create a dataset that has only the pre-covid observations (called have in the code below, with the values by month stored in the variable 'month') and try:

/* m1 is a macro variable set beforehand (with %let) to the volume of the first post-COVID month */
proc ttest data=have h0=&m1;
   var month;
run;

You would have to run a separate analysis for each month post-covid. Here I did this with the value for the first month as m1.

SteveDenham

PaigeMiller · Posted 06-02-2021 07:58 AM

@SteveDenham wrote:

You might consider something like creating a confidence interval on the mean of the N pre-covid values and looking to see if the single month values post-covid fall inside that confidence interval. I don't believe you will be able to do any two group based t tests, as the single values post covid have no estimate of variability.

After I wrote my reply, I thought of something like what you just described (quoted above). I probably thought of that after I wrote my reply because I originally didn't understand the question, and then later the light bulb went on.

In any case, another approach is to create a control chart, with month on the horizontal axis, where the limits are determined from the 24 months before COVID, and then each COVID month is plotted against those limits computed from the before-COVID time period. This seems to me like a more relevant and better approach than t-tests of 1 month against 24 months (although perhaps the results will be similar). I also like this because it is graphical, and can easily be presented to most people without much of an explanation. As we all know, graphical approaches are good because "a picture is worth a thousand numbers".

--
Paige Miller

SteveDenham · Posted 06-02-2021 10:59 AM

@PaigeMiller - I REALLY like the control chart idea. It gives a great visual approach that lets you know when something important happens.

SteveDenham

A_Swoosh · Posted 06-02-2021 11:49 AM

This is an interesting idea; how would I go about executing something like this in SAS?

PaigeMiller · Posted 06-02-2021 11:54 AM

@A_Swoosh wrote:
This is an interesting idea; how would I go about executing something like this in SAS?

PROC SHEWHART

If the volume you are talking about is approximately iid normally distributed during the pre-COVID time period, you can use the IRCHART statement in PROC SHEWHART.

Seasonality could make such a chart less useful, but you can also adapt SHEWHART charts to data that has seasonality.

--
Paige Miller

A_Swoosh · Posted 06-02-2021 12:23 PM

Sorry, I've never used PROC SHEWHART ever so forgive me if I ask a stupid question or two.

1. I have a dataset with:

Jan 2018 being the first month

Is it something as simple as:

   proc shewhart data=itemA;
      irchart volume*month_num;
   run;

And if so, what's the interpretation above?

PaigeMiller · Posted 06-02-2021 12:58 PM

It's almost that simple, but not quite.

First, compute the limits on the non-COVID data, then plot the entire time period (non-COVID and COVID combined) using the limits computed from just the non-COVID data. This requires two calls to PROC SHEWHART: the first does not create a plot but only computes the limits from the non-COVID data, and the second applies those limits to the entire data set. Here are two examples:

https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.4/qcug/qcug_shewhart_sect095.htm

https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.4/qcug/qcug_shewhart_sect096.htm
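For orientation, the limits conventionally used on an individual measurements and moving range chart can be written as below. This is the textbook 3-sigma form with a moving range of span 2, given here as an assumption about the defaults rather than something stated in this thread. Here $\bar{x}$ and $\overline{MR}$ are the mean volume and the mean moving range over the 24 pre-COVID months.

```latex
\begin{aligned}
MR_t &= |x_t - x_{t-1}| \\
\mathrm{UCL}_x &= \bar{x} + 3\,\frac{\overline{MR}}{d_2}, \qquad
\mathrm{LCL}_x = \bar{x} - 3\,\frac{\overline{MR}}{d_2}, \qquad d_2 = 1.128 \\
\mathrm{UCL}_{MR} &= D_4\,\overline{MR}, \qquad
\mathrm{LCL}_{MR} = 0, \qquad D_4 = 3.267
\end{aligned}
```

Because the limits come only from the pre-COVID months, a COVID month plotted outside them is one that would be unusual if the pre-COVID process had simply continued.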

--
Paige Miller

A_Swoosh · Posted 06-02-2021 01:31 PM

Like so...? Again, I haven't done much graphing in SAS so my apologies for my simplistic questions. I appreciate your help.

/*subset dataset to itemA*/
data itemA;
    set all;
    if item='A';
run;

/*obtain non-COVID data*/
data itemA_pre;
    set itemA;
    if pre=1;
run;

/*save individual measurements and moving ranges for non-COVID data*/
proc shewhart data=itemA_pre;
   irchart volume*month_num / outhistory=itemA_info
                         nochart;
run;

/*examine non-COVID chart*/
proc shewhart history=itemA_info;
   irchart volume*month_num;
run;

/*obtain limits for non-COVID data*/
proc shewhart data=itemA_pre;
   irchart volume*month_num / outlimits = itemA_limits
                         nochart;
run;

/*using SAS example to produce chart*/
options nogstyle;
goptions ftext='albany amt';
title 'Individual Measurements and Moving Range Control Charts';
proc shewhart data=itemA limits=itemA_limits;
   irchart volume*month_num / cframe = vigb
                         cconnect = yellow
                         coutfill = red
                         cinfill = vlib;
run;
options gstyle;

PaigeMiller · Posted 06-02-2021 01:39 PM

Looks good to me.

You have an upwards trend at the end of the non-COVID time period that, after a dip of a few months, continues upwards during COVID.

--
Paige Miller

A_Swoosh · Posted 06-02-2021 01:46 PM

Right. So the dip comes right after COVID began (~April 2020). Then there is a peak that falls outside the UCL in roughly December.

How would a layman interpret this chart? Is it showing a statistically significant difference in any month that falls outside the UCL or LCL? What about the secondary chart at the bottom?

PaigeMiller · Posted 06-02-2021 01:57 PM

I think I already said the "layman's interpretation" ... upward trend, continues during COVID.

The whole idea of statistical testing that you mentioned earlier assumes things have a constant mean and constant variance during the first time period (non-COVID), and the mean is not constant here; it has a clear upwards trend. So any hypothesis testing must take this into account. Any other representations of the data seem suspect to me.
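One way to take the trend into account, as a rough sketch only: fit a linear trend plus a COVID level shift. This assumes the itemA data set from earlier in the thread, with volume, month_num, and pre (1 = pre-COVID). The covid indicator and the itemA_reg data set are hypothetical names introduced here for illustration.

```sas
/* Sketch: linear trend plus a COVID level shift.                   */
/* Assumes ITEMA has VOLUME, MONTH_NUM and PRE (1 = pre-COVID).     */
data itemA_reg;
   set itemA;
   covid = (pre ne 1);   /* hypothetical indicator: 1 for COVID months, 0 otherwise */
run;

proc reg data=itemA_reg;
   model volume = month_num covid;   /* covid coefficient = shift after allowing for trend */
run;
quit;
```

The t-test on the covid coefficient then asks whether the COVID months sit above or below the pre-existing trend line. This sketch ignores seasonality and autocorrelation between months, both of which could matter for monthly data.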

The bottom chart shows the absolute value of the change from month t-1 to month t. Looking at this chart, we see that the size of the month-to-month change stays about the same over this time period.
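To see the numbers behind the bottom chart, the moving range can be reproduced in a data step. This is a minimal sketch, assuming itemA is sorted by month_num; the itemA_mr data set and the moving_range variable are hypothetical names.

```sas
/* Sketch: moving range of span 2, the quantity described for the lower chart. */
/* Assumes ITEMA is sorted by MONTH_NUM.                                        */
data itemA_mr;
   set itemA;
   /* |volume(t) - volume(t-1)|; missing for the first month */
   moving_range = abs(dif(volume));
run;

proc print data=itemA_mr;
   var month_num volume moving_range;
run;
```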

--
Paige Miller

A_Swoosh · Posted 06-02-2021 02:13 PM

I thought that hypothesis testing for this type of data would not be appropriate? Wouldn't I also be able to do a simple OLS regression with month dummies too?

PaigeMiller · Posted 06-03-2021 07:15 AM

@A_Swoosh wrote:
I thought that hypothesis testing for this type of data would not be appropriate? Wouldn't I also be able to do a simple OLS regression with month dummies too?

I don't understand any of this. Please explain further.

--
Paige Miller

