Primavera
Quartz | Level 8

Hi all,

 

I am preparing a report about how a set of users interact with an online reporting tool. I have split the users into 3 groups (G1, G2, and G3) and am comparing them on about 20 metrics. The data collection period runs from 2019 to 2022, but not every participant has data in every collection interval within that period. Each subject appears to have reporting periods that are unique to them and do not line up with those of other subjects. I have used Tableau and a date scaffold to create the following plots for three of the metrics that I have to report on. The date scaffold is very similar to the idea presented here: https://tarsolutions.co.uk/blog/tableau-scaffolding-dates-calculating-deferred-revenue/

Some plots clearly differ between the 3 groups (like the green plots). Others do not differ much (like the yellow line plots). I have attached a sample of my data, containing a few users, to this post. The original data file included Reporting_Period_Start_Date and Reporting_Period_End_Date, and the quantity being compared was "Value", defined as "Numerator"/"Denominator". I created Date_scaf and the corresponding Daily_Numerator, Daily_Denominator, and AVGs from the original data.

So my question is: I can see that the green line plots differ a lot between the 3 groups, but is there a statistical test for this difference?

I tried using ANOVA:

 

proc anova data=SAS_Stat_Q;
   class Group;
   model AVGs = Group;
   /* MEANS must name the CLASS variable (Group, not Groups) */
   means Group / hovtest welch;
run;
quit;

But this is not what I need. ANOVA takes the mean of each group to be the sum of AVGs over all records in that group divided by the number of records (a plain arithmetic mean). What I need is a test that works date by date over Date_scaf. For example, 11/26/2020 gives the following table:

 

Date_scaf Reporting_Period_End_Date Reporting_Period_Start_Date AVGs Daily_Denominator Daily_Numerator Denominator Numerator Value personID Group
11/26/2020 11/28/2020 11/1/2020 10.54166 0.285714 3.011903 8 84.3333 10.54166 person009 G3
11/26/2020 11/28/2020 11/1/2020 12.73263 0.857143 10.91369 24 305.5832 12.73263 person004 G3
11/26/2020 11/28/2020 11/1/2020 20.76333 0.714286 14.83095 20 415.2666 20.76333 person007 G3
11/26/2020 11/28/2020 11/1/2020 1.356139 0.678571 0.920237 19 25.76664 1.356139 person011 G2
11/26/2020 11/28/2020 11/1/2020 2.133332 0.071429 0.152381 2 4.266664 2.133332 person021 G2
11/26/2020 11/28/2020 11/1/2020 11.18214 0.5 5.59107 14 156.5499 11.18214 person010 G1
11/26/2020 11/28/2020 11/1/2020 7.892497 0.714286 5.637498 20 157.8499 7.892497 person055 G1
11/26/2020 11/28/2020 11/1/2020 6.466663 0.321429 2.07857 9 58.19996 6.466663 person047 G1

So for G1, the average that I want for this date is sum(Daily_Numerator for persons 010, 055, 047) / sum(Daily_Denominator for persons 010, 055, 047), which is 13.30713 / 1.5357 = 8.6652.

 

Similar averages should be calculated for groups G2 and G3, and the three results then compared.
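
The per-date, per-group ratio described above could be computed along these lines. This is only a sketch: it assumes the data set is SAS_Stat_Q (as in the PROC ANOVA step) with the column names shown in the table, and the names group_daily, Sum_Num, Sum_Den and Weighted_Value are made up for illustration.

```sas
/* Sum the daily numerators and denominators within each date and group. */
/* NWAY keeps only the Date_scaf x Group combinations.                   */
proc means data=SAS_Stat_Q noprint nway;
   class Date_scaf Group;
   var Daily_Numerator Daily_Denominator;
   output out=group_daily(drop=_type_ _freq_)
          sum=Sum_Num Sum_Den;   /* hypothetical output variable names */
run;

/* Ratio of sums (not the mean of the individual ratios). */
data group_daily;
   set group_daily;
   if Sum_Den > 0 then Weighted_Value = Sum_Num / Sum_Den;
run;
```

For the 11/26/2020 rows listed above, the G1 row of group_daily should come out at about 8.665, matching the hand calculation.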

 

Does anyone know what statistical test should be used here?

1 REPLY 1
sbxkoenk
SAS Super FREQ

Hello,

 

Do you have SAS/ETS (part of SAS Econometrics in SAS Viya)?

 

You could use PROC SIMILARITY to compare time series.
PROC SIMILARITY supports dynamic time warping (DTW), so the time series can have:

  • different frequencies,
  • irregular intervals,
  • different scales, and
  • different lengths.

SAS/ETS 15.2 User's Guide
The SIMILARITY Procedure
https://go.documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/etsug/etsug_similarity_syntax02.htm
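
A rough sketch of how this might look for the three groups follows. It assumes SAS/ETS is licensed, that Date_scaf is a numeric SAS date, and that the data set is SAS_Stat_Q with the columns from the question. The data set names sim_daily, sim_wide and sim_summary are hypothetical, and NORMALIZE=STANDARD with MEASURE=MABSDEV is just one possible choice, not a recommendation.

```sas
/* One ratio-of-sums value per date and group (assumed input layout). */
proc means data=SAS_Stat_Q noprint nway;
   class Date_scaf Group;
   var Daily_Numerator Daily_Denominator;
   output out=sim_daily sum=Sum_Num Sum_Den;
run;

data sim_daily;
   set sim_daily;
   if Sum_Den > 0 then Weighted_Value = Sum_Num / Sum_Den;
run;

/* One column per group (G1, G2, G3), one row per date. */
proc transpose data=sim_daily out=sim_wide(drop=_name_);
   by Date_scaf;
   id Group;
   var Weighted_Value;
run;

/* Pairwise similarity among the three group series. */
proc similarity data=sim_wide outsum=sim_summary;
   id Date_scaf interval=day accumulate=average;
   target G1 G2 G3 / normalize=standard measure=mabsdev;
run;
```

The OUTSUM= data set holds the similarity measures between the target series. These are distance-style summaries rather than p-values, so they describe how far apart the group curves are but do not by themselves give a significance test.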

 

See also:
Fundamentals of Statistical Consulting
Week 10 Comparison of two time series

https://www.maths.usyd.edu.au/u/jchan/Consult/W10_CompareTwoTimeSeries.pdf


BR,

Koen
