BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
karora1
Fluorite | Level 6

I've been working with proc expand do some simple backward 7 day moving averages and sums. However, I was doing some verifications on my data and i seem to be off in the sum of my original variable in output file and correspondingly the sum and averages. I have exported the file right before the proc expand and seem to be correct but after the expand procedure the count is off by a few. I am clearly missing something but can't figure it out. The additional counts all seem to be right when there should be a blank (no data) and right before a count of 1 in the next day of data. However there are other cases where this happens and the increase does not happen. If i run the same code by a different geographic By variable, the counts line up. I tried to address the missing days by using the transformin = (setmiss 0) so I thought I was okay.  Any suggestions are greatly appreciated.

proc summary data=Cmontest nway missing;
class overallSAG_ID overall_SAG_name COLLECTION_DATE;
 var poscount tottestcount;
 output out=Cmontests3(drop=_type_ _freq_) sum=;
run;

proc sort data = Cmontests3;
by overallSAG_ID COLLECTION_DATE;
run;

PROC EXPAND DATA=COVIDmontests3 OUT = COVIDmontests10
FROM=DAY METHOD=NONE;
BY overallSAG_ID;
ID COLLECTION_DATE;
CONVERT poscount = MOVESUM7 / TRANSFORMIN=(SETMISS 0) TRANSFORMOUT = (MOVSUM 7); CONVERT poscount = MOVEAVE7 / TRANSFORMIN=(SETMISS 0) TRANSFORMOUT=(MOVAVE 7); RUN;
1 ACCEPTED SOLUTION

Accepted Solutions
dw_sas
SAS Employee

Hi @karora1 , 

 

In general, the METHOD=NONE option should not be used in PROC EXPAND when the TO= option is also specified.  One alternative approach is to first use PROC TIMESERIES to fill in the gaps in your data with 0 values, and then run PROC EXPAND to create your moving sum.  Another alternative is to use PROC TIMEDATA to both fill in the gaps in your data and compute the moving sum.  Both of these alternatives are shown below using the example data provided by @PeterClemmensen .

 

data have;
input id dt :date9. var;
format dt date9.;
datalines;
1 01jan2020 1
1 02jan2020 2
1 04jan2020 3
2 01jan2020 4
2 02jan2020 5
2 04jan2020 6
;

 /* fill gaps in the data with 0 values */
 /* based on INTERVAL=DAY specification */
 /* and SETMISSING=0 option.            */
proc timeseries data=have out=temp;
  by id;
  id dt interval=day;
  var var / setmissing=0;
run;

 /* compute moving sum on daily series */
proc expand data=temp out=want method=none;
   by id;
   id dt;
   convert var = movesum / transformout = (movsum 2);
run;

proc print data=want;
run;

 /* fill in gaps with 0 values and compute */
 /* moving sum in one step.                */
proc timedata data=have out=_null_ outarrays=want2(drop=_:);
  by id;
  id dt interval=day;
  var var / setmissing=0;
  outarrays movsum;
  movsum[1]=var[1];
  do t=2 to _length_;
    movsum[t]=var[t] + var[t-1];
  end;
run;

proc print data=want2;
run;

I hope this helps!

DW

 

 

 

View solution in original post

6 REPLIES 6
PeterClemmensen
Tourmaline | Level 20

Can you post your log please?

karora1
Fluorite | Level 6
1320  PROC EXPAND DATA=COVIDmontests1 OUT = COVIDmontests2
1321  FROM=DAY TO =DAY METHOD=NONE;
1322  BY overallSAG_ID;
1323  ID COLLECTION_DATE;
1324  CONVERT poscount = MOVESUM7 / TRANSFORMIN =(SETMISS 0) TRANSFORMOUT = (MOVSUM 7);
1325  CONVERT poscount = MOVEAVE7 / TRANSFORMIN =(SETMISS 0) TRANSFORMOUT =(MOVAVE 7);
1326  RUN;

WARNING: 1 observations have been omitted before observation number 6 in data set
         WORK.COVIDMONTESTS1 according to the FROM=DAY specification and the ID variable values.
         The current ID is COLLECTION_DATE=10/13/2020, the previous is COLLECTION_DATE=10/11/2020.
WARNING: 1 observations have been omitted before observation number 16 in data set
         WORK.COVIDMONTESTS1 according to the FROM=DAY specification and the ID variable values.
         The current ID is COLLECTION_DATE=10/24/2020, the previous is COLLECTION_DATE=10/22/2020.
NOTE: The above message was for the following BY group:
      overallSAG_ID=1011
WARNING: 1 observations have been omitted before observation number 73 in data set
         WORK.COVIDMONTESTS1 according to the FROM=DAY specification and the ID variable values.
         The current ID is COLLECTION_DATE=10/25/2020, the previous is COLLECTION_DATE=10/23/2020.
NOTE: The above message was for the following BY group:
      overallSAG_ID=1022
WARNING: 1 observations have been omitted before observation number 79 in data set
         WORK.COVIDMONTESTS1 according to the FROM=DAY specification and the ID variable values.
         The current ID is COLLECTION_DATE=10/12/2020, the previous is COLLECTION_DATE=10/10/2020.
NOTE: The above message was for the following BY group:
      overallSAG_ID=1031
WARNING: 1 observations have been omitted before observation number 103 in data set
         WORK.COVIDMONTESTS1 according to the FROM=DAY specification and the ID variable values.
         The current ID is COLLECTION_DATE=10/18/2020, the previous is COLLECTION_DATE=10/16/2020.
NOTE: The above message was for the following BY group:
      overallSAG_ID=1032
WARNING: 1 observations have been omitted before observation number 141 in data set
         WORK.COVIDMONTESTS1 according to the FROM=DAY specification and the ID variable values.
         The current ID is COLLECTION_DATE=10/19/2020, the previous is COLLECTION_DATE=10/17/2020.
NOTE: The above message was for the following BY group:
      overallSAG_ID=1042
WARNING: 3 observations have been omitted before observation number 150 in data set
         WORK.COVIDMONTESTS1 according to the FROM=DAY specification and the ID variable values.
         The current ID is COLLECTION_DATE=10/13/2020, the previous is COLLECTION_DATE=10/09/2020.
WARNING: 1 observations have been omitted before observation number 155 in data set
         WORK.COVIDMONTESTS1 according to the FROM=DAY specification and the ID variable values.
         The current ID is COLLECTION_DATE=10/19/2020, the previous is COLLECTION_DATE=10/17/2020.
NOTE: The above message was for the following BY group:
      overallSAG_ID=1043
WARNING: 1 observations have been omitted before observation number 183 in data set
         WORK.COVIDMONTESTS1 according to the FROM=DAY specification and the ID variable values.
         The current ID is COLLECTION_DATE=10/13/2020, the previous is COLLECTION_DATE=10/11/2020.
NOTE: The above message was for the following BY group:
      overallSAG_ID=1052
WARNING: 2 observations have been omitted before observation number 217 in data set
         WORK.COVIDMONTESTS1 according to the FROM=DAY specification and the ID variable values.
         The current ID is COLLECTION_DATE=10/12/2020, the previous is COLLECTION_DATE=10/09/2020.
NOTE: Further warnings for gaps in data will not be printed.
NOTE: The above message was for the following BY group:
      overallSAG_ID=1054
NOTE: The data set WORK.COVIDMONTESTS2 has 2062 observations and 6 variables.
NOTE: PROCEDURE EXPAND used (Total process time):
      real time           0.06 seconds
      cpu time            0.01 seconds

Sure here is the log. I'm still getting errors related to the blanks but wasn't sure if that was because the errors were occurring at the By and ID statements before the setmiss option is run. 

PeterClemmensen
Tourmaline | Level 20

Ok. This happens because of the From=Day Option in the PROC EXPAND Statement. The From=Day Option tells SAS that there is a 1 day time interval between observations in the input data set. Which there is not.

 

You can replicate the Warning by un-commenting the commented bloc below.

 

data have;
input id dt :date9. var;
datalines;
1 01jan2020 1
1 02jan2020 2
1 04jan2020 3
2 01jan2020 4
2 02jan2020 5
2 04jan2020 6
;

proc expand data = have out=want /*from=day*/;
   by id;
   id dt;
   convert var = movesum / transformout = (movsum 2);
run;
karora1
Fluorite | Level 6

If i remove the "from = day" then the transformin = (setmiss 0) does not add the blank days and zero counts to those days. This is problematic for the moving average part of the transformout part of the code. When I leave that section in the program does add zero's for blank days; however in 3 cases it adds a 1 instead. Do I have to rearrange the code so that the transformin code still works?

dw_sas
SAS Employee

Hi @karora1 , 

 

In general, the METHOD=NONE option should not be used in PROC EXPAND when the TO= option is also specified.  One alternative approach is to first use PROC TIMESERIES to fill in the gaps in your data with 0 values, and then run PROC EXPAND to create your moving sum.  Another alternative is to use PROC TIMEDATA to both fill in the gaps in your data and compute the moving sum.  Both of these alternatives are shown below using the example data provided by @PeterClemmensen .

 

data have;
input id dt :date9. var;
format dt date9.;
datalines;
1 01jan2020 1
1 02jan2020 2
1 04jan2020 3
2 01jan2020 4
2 02jan2020 5
2 04jan2020 6
;

 /* fill gaps in the data with 0 values */
 /* based on INTERVAL=DAY specification */
 /* and SETMISSING=0 option.            */
proc timeseries data=have out=temp;
  by id;
  id dt interval=day;
  var var / setmissing=0;
run;

 /* compute moving sum on daily series */
proc expand data=temp out=want method=none;
   by id;
   id dt;
   convert var = movesum / transformout = (movsum 2);
run;

proc print data=want;
run;

 /* fill in gaps with 0 values and compute */
 /* moving sum in one step.                */
proc timedata data=have out=_null_ outarrays=want2(drop=_:);
  by id;
  id dt interval=day;
  var var / setmissing=0;
  outarrays movsum;
  movsum[1]=var[1];
  do t=2 to _length_;
    movsum[t]=var[t] + var[t-1];
  end;
run;

proc print data=want2;
run;

I hope this helps!

DW

 

 

 

karora1
Fluorite | Level 6

I tried the timeseries option and it worked great. Thanks a lot for the suggestions, much appreciated. 

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 1259 views
  • 2 likes
  • 3 in conversation