BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
somebody
Lapis Lazuli | Level 10

I use PROC EXPAND to get the leads of variables. Using the ID statement gives the same results compared to not using it. So what does that statement really do? when should I use it ? and when should I not?

My sample is:

data WORK.sample;
  infile datalines dsd truncover;
  input date:YYMMDDN8. Name:$100. Holding:BEST12.;
  format date YYMMDDN8. Holding BEST12.;
datalines4;
20130916,ORACLE CORP,34776812
20130917,ORACLE CORP,35633772
20130918,ORACLE CORP,36068432
20130919,ORACLE CORP,37077832
20130920,ORACLE CORP,36241472
20130923,ORACLE CORP,35624692
20130924,ORACLE CORP,35356424
20130925,ORACLE CORP,35038106
20130926,ORACLE CORP,34763832
20130927,ORACLE CORP,34763832
20130930,ORACLE CORP,34473542
20131001,ORACLE CORP,34473542
20131002,ORACLE CORP,34849730
20131003,ORACLE CORP,35494052
20131004,ORACLE CORP,35233922
20131007,ORACLE CORP,34871741
20131008,ORACLE CORP,34601606
20131009,ORACLE CORP,33997304
20131010,ORACLE CORP,33845228
20130916,MICROSOFT CORP,71171845
20130917,MICROSOFT CORP,72924869
20130918,MICROSOFT CORP,73814023
20130919,MICROSOFT CORP,75878883
20130920,MICROSOFT CORP,74167999
20130923,MICROSOFT CORP,75703447
20130924,MICROSOFT CORP,75133009
20130925,MICROSOFT CORP,74456146
20130926,MICROSOFT CORP,73872937
20130927,MICROSOFT CORP,73872937
20130930,MICROSOFT CORP,73255808
20131001,MICROSOFT CORP,73255808
20131002,MICROSOFT CORP,74055748
20131003,MICROSOFT CORP,75425858
20131004,MICROSOFT CORP,74872708
20131007,MICROSOFT CORP,74102553
20131008,MICROSOFT CORP,73528128
20131009,MICROSOFT CORP,72243118
20131010,MICROSOFT CORP,71919738
20130916,T ROWE PRICE GROUP,2449245
20130917,T ROWE PRICE GROUP,2509565
20130918,T ROWE PRICE GROUP,2540160
20130919,T ROWE PRICE GROUP,2611210
20130920,T ROWE PRICE GROUP,2552340
20130923,T ROWE PRICE GROUP,2578730
20130924,T ROWE PRICE GROUP,2559166
20130925,T ROWE PRICE GROUP,2535952
20130926,T ROWE PRICE GROUP,2515950
20130927,T ROWE PRICE GROUP,2515950
20130930,T ROWE PRICE GROUP,2494780
20131001,T ROWE PRICE GROUP,2494780
20131002,T ROWE PRICE GROUP,2522228
20131003,T ROWE PRICE GROUP,2569240
20131004,T ROWE PRICE GROUP,2550260
20131007,T ROWE PRICE GROUP,2523834
20131008,T ROWE PRICE GROUP,2504124
20131009,T ROWE PRICE GROUP,2460032
20131010,T ROWE PRICE GROUP,2448936
;;;;

And here is the code:

proc sort data=sample; by name date;run;
proc expand data=sample out=sample; by name;id date; convert holding = lead_holding_with_id/ transformout=(lead 1); run;
proc expand data=sample out=sample; by name; convert holding = lead_holding_without_id/ transformout=(lead 1); run;

 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
dw_sas
SAS Employee

Hi @somebody ,

 

@Kurt_Bremser is exactly correct in his comments regarding the use of the ID statement in PROC EXPAND.  

 

You indicated that you have gaps in the data due to weekends and holidays.  You can ignore the gaps due to weekends by using the 'weekday' interval.  By default, the weekday interval is defined as a 5-day interval from Monday-Friday.  If you have gaps in the weekday data due to holidays, then you might want to fill in those gaps with missing values prior to computing the lead variable.  You can perform your calculations using either a combination of PROC TIMESERIES (to fill in the gaps) and PROC EXPAND (to create the lead variable), or you can do this all in one step using PROC TIMEDATA.

 

Below is a slightly modified version of the earlier data you had provided, where some observations are omitted in each BY group to create gaps in the weekday data.  The two approaches mentioned above are illustrated in the code, along with some comments.  

 

  /* created some gaps in weekday data for each BY group */
data WORK.sample;
  infile datalines dsd truncover;
  input date:YYMMDD8. Name:$100. Holding:BEST12.;
  format date YYMMDDN8. Holding BEST12.;
datalines4;
20130916,ORACLE CORP,34776812
20130917,ORACLE CORP,35633772
20130918,ORACLE CORP,36068432
20130919,ORACLE CORP,37077832
20130920,ORACLE CORP,36241472
20130923,ORACLE CORP,35624692
20130924,ORACLE CORP,35356424
20130925,ORACLE CORP,35038106
20130926,ORACLE CORP,34763832
20130927,ORACLE CORP,34763832
20130930,ORACLE CORP,34473542
20131001,ORACLE CORP,34473542
20131002,ORACLE CORP,34849730
20131003,ORACLE CORP,35494052
20131004,ORACLE CORP,35233922
20131008,ORACLE CORP,34601606
20131009,ORACLE CORP,33997304
20131010,ORACLE CORP,33845228
20130916,MICROSOFT CORP,71171845
20130917,MICROSOFT CORP,72924869
20130918,MICROSOFT CORP,73814023
20130919,MICROSOFT CORP,75878883
20130920,MICROSOFT CORP,74167999
20130923,MICROSOFT CORP,75703447
20130924,MICROSOFT CORP,75133009
20130925,MICROSOFT CORP,74456146
20130930,MICROSOFT CORP,73255808
20131001,MICROSOFT CORP,73255808
20131002,MICROSOFT CORP,74055748
20131003,MICROSOFT CORP,75425858
20131004,MICROSOFT CORP,74872708
20131007,MICROSOFT CORP,74102553
20131008,MICROSOFT CORP,73528128
20131009,MICROSOFT CORP,72243118
20131010,MICROSOFT CORP,71919738
20130916,T ROWE PRICE GROUP,2449245
20130917,T ROWE PRICE GROUP,2509565
20130918,T ROWE PRICE GROUP,2540160
20130920,T ROWE PRICE GROUP,2552340
20130923,T ROWE PRICE GROUP,2578730
20130924,T ROWE PRICE GROUP,2559166
20130925,T ROWE PRICE GROUP,2535952
20130926,T ROWE PRICE GROUP,2515950
20130927,T ROWE PRICE GROUP,2515950
20130930,T ROWE PRICE GROUP,2494780
20131001,T ROWE PRICE GROUP,2494780
20131002,T ROWE PRICE GROUP,2522228
20131003,T ROWE PRICE GROUP,2569240
20131004,T ROWE PRICE GROUP,2550260
20131007,T ROWE PRICE GROUP,2523834
20131008,T ROWE PRICE GROUP,2504124
20131009,T ROWE PRICE GROUP,2460032
20131010,T ROWE PRICE GROUP,2448936
;;;;

proc sort data=sample; by name date;run;

  /* run PROC TIMESERIES to fill in gaps in the data with missing values */
  /* based on WEEKDAY interval                                           */
proc timeseries data=sample out=fullsample;
  by name;
  id date interval=weekday;
  var holding / setmissing=missing;
run;

  /* see gaps filled in with missing values */
proc print data=fullsample;
run;

  /* run PROC EXPAND with METHOD=NONE to prevent missing values from being    */
  /* interpolated prior to computing the lead variable.  Including the ID     */
  /* statement allows values of ID variable to be written to OUT= data set    */ 
proc expand data=fullsample out=sample3 method=none;
	by name;
    id date;
	convert holding = lead_holding_with_id/ transformout=(lead 1);			
run;

proc print data=sample3;
  var name date holding lead:;
  title 'Results from PROCs TIMESERIES and PROC EXPAND';
run;

  /* Alternatively, use PROC TIMEDATA to fill in gaps and create LEAD variable */
  /* in a single PROC step.                                                    */
proc timedata data=sample out=_null_ outarrays=sample4(drop=_s: _c:);
  by name;
  id date interval=weekday;
  var holding / setmissing=missing;
  outarrays lead_holding_timedata;

  do t=1 to _length_ - 1;
    lead_holding_timedata[t]=holding[t+1];
  end;

run;

proc print data=sample4;
  var name date holding lead:;
  title 'Results from PROC TIMEDATA';
run;
title;

I hope this helps!

DW

View solution in original post

4 REPLIES 4
Kurt_Bremser
Super User

The documentation of the ID Statement provides a clue: if the statement is omitted, the procedure wiil create its own date or datetime values to create a consecutive series. If your source data has no gaps in the series, the results will be identical.

somebody
Lapis Lazuli | Level 10

I tested this. My data has gaps where weekends and holidays are not included. So according to the document, SAS would create these gaps without the IF statement? but it turns out there is no difference between using with and without the IF statement.

I edited the post and added a sample.

 

 

Kurt_Bremser
Super User

I see that you used neither the FROM nor the TO option in the PROC EXPAND statement. This means that the procedure has no ideas about the interval between the observations, and will treat them as a consecutive series.

dw_sas
SAS Employee

Hi @somebody ,

 

@Kurt_Bremser is exactly correct in his comments regarding the use of the ID statement in PROC EXPAND.  

 

You indicated that you have gaps in the data due to weekends and holidays.  You can ignore the gaps due to weekends by using the 'weekday' interval.  By default, the weekday interval is defined as a 5-day interval from Monday-Friday.  If you have gaps in the weekday data due to holidays, then you might want to fill in those gaps with missing values prior to computing the lead variable.  You can perform your calculations using either a combination of PROC TIMESERIES (to fill in the gaps) and PROC EXPAND (to create the lead variable), or you can do this all in one step using PROC TIMEDATA.

 

Below is a slightly modified version of the earlier data you had provided, where some observations are omitted in each BY group to create gaps in the weekday data.  The two approaches mentioned above are illustrated in the code, along with some comments.  

 

  /* created some gaps in weekday data for each BY group */
data WORK.sample;
  infile datalines dsd truncover;
  input date:YYMMDD8. Name:$100. Holding:BEST12.;
  format date YYMMDDN8. Holding BEST12.;
datalines4;
20130916,ORACLE CORP,34776812
20130917,ORACLE CORP,35633772
20130918,ORACLE CORP,36068432
20130919,ORACLE CORP,37077832
20130920,ORACLE CORP,36241472
20130923,ORACLE CORP,35624692
20130924,ORACLE CORP,35356424
20130925,ORACLE CORP,35038106
20130926,ORACLE CORP,34763832
20130927,ORACLE CORP,34763832
20130930,ORACLE CORP,34473542
20131001,ORACLE CORP,34473542
20131002,ORACLE CORP,34849730
20131003,ORACLE CORP,35494052
20131004,ORACLE CORP,35233922
20131008,ORACLE CORP,34601606
20131009,ORACLE CORP,33997304
20131010,ORACLE CORP,33845228
20130916,MICROSOFT CORP,71171845
20130917,MICROSOFT CORP,72924869
20130918,MICROSOFT CORP,73814023
20130919,MICROSOFT CORP,75878883
20130920,MICROSOFT CORP,74167999
20130923,MICROSOFT CORP,75703447
20130924,MICROSOFT CORP,75133009
20130925,MICROSOFT CORP,74456146
20130930,MICROSOFT CORP,73255808
20131001,MICROSOFT CORP,73255808
20131002,MICROSOFT CORP,74055748
20131003,MICROSOFT CORP,75425858
20131004,MICROSOFT CORP,74872708
20131007,MICROSOFT CORP,74102553
20131008,MICROSOFT CORP,73528128
20131009,MICROSOFT CORP,72243118
20131010,MICROSOFT CORP,71919738
20130916,T ROWE PRICE GROUP,2449245
20130917,T ROWE PRICE GROUP,2509565
20130918,T ROWE PRICE GROUP,2540160
20130920,T ROWE PRICE GROUP,2552340
20130923,T ROWE PRICE GROUP,2578730
20130924,T ROWE PRICE GROUP,2559166
20130925,T ROWE PRICE GROUP,2535952
20130926,T ROWE PRICE GROUP,2515950
20130927,T ROWE PRICE GROUP,2515950
20130930,T ROWE PRICE GROUP,2494780
20131001,T ROWE PRICE GROUP,2494780
20131002,T ROWE PRICE GROUP,2522228
20131003,T ROWE PRICE GROUP,2569240
20131004,T ROWE PRICE GROUP,2550260
20131007,T ROWE PRICE GROUP,2523834
20131008,T ROWE PRICE GROUP,2504124
20131009,T ROWE PRICE GROUP,2460032
20131010,T ROWE PRICE GROUP,2448936
;;;;

proc sort data=sample; by name date;run;

  /* run PROC TIMESERIES to fill in gaps in the data with missing values */
  /* based on WEEKDAY interval                                           */
proc timeseries data=sample out=fullsample;
  by name;
  id date interval=weekday;
  var holding / setmissing=missing;
run;

  /* see gaps filled in with missing values */
proc print data=fullsample;
run;

  /* run PROC EXPAND with METHOD=NONE to prevent missing values from being    */
  /* interpolated prior to computing the lead variable.  Including the ID     */
  /* statement allows values of ID variable to be written to OUT= data set    */ 
proc expand data=fullsample out=sample3 method=none;
	by name;
    id date;
	convert holding = lead_holding_with_id/ transformout=(lead 1);			
run;

proc print data=sample3;
  var name date holding lead:;
  title 'Results from PROCs TIMESERIES and PROC EXPAND';
run;

  /* Alternatively, use PROC TIMEDATA to fill in gaps and create LEAD variable */
  /* in a single PROC step.                                                    */
proc timedata data=sample out=_null_ outarrays=sample4(drop=_s: _c:);
  by name;
  id date interval=weekday;
  var holding / setmissing=missing;
  outarrays lead_holding_timedata;

  do t=1 to _length_ - 1;
    lead_holding_timedata[t]=holding[t+1];
  end;

run;

proc print data=sample4;
  var name date holding lead:;
  title 'Results from PROC TIMEDATA';
run;
title;

I hope this helps!

DW

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

Multiple Linear Regression in SAS

Learn how to run multiple linear regression models with and without interactions, presented by SAS user Alex Chaplin.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 4 replies
  • 1193 views
  • 3 likes
  • 3 in conversation