DATA Step, Macro, Functions and more

Data step first. question

Reply
Contributor
Posts: 73

Data step first. question

[ Edited ]

 

 
data test; set test;
	   by code;
	   leaddate=lag(date);
	   if first.code then leaddate=intnx('month',date,3,'sameday');
	   *if sunday move back by 2 days, if saturday move back by 1 day;
	   if weekday(date)=1 then date1=intnx('day',date,-2); else
	   if weekday(date)=7 then date1=intnx('day',date,-1); else date1=date;
	   if weekday(leaddate)=1 then leaddate1=intnx('day',leaddate,2); else
	   if weekday(leaddate)=7 then leaddate1=intnx('day',leaddate,3); else
	   if weekday(leaddate)=6 then leaddate1=intnx('day',leaddate,3);else
	   leaddate1=intnx('day',leaddate,1);
	   if leaddate=date then delete;
	   format date1 leaddate1 date9.;
	run;

 

I have a question regarding the code above. I am using a sample code but I don't know why I have to use the first function. Does the leaddate in the 4th line of the code replace the leaddate in the 3rd line. One is a leading variable and the other calculates 3 months ahead. Why do I need to keep this if first.statement in the code. Is this code correct?

 

Super User
Posts: 6,526

Re: Data step first. question

Posted in reply to Agent1592

Whether the code is correct or not ... depends on what it is supposed to do.  But you do require that first function.

 

LEADDATE needs to be set on every observation.  The later statement that works with FIRST.CODE sets LEADDATE for those observations only, not for all the other observations that are not the first one for a CODE.

SAS Super FREQ
Posts: 9,253

Re: Data step first. question

Posted in reply to Agent1592

Hi:

  Without seeing data or understanding the program logic, it is hard to comment on whether you need the first INTNX function call. You need to do SOMETHING at the change of the CODE group. Because I assume that at the change, you do not want the leaddate value from the previous group to carry forward? It looks like you want to advance the date on the change of the CODE group. Have you examined the output? Are the dates as you expect them to be.

 

  For proof of concept about how LAG works and why you would want/need to do SOMETHING  at the first of the group,  For example, you did not post any data. so consider this fake data. Each group based on the CODE variable has 4 observations, as shown below:

data test;
  infile datalines;
  input code date : date. name $;
datalines;
1 1nov17 albert
1 2nov17 bob
1 3nov17 carl
1 4nov17 dave
2 1dec17 alice
2 2dec17 barb
2 3dec17 cathy
2 4dec17 diane
;
run;

data newtest1;
  set test; by code;
  leaddate = lag(date);
  format date leaddate date9.;
run;

proc print data=newtest1;
  title 'newtest1 no test for first.code';
run;


data newtest2;
  set test; by code;
  leaddate = lag(date);
  if first.code then leaddate=.;
  format date leaddate date9.;
run;

proc print data=newtest2;
  title 'newtest2 has test for first.code and sets leaddate to missing';
run;

  One assumes that you do not want the date from dave (observation #4) to be lagged to alice (observation #5 and the first.code value for code=2).

 

  So the output from NEWTEST1 shows what happens when you do not perform any test for first.code. In the output for NEWTEST2, it shows that you can reset the value for LEADDATE to something (in my program to missing) for the first observation's value in every BY group.

 

  Whether INTNX is doing what you want or need is a question that only you can answer. Here's the output from the above programs.

 

cynthia

lag_leaddate.png

 

Super User
Posts: 22,820

Re: Data step first. question

Posted in reply to Agent1592

It means you're grouping your analysis by the CODE grouping variable and that for the first record of every unique CODE value it does something specific, in this case calculate the LEADDATE that is then used for further calculations. If it's not the first value of CODE then it uses the lagged value of the variable DATE. 

 

 *get date previous date value;
    leaddate=lag(date);

    *if first record for CODE then calculate a new leaddate;
    if first.code then
        leaddate=intnx('month', date, 3, 'sameday');

 

Correct code and correct logic are not the same thing, in this case your code may be 100% correct, but the logic can be entirely incorrect. 

 

Something that would help with understanding the code would be comments and proper formatting. 

 

data test;
    set test; *using the same name over and over is a bad idea;
    by code; *set to use first date;
    
    *get date previous date value;
    leaddate=lag(date);

    *if first record for CODE then calculate a new leaddate;
    if first.code then
        leaddate=intnx('month', date, 3, 'sameday');
    
    *if sunday move back by 2 days, if saturday move back by 1 day;
    if weekday(date)=1 then
        date1=intnx('day', date, -2);
    else if weekday(date)=7 then
        date1=intnx('day', date, -1);
    else
        date1=date;

    *Not sure what you're doing here;
    if weekday(leaddate)=1 then
        leaddate1=intnx('day', leaddate, 2);
    else if weekday(leaddate)=7 then
        leaddate1=intnx('day', leaddate, 3);
    else if weekday(leaddate)=6 then
        leaddate1=intnx('day', leaddate, 3);
    else
        leaddate1=intnx('day', leaddate, 1);
  
    *remove records if certain condition met...?;
    if leaddate=date then
        delete;
        
    format date1 leaddate1 date9.;
run;

 

Trusted Advisor
Posts: 1,283

Re: Data step first. question

Posted in reply to Agent1592

The lag function will cross by-group boundaries, which contaminates the first observation of the current by-group with data from the last obs of the preceding group.  The "if first." statements tests whether the record-in-hand is at the beginning of a by-group, and therefore needs to be de-contaminated.

Esteemed Advisor
Posts: 5,391

Re: Data step first. question

Posted in reply to Agent1592

At first sight, it might look like the statements

 

    leaddate=lag(date);
    if first.code then leaddate=intnx('month',date,3,'sameday');

could be replaced with the better looking

 

    if first.code then leaddate=intnx('month',date,3,'sameday');
    else leaddate=lag(date);

 

but that wouldn't work. To work properly, the lag() function must be called for every observation. Calling the lag() function conditionnally is (almost) never a good idea.

PG
Contributor
Posts: 73

Re: Data step first. question

Thanks the dates are company reporting days and if the earnings reporting day is Sunday the idea is to move by 2 days. Is there a better code using PROC SQL?
Super User
Posts: 22,820

Re: Data step first. question

Posted in reply to Agent1592

SQL is harder because it doesn't have the row concept and the LAG function is harder to achieve in SQL 

 

Esteemed Advisor
Posts: 5,391

Re: Data step first. question

Posted in reply to Agent1592

date1 is the most recent weekday relative to date, simple enough. But what is leaddate1; you move Saturday and Sunday to the next Tuesday and Friday to the next Monday. What's the logic?

PG
Contributor
Posts: 73

Re: Data step first. question

These are trading days. So the idea is to move weekend days to trading week days.

Ask a Question
Discussion stats
  • 9 replies
  • 256 views
  • 3 likes
  • 6 in conversation