data sue1/view=sue1;
set comp2;
by gvkey fqtr fyearq;
if dif(fyearq)=1 then do;
lagadj=lag(ajexq); lageps_p=lag(epspxq);lageps_d=lag(epsfxq);
lagshr_p=lag(cshprq);lagshr_d=lag(cshfdq);lagspiq=lag(spiq);
end;
if first.gvkey then do;
lageps_d=.;lagadj=.; lageps_p=.;
lagshr_p=.;lagshr_d=.;lagspiq=.;end;
if basis='P' then do;
actual1=epspxq/ajexq; expected1=lageps_p/lagadj;
end;
else if basis='D' then do;
actual1=epsfxq/ajexq; expected1=lageps_d/lagadj;
end;
else do;
actual1=epspxq/ajexq; expected1=lageps_p/lagadj;
end;
sue1=(actual1-expected1)/(prccq/ajexq);
format sue1 percent7.4 rdq date9.;
label datadate='Calendar date of fiscal period end';
keep ticker permno gvkey conm fyearq fqtr fyr datadate
rdq sue1 basis
act prccq mcap;
run;
data sue1/view=sue1; ***
This creates a view instead of a SAS data set. For beginners, just leave out the /view=sue1, and create a data set with data sue1; and don't worry about views.
by gvkey fqtr fyearq; ***
This indicates to SAS that the data is sorted by these three variables, which will be useful later.
if dif(fyearq)=1 then do; ***
The DIF function finds the difference between the value of fyearq in consecutive observations. If that difference = 1, then some action is taken. Please familiarize yourself with the SAS documentation, so you can look up functions that you find in existing SAS code, and also search for functions that might be useful in programs you write. Here is a list of all functions in alphabetical order.
if first.gvkey then do; ***
Because you told SAS the data is sorted by gvkey, first.gvkey is true on the first record in each block of records that share the same gvkey, and then some action is taken (here, the lagged variables are set to missing).
format sue1 percent7.4 rdq date9.; ***
Formats change how variables are displayed, not the stored values. Sue1 will appear as a percent: a stored value such as 0.05 is displayed multiplied by 100, with a percent sign at the end and four decimal places. Rdq will appear as a date.
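If it helps to see the BY statement, FIRST. variables, and DIF in action, here is a minimal sketch on made-up data. The data set toy and the variable yr_change are hypothetical names used only for illustration; the real comp2 is sorted by gvkey fqtr fyearq, so there consecutive rows within a firm are the same fiscal quarter in successive years.

```sas
/* Hypothetical toy data: two firms, a few fiscal years each */
data toy;
  input gvkey $ fyearq fqtr;
  datalines;
A 2019 1
A 2020 1
A 2022 1
B 2020 1
B 2021 1
;
run;

data _null_;
  set toy;
  by gvkey;                 /* requires toy to be sorted by gvkey */
  yr_change = dif(fyearq);  /* current fyearq minus the previous row's fyearq */
  /* first.gvkey is 1 on the first row of each firm, 0 otherwise */
  put gvkey= fyearq= first.gvkey= yr_change=;
run;
```

In the log, yr_change is missing on the very first row, 1 where the years are consecutive, 2 for the gap from 2019... 2020 to 2022 in firm A, and negative on the first row of firm B, because DIF knows nothing about BY groups. That is why the original code also resets the lagged values when first.gvkey is true.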
Be specific. State the exact part of the code you don't understand. No one will try to explain all 27 lines of code that you posted.
I have starred (***) the lines of code that are not clear to me:
data sue1/view=sue1;  ***
set comp2;
by gvkey fqtr fyearq; ***
if dif(fyearq)=1 then do; *** 
lagadj=lag(ajexq); lageps_p=lag(epspxq);lageps_d=lag(epsfxq);
lagshr_p=lag(cshprq);lagshr_d=lag(cshfdq);lagspiq=lag(spiq);
end;
if first.gvkey then do; ***
lageps_d=.;lagadj=.; lageps_p=.;
lagshr_p=.;lagshr_d=.;lagspiq=.;end;
if basis='P' then do;
actual1=epspxq/ajexq; expected1=lageps_p/lagadj;
end;
else if basis='D' then do;
actual1=epsfxq/ajexq; expected1=lageps_d/lagadj;
end;
else do;
actual1=epspxq/ajexq; expected1=lageps_p/lagadj;
end;
sue1=(actual1-expected1)/(prccq/ajexq);
format sue1 percent7.4 rdq date9.; ***
label datadate='Calendar date of fiscal period end';
keep ticker permno gvkey conm fyearq fqtr fyr datadate
rdq sue1 basis
act prccq mcap;
run;
If you don't understand how a DATA step starts or what a BY statement does, then it is time to take the Programming 1 class. Do a Google search for "sas programming 1". It will guide you to the free online course that serves as the entry point to SAS programming. See also https://communities.sas.com/t5/SAS-Communities-Library/Free-SAS-Learning-Resources/ta-p/425554
Also, wherever you got the code from, it very likely did not perform as desired, because the LAG function calls sit inside an IF/THEN/DO block.
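To see why a conditional LAG is risky, here is a small sketch on made-up data (lag_demo, id, x, lag_always, and lag_cond are hypothetical names). LAG does not look at the previous observation; it returns the value stored the last time that particular LAG call executed.

```sas
data lag_demo;
  input id x;
  /* Unconditional: runs on every row, so it returns the previous row's x */
  lag_always = lag(x);
  /* Conditional: runs only on even ids, so it returns x from the
     last row on which this statement executed, not the previous row */
  if mod(id, 2) = 0 then lag_cond = lag(x);
  datalines;
1 10
2 20
3 30
4 40
;
run;

proc print data=lag_demo;
run;
```

On the row with id=4, lag_always is 30 (the previous row), while lag_cond is 20 (the value of x the last time the conditional LAG ran, on id=2).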
One more item to add to your study list. This code will generate the wrong result:
if dif(fyearq)=1 then do;
lagadj=lag(ajexq); lageps_p=lag(epspxq);lageps_d=lag(epsfxq);
lagshr_p=lag(cshprq);lagshr_d=lag(cshfdq);lagspiq=lag(spiq);
end;
if first.gvkey then do;
lageps_d=.;lagadj=.; lageps_p=.;
lagshr_p=.;lagshr_d=.;lagspiq=.;end;
The LAG function should (almost always) execute on every observation if you want accurate results. This would be an improvement:
lagadj=lag(ajexq); lageps_p=lag(epspxq);lageps_d=lag(epsfxq);
lagshr_p=lag(cshprq);lagshr_d=lag(cshfdq);lagspiq=lag(spiq);
if first.gvkey or dif(fyearq) ne 1 then do;
lageps_d=.;lagadj=.; lageps_p=.;
lagshr_p=.;lagshr_d=.;lagspiq=.;end;
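Put together, the whole step might look something like the sketch below. This is only one way to write it, assuming comp2 is already sorted by gvkey fqtr fyearq. The helper variable yr_gap is a hypothetical name I introduced so that DIF also executes on every observation and the result does not depend on how the compound condition is evaluated. I also dropped /view=sue1 to create an ordinary data set.

```sas
data sue1;
  set comp2;                    /* assumed sorted by gvkey fqtr fyearq */
  by gvkey fqtr fyearq;

  /* Run DIF and LAG on every observation so their queues stay in step */
  yr_gap   = dif(fyearq);       /* hypothetical helper, not kept */
  lagadj   = lag(ajexq);
  lageps_p = lag(epspxq);
  lageps_d = lag(epsfxq);
  lagshr_p = lag(cshprq);
  lagshr_d = lag(cshfdq);
  lagspiq  = lag(spiq);

  /* Discard lags that cross firms or do not come from the prior fiscal year */
  if first.gvkey or yr_gap ne 1 then
    call missing(lagadj, lageps_p, lageps_d, lagshr_p, lagshr_d, lagspiq);

  if basis = 'D' then do;       /* diluted EPS */
    actual1 = epsfxq / ajexq;  expected1 = lageps_d / lagadj;
  end;
  else do;                      /* 'P' and anything else: primary EPS */
    actual1 = epspxq / ajexq;  expected1 = lageps_p / lagadj;
  end;

  sue1 = (actual1 - expected1) / (prccq / ajexq);

  format sue1 percent7.4 rdq date9.;
  label datadate = 'Calendar date of fiscal period end';
  keep ticker permno gvkey conm fyearq fqtr fyr datadate
       rdq sue1 basis act prccq mcap;
run;
```

The 'P' branch and the final ELSE branch in the original do the same thing, so they are merged here. You may also want to test first.fqtr instead of first.gvkey: with this sort order first.fqtr is true whenever first.gvkey is, and it additionally catches the point where the quarter changes within a firm.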