Following is the sample of my dataset. I need to calculate lag of "asset" for each company_id. Then years 2003 and 2013 will have to be dropped. How can I do that?
year company_id asset
2003 | AN8068571086 | 3109000 |
2004 | AN8068571086 | 2997000 |
2005 | AN8068571086 | 3496000 |
2006 | AN8068571086 | 2998873 |
2007 | AN8068571086 | 3169033 |
2008 | AN8068571086 | 3692000 |
2009 | AN8068571086 | 4642000 |
2010 | AN8068571086 | 5006000 |
2011 | AN8068571086 | 4837000 |
2012 | AN8068571086 | 6312000 |
2013 | AN8068571086 | 8478000 |
2003 | ANN6748L1027 | 26679 |
2004 | ANN6748L1027 | 29661 |
2005 | ANN6748L1027 | 65606 |
2006 | ANN6748L1027 | 25151 |
2007 | ANN6748L1027 | 28439 |
2008 | ANN6748L1027 | 18375 |
2009 | ANN6748L1027 | 17421 |
2010 | ANN6748L1027 | 27269 |
2011 | ANN6748L1027 | 62001 |
2012 | ANN6748L1027 | 39696 |
2013 | ANN6748L1027 | 39492 |
2003 | BMG0129K1045 | . |
2004 | BMG0129K1045 | 0 |
2005 | BMG0129K1045 | 741997 |
2006 | BMG0129K1045 | 58002 |
2007 | BMG0129K1045 | 13546 |
2008 | BMG0129K1045 | 80947 |
2009 | BMG0129K1045 | 142666 |
2010 | BMG0129K1045 | 239957 |
2011 | BMG0129K1045 | 295522 |
2012 | BMG0129K1045 | 618217 |
2013 | BMG0129K1045 | 777386 |
what do you mean by lag.
Say, these are asset values of the companies at time t. I want to have asset values at t-1. This means, in year 2004, I want the asset value of year 2003.
proc sort data=have;
by company_id year;
run;
data want;
set have;
by company_id;
retain x .;
if first.company_id then x=.;
Else if not last.company_id then output;
x=asset;
run;
Thanks. Your code works.
Assuming that the company_id is unique then sort the data set by it.
proc sort data = have;
by company_id;
run;
There is a SAS LAG() function available. The following code just saves the previous ASSET and writes it on the next observation.
The empty format statement retains the order of the variables.
data want;
format year company_id asset _lag;
retain prev_asset _lag;
drop prev_asset;
do until(last.company_id);
set have;
by company_id;
if first.company_id then do; prev_asset = asset; _lag = .; end;
else prev_asset = asset;
output;
_lag = prev_asset;
end;
run;
Thanks. Your code works as well. But I also need to drop the observations of year 2003 and 2013.
I'm not sure what you mean by 2003 and 2013 will have to be dropped. I learnt this from support.sas.com.
Thanks. I also need to drop the observations of year 2003 and 2013. So will I just write the two following lines after "drop i c"?
if year = 2003 then delete;
if year = 2013 then delete;
I also need to calculate the change in asset i.e. delta_asset = asset - lasset1.
If I want to perform these operations (i.e. calculating lag then calculating the change) for more than one variable (say, asset is one variable, debt is another variable etc.) then what will be the code?
This "asset - lasset1" is calculated by DIF function similar to lag but difference. I don't know what the rest of your question is saying. Show some code data etc.
Ok, the following is my sample dataset. ca is current asset, cl is current liability and stdebt is short term debt. So I need to calculate the lca1, lca2, lca3 (which we already did before). Now need to calculate the difference between ca and lca1 .... (which is the change in ca between year 2004 and year 2003). In this way, need to same for cl and stdebt as well. In final output, observations for year 2003 and 2013 will have to be dropped.
year companyISIN ca cl stdebt
2003 | AN8068571086 | 10369000 | 6795000 | 1411000 |
2004 | AN8068571086 | 7060000 | 4701000 | 716000 |
2005 | AN8068571086 | 8554000 | 5515000 | 796000 |
2006 | AN8068571086 | 9186000 | 6455000 | 1321000 |
2007 | AN8068571086 | 11055000 | 7505000 | 1672000 |
2008 | AN8068571086 | 12894000 | 8124000 | 1597000 |
2009 | AN8068571086 | 13650000 | 7259000 | 1125000 |
2010 | AN8068571086 | 18098000 | 10865000 | 2595000 |
2011 | AN8068571086 | 20539000 | 10538000 | 1377000 |
2012 | AN8068571086 | 24156000 | 12368000 | 2121000 |
2013 | AN8068571086 | 26225000 | 13525000 | 2783000 |
2003 | ANN6748L1027 | 117555 | 42087 | 8852 |
2004 | ANN6748L1027 | 119830 | 33449 | 7468 |
2005 | ANN6748L1027 | 175355 | 66052 | 12915 |
2006 | ANN6748L1027 | 177348 | 46268 | 2586 |
2007 | ANN6748L1027 | 192332 | 50215 | 8252 |
2008 | ANN6748L1027 | 197199 | 53846 | 3759 |
2009 | ANN6748L1027 | 209047 | 61461 | 3868 |
2010 | ANN6748L1027 | 232885 | 62394 | 8450 |
2011 | ANN6748L1027 | 312343 | 135418 | 14527 |
2012 | ANN6748L1027 | 258057 | 52148 | 12 |
2013 | ANN6748L1027 | 204018 | 48645 | . |
2003 | BMG0129K1045 | . | . | . |
2004 | BMG0129K1045 | 7935 | 5746 | 0 |
2005 | BMG0129K1045 | 745097 | 553813 | 496757 |
2006 | BMG0129K1045 | 65682 | 391042 | 185702 |
2007 | BMG0129K1045 | 18503 | 1534434 | 945035 |
2008 | BMG0129K1045 | 84108 | 799848 | 136738 |
2009 | BMG0129K1045 | 145607 | 868804 | 170089 |
2010 | BMG0129K1045 | 241772 | 1110606 | 302223 |
2011 | BMG0129K1045 | 299168 | 1080563 | 247228 |
2012 | BMG0129K1045 | 623842 | 1079469 | 281611 |
2013 | BMG0129K1045 | 832675 | 602781 | 365325 |
Do you need DIF1 only or do you need 3 or more like I showed you for LAG1 LAG2 and LAG3.
How much of the documentation on this subject have you read?
Sometimes I need DIF1 only, and sometimes I need all three. So it's better to do for all three.
I can do the DIF1 for one variable (according to your previous code). But now there are three variables (ca, cl and stdebt).
It gets a bit fiddly with multiple lags and variables. How much of the documentation on this subject have you read?
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.