Dear Madam/Sir,
I would like to perform one operation by firm (gvkey) in dataset. Below is as-is dataset for only one firm and I like to drop observations after lead5 (from 2012) for every firm in the dataset.
Do loop example mostly use array. In this case, I cannot use array. Any help will be highly appreciated. Thank you.
data want; set have;
do over gvkey
if cyear gt lead5 then delete;
end;
run;
1161 | 2000 | 0 | 0 | 0 | 0 | 0 |
1161 | 2001 | 0 | 0 | 0 | 0 | 0 |
1161 | 2002 | 0 | 0 | 0 | 0 | 0 |
1161 | 2003 | 0 | 0 | 0 | 0 | 0 |
1161 | 2004 | 0 | 0 | 0 | 0 | 0 |
1161 | 2005 | 0 | 0 | 0 | 0 | 0 |
1161 | 2006 | 0 | 0 | 0 | 0 | 0 |
1161 | 2007 | 1 | 0 | 0 | 0 | 0 |
1161 | 2008 | 0 | 1 | 0 | 0 | 0 |
1161 | 2009 | 0 | 0 | 1 | 0 | 0 |
1161 | 2010 | 0 | 0 | 0 | 1 | 0 |
1161 | 2011 | 0 | 0 | 0 | 0 | 1 |
1161 | 2012 | 0 | 0 | 0 | 0 | 0 |
1161 | 2013 | 0 | 0 | 0 | 0 | 0 |
1161 | 2014 | 0 | 0 | 0 | 0 | 0 |
1161 | 2015 | 0 | 0 | 0 | 0 | 0 |
1161 | 2016 | 0 | 0 | 0 | 0 | 0 |
1161 | 2017 | 0 | 0 | 0 | 0 | 0 |
1161 | 2018 | 0 | 0 | 0 | 0 | 0 |
1161 | 2019 | 0 | 0 | 0 | 0 | 0 |
1161 | 2020 | 0 | 0 | 0 | 0 | 0 |
1161 | 2021 | 0 | 0 | 0 | 0 | 0 |
data want;
set have;
retain flag_year flag;
by gvkey;
if first.gvkey then flag=0;
if lead5=1 then do;
flag=1;
flag_year=cyear;
end;
if flag=1 and cyear>flag_year then delete;
drop flag_year flag;
run;
Please be kind to us. If you want a solution that works for more than 1 gvkey, from now provide data with more than 1 gvkey.
I think you misunderstand arrays. Arrays are not used to work row by row, or gvkey by gvkey, as your request would require. Arrays work column by column in each row, which is not what you need for this problem.
data want;
set have;
by gvkey;
retain flag;
if first.gvkey then flag = 1;
if flag;
if lead5 then flag = 0;
drop flag;
run;
Thank you so much.
data want;
set have;
retain flag_year flag;
by gvkey;
if first.gvkey then flag=0;
if lead5=1 then do;
flag=1;
flag_year=cyear;
end;
if flag=1 and cyear>flag_year then delete;
drop flag_year flag;
run;
Please be kind to us. If you want a solution that works for more than 1 gvkey, from now provide data with more than 1 gvkey.
I think you misunderstand arrays. Arrays are not used to work row by row, or gvkey by gvkey, as your request would require. Arrays work column by column in each row, which is not what you need for this problem.
Thank you so much. Your code perfectly works.
If you have exactly one lead5=1 year per gvkey, then:
data want (drop=_:);
merge have (where=(lead5=1) rename=(cyear=_cyear5))
have ;
by gvkey;
if cyear<=_cyear5;
run;
If a gvkey has no lead5 years, then the above will drop all observations. But if you actually WANT all observations in the absence of a lead5 record, then change the subsetting IF to
if cyear<=_cyear5 or _cyear5=.;
If you have multiple lead5=1 years, then the above will keep all records up through the last lead5 year (assuming data are sorted by cyear within gvkey).
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.