My data is the panel data.
year | gvkey | id |
2010 | 1000 | 1 |
2011 | 1000 | 1 |
2011 | 2000 | 1 |
2011 | 1000 | 2 |
2014 | 3000 | 2 |
2017 | 3000 | 2 |
2014 | 2000 | 3 |
2014 | 4000 | 3 |
2016 | 2000 | 3 |
I want to find whether the gvkeys of an id of the current period exist in the prior period, regardless of whether there is a discontinuity in the years (i.e., the id 2 has a three-year gap between 2014 and 2017). The panel data is fine as long as the years are in order by grouping the id.
year | gvkey | id | indicator |
2010 | 1000 | 1 | 0 |
2011 | 1000 | 1 | 1 |
2011 | 2000 | 1 | 0 |
2011 | 1000 | 2 | 0 |
2014 | 3000 | 2 | 0 |
2017 | 3000 | 2 | 1 |
2014 | 2000 | 3 | 0 |
2014 | 4000 | 3 | 0 |
2016 | 2000 | 3 | 1 |
Hence, my intended outcome should look like the column indicator where 1 equals presence and 0 equals absence in the prior period.
Thank you so much for your assistance.
Try this
data have;
input year gvkey id;
datalines;
2010 1000 1
2011 1000 1
2011 2000 1
2011 1000 2
2014 3000 2
2017 3000 2
2014 2000 3
2014 4000 3
2016 2000 3
;
data want;
do _N_ = 1 by 1 until (last.id);
set have;
by id;
array r {999} _temporary_;
indicator = gvkey in r;
r[_N_] = gvkey;
output;
end;
call missing(of r[*]);
run;
Result:
year gvkey id indicator 2010 1000 1 0 2011 1000 1 1 2011 2000 1 0 2011 1000 2 0 2014 3000 2 0 2017 3000 2 1 2014 2000 3 0 2014 4000 3 0 2016 2000 3 1
Try this
data have;
input year gvkey id;
datalines;
2010 1000 1
2011 1000 1
2011 2000 1
2011 1000 2
2014 3000 2
2017 3000 2
2014 2000 3
2014 4000 3
2016 2000 3
;
data want;
do _N_ = 1 by 1 until (last.id);
set have;
by id;
array r {999} _temporary_;
indicator = gvkey in r;
r[_N_] = gvkey;
output;
end;
call missing(of r[*]);
run;
Result:
year gvkey id indicator 2010 1000 1 0 2011 1000 1 1 2011 2000 1 0 2011 1000 2 0 2014 3000 2 0 2017 3000 2 1 2014 2000 3 0 2014 4000 3 0 2016 2000 3 1
Assuming the data are sorted by ID/YEAR, then you can just keep a running list of GVKEYs encountered for each ID:
data have;
input year gvkey id;
datalines;
2010 1000 1
2011 1000 1
2011 2000 1
2011 1000 2
2014 3000 2
2017 3000 2
2014 2000 3
2014 4000 3
2016 2000 3
;
data want;
set have;
by id ;
array _gvkeys {20} _temporary_;
if first.id then call missing(of _gvkeys{*});
indicator=whichn(gvkey,of _gvkeys{*})>0;
if indicator=0 then _gvkeys{n(of _gvkeys{*})+1}=gvkey;
run;
This program assumes no ID has more the 20 observations. If you expect more then increase the size of the array _GVKEYS.
Should be simple in PROC SQL.
First let's convert your listing into a dataset (why did you list the variables in backwards order?).
data have;
length id year gvkey 8;
input year gvkey id;
cards;
2010 1000 1
2011 1000 1
2011 2000 1
2011 1000 2
2014 3000 2
2017 3000 2
2014 2000 3
2014 4000 3
2016 2000 3
;
Now just make the indicator to flag the values of GVKEY that occur after the first time it occurs.
proc sql;
create table want as
select *,(year > min(year)) as indicator
from have
group id,gvkey
order by id,year,gvkey
;
quit;
Results
Obs id year gvkey indicator 1 1 2010 1000 0 2 1 2011 1000 1 3 1 2011 2000 0 4 2 2011 1000 0 5 2 2014 3000 0 6 2 2017 3000 1 7 3 2014 2000 0 8 3 2014 4000 0 9 3 2016 2000 1
Should be more efficient in Hash Table and keep original order.
data have;
length id year gvkey 8;
input year gvkey id;
cards;
2010 1000 1
2011 1000 1
2011 2000 1
2011 1000 2
2014 3000 2
2017 3000 2
2014 2000 3
2014 4000 3
2016 2000 3
;
proc sql noprint;
select min(year) into :year from have;
quit;
data want;
if _n_=1 then do;
declare hash h(dataset:'have');
h.definekey('year','gvkey','id');
h.definedone();
end;
set have;
indicator=0;
do year1=&year. to year-1;
if h.check(key:year1,key:gvkey,key:id)=0 then do;indicator=1;leave;end;
end;
drop year1;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.