My data is the panel data.
year | gvkey | id |
2010 | 1000 | 1 |
2011 | 1000 | 1 |
2011 | 2000 | 1 |
2011 | 1000 | 2 |
2014 | 3000 | 2 |
2017 | 3000 | 2 |
2014 | 2000 | 3 |
2014 | 4000 | 3 |
2016 | 2000 | 3 |
I want to find whether the gvkeys of an id of the current period exist in the prior period, regardless of whether there is a discontinuity in the years (i.e., the id 2 has a three-year gap between 2014 and 2017). The panel data is fine as long as the years are in order by grouping the id.
year | gvkey | id | indicator |
2010 | 1000 | 1 | 0 |
2011 | 1000 | 1 | 1 |
2011 | 2000 | 1 | 0 |
2011 | 1000 | 2 | 0 |
2014 | 3000 | 2 | 0 |
2017 | 3000 | 2 | 1 |
2014 | 2000 | 3 | 0 |
2014 | 4000 | 3 | 0 |
2016 | 2000 | 3 | 1 |
Hence, my intended outcome should look like the column indicator where 1 equals presence and 0 equals absence in the prior period.
Thank you so much for your assistance.
Try this
data have;
input year gvkey id;
datalines;
2010 1000 1
2011 1000 1
2011 2000 1
2011 1000 2
2014 3000 2
2017 3000 2
2014 2000 3
2014 4000 3
2016 2000 3
;
data want;
do _N_ = 1 by 1 until (last.id);
set have;
by id;
array r {999} _temporary_;
indicator = gvkey in r;
r[_N_] = gvkey;
output;
end;
call missing(of r[*]);
run;
Result:
year gvkey id indicator 2010 1000 1 0 2011 1000 1 1 2011 2000 1 0 2011 1000 2 0 2014 3000 2 0 2017 3000 2 1 2014 2000 3 0 2014 4000 3 0 2016 2000 3 1
Try this
data have;
input year gvkey id;
datalines;
2010 1000 1
2011 1000 1
2011 2000 1
2011 1000 2
2014 3000 2
2017 3000 2
2014 2000 3
2014 4000 3
2016 2000 3
;
data want;
do _N_ = 1 by 1 until (last.id);
set have;
by id;
array r {999} _temporary_;
indicator = gvkey in r;
r[_N_] = gvkey;
output;
end;
call missing(of r[*]);
run;
Result:
year gvkey id indicator 2010 1000 1 0 2011 1000 1 1 2011 2000 1 0 2011 1000 2 0 2014 3000 2 0 2017 3000 2 1 2014 2000 3 0 2014 4000 3 0 2016 2000 3 1
Assuming the data are sorted by ID/YEAR, then you can just keep a running list of GVKEYs encountered for each ID:
data have;
input year gvkey id;
datalines;
2010 1000 1
2011 1000 1
2011 2000 1
2011 1000 2
2014 3000 2
2017 3000 2
2014 2000 3
2014 4000 3
2016 2000 3
;
data want;
set have;
by id ;
array _gvkeys {20} _temporary_;
if first.id then call missing(of _gvkeys{*});
indicator=whichn(gvkey,of _gvkeys{*})>0;
if indicator=0 then _gvkeys{n(of _gvkeys{*})+1}=gvkey;
run;
This program assumes no ID has more the 20 observations. If you expect more then increase the size of the array _GVKEYS.
Should be simple in PROC SQL.
First let's convert your listing into a dataset (why did you list the variables in backwards order?).
data have;
length id year gvkey 8;
input year gvkey id;
cards;
2010 1000 1
2011 1000 1
2011 2000 1
2011 1000 2
2014 3000 2
2017 3000 2
2014 2000 3
2014 4000 3
2016 2000 3
;
Now just make the indicator to flag the values of GVKEY that occur after the first time it occurs.
proc sql;
create table want as
select *,(year > min(year)) as indicator
from have
group id,gvkey
order by id,year,gvkey
;
quit;
Results
Obs id year gvkey indicator 1 1 2010 1000 0 2 1 2011 1000 1 3 1 2011 2000 0 4 2 2011 1000 0 5 2 2014 3000 0 6 2 2017 3000 1 7 3 2014 2000 0 8 3 2014 4000 0 9 3 2016 2000 1
Should be more efficient in Hash Table and keep original order.
data have;
length id year gvkey 8;
input year gvkey id;
cards;
2010 1000 1
2011 1000 1
2011 2000 1
2011 1000 2
2014 3000 2
2017 3000 2
2014 2000 3
2014 4000 3
2016 2000 3
;
proc sql noprint;
select min(year) into :year from have;
quit;
data want;
if _n_=1 then do;
declare hash h(dataset:'have');
h.definekey('year','gvkey','id');
h.definedone();
end;
set have;
indicator=0;
do year1=&year. to year-1;
if h.check(key:year1,key:gvkey,key:id)=0 then do;indicator=1;leave;end;
end;
drop year1;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.