BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Jarvin99
Obsidian | Level 7

My data is the panel data.

yeargvkeyid
201010001
201110001
201120001
201110002
201430002
201730002
201420003
201440003
201620003

 

I want to find whether the gvkeys of an id of the current period exist in the prior period, regardless of whether there is a discontinuity in the years (i.e., the id 2 has a three-year gap between 2014 and 2017). The panel data is fine as long as the years are in order by grouping the id.

 

yeargvkeyidindicator
2010100010
2011100011
2011200010
2011100020
2014300020
2017300021
2014200030
2014400030
2016200031

Hence, my intended outcome should look like the column indicator where 1 equals presence and 0 equals absence in the prior period.

 

Thank you so much for your assistance.

1 ACCEPTED SOLUTION

Accepted Solutions
PeterClemmensen
Tourmaline | Level 20

Try this

 

data have;
input year gvkey id;
datalines;
2010 1000 1
2011 1000 1
2011 2000 1
2011 1000 2
2014 3000 2
2017 3000 2
2014 2000 3
2014 4000 3
2016 2000 3
;

data want;
   do _N_ = 1 by 1 until (last.id);
      set have;
	  by id;
      array r {999} _temporary_;
	  indicator = gvkey in r;
	  r[_N_] = gvkey;
	  output;
   end;

   call missing(of r[*]);
run;

 

Result:

 

year  gvkey  id  indicator 
2010  1000   1   0 
2011  1000   1   1 
2011  2000   1   0 
2011  1000   2   0 
2014  3000   2   0 
2017  3000   2   1 
2014  2000   3   0 
2014  4000   3   0 
2016  2000   3   1 

View solution in original post

4 REPLIES 4
PeterClemmensen
Tourmaline | Level 20

Try this

 

data have;
input year gvkey id;
datalines;
2010 1000 1
2011 1000 1
2011 2000 1
2011 1000 2
2014 3000 2
2017 3000 2
2014 2000 3
2014 4000 3
2016 2000 3
;

data want;
   do _N_ = 1 by 1 until (last.id);
      set have;
	  by id;
      array r {999} _temporary_;
	  indicator = gvkey in r;
	  r[_N_] = gvkey;
	  output;
   end;

   call missing(of r[*]);
run;

 

Result:

 

year  gvkey  id  indicator 
2010  1000   1   0 
2011  1000   1   1 
2011  2000   1   0 
2011  1000   2   0 
2014  3000   2   0 
2017  3000   2   1 
2014  2000   3   0 
2014  4000   3   0 
2016  2000   3   1 
mkeintz
PROC Star

Assuming the data are sorted by ID/YEAR, then you can just keep a running list of GVKEYs encountered for each ID:

 

data have;
input year gvkey id;
datalines;
2010 1000 1
2011 1000 1
2011 2000 1
2011 1000 2
2014 3000 2
2017 3000 2
2014 2000 3
2014 4000 3
2016 2000 3
;
data want;
  set have;
  by id ;
  array _gvkeys {20} _temporary_;
  if first.id then call missing(of _gvkeys{*});
  indicator=whichn(gvkey,of _gvkeys{*})>0;
  if indicator=0 then _gvkeys{n(of _gvkeys{*})+1}=gvkey;
run;

This program assumes no ID has more the 20 observations.  If you expect more then increase the size of the array _GVKEYS.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Tom
Super User Tom
Super User

Should be simple in PROC SQL.

First let's convert your listing into a dataset (why did you list the variables in backwards order?).

data have;
  length id year gvkey 8;
  input year gvkey id;
cards;
2010	1000	1
2011	1000	1
2011	2000	1
2011	1000	2
2014	3000	2
2017	3000	2
2014	2000	3
2014	4000	3
2016	2000	3
;

Now just make the indicator to flag the values of GVKEY that occur after the first time it occurs.

proc sql;
create table want as
  select *,(year > min(year)) as indicator
  from have
  group id,gvkey
  order by id,year,gvkey
;
quit;

Results

Obs    id    year    gvkey    indicator

 1      1    2010     1000        0
 2      1    2011     1000        1
 3      1    2011     2000        0
 4      2    2011     1000        0
 5      2    2014     3000        0
 6      2    2017     3000        1
 7      3    2014     2000        0
 8      3    2014     4000        0
 9      3    2016     2000        1
Ksharp
Super User

Should be more efficient in Hash Table and keep original order.

 

data have;
  length id year gvkey 8;
  input year gvkey id;
cards;
2010	1000	1
2011	1000	1
2011	2000	1
2011	1000	2
2014	3000	2
2017	3000	2
2014	2000	3
2014	4000	3
2016	2000	3
;
proc sql noprint;
select min(year) into :year from have;
quit;
data want;
 if _n_=1 then do;
  declare hash h(dataset:'have');
  h.definekey('year','gvkey','id');
  h.definedone();
 end;
set have; 
indicator=0;
do year1=&year. to year-1; 
 if h.check(key:year1,key:gvkey,key:id)=0 then do;indicator=1;leave;end;
end;
drop year1;
run;

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 727 views
  • 4 likes
  • 5 in conversation