Solved: Find the observations in the current period exist in the previous peri...

Jarvin99 · Posted 04-04-2022 01:56 PM

My data is the panel data.

year	gvkey	id
2010	1000	1
2011	1000	1
2011	2000	1
2011	1000	2
2014	3000	2
2017	3000	2
2014	2000	3
2014	4000	3
2016	2000	3

I want to find whether the gvkeys of an id of the current period exist in the prior period, regardless of whether there is a discontinuity in the years (i.e., the id 2 has a three-year gap between 2014 and 2017). The panel data is fine as long as the years are in order by grouping the id.

year	gvkey	id	indicator
2010	1000	1	0
2011	1000	1	1
2011	2000	1	0
2011	1000	2	0
2014	3000	2	0
2017	3000	2	1
2014	2000	3	0
2014	4000	3	0
2016	2000	3	1

Hence, my intended outcome should look like the column indicator where 1 equals presence and 0 equals absence in the prior period.

Thank you so much for your assistance.

PeterClemmensen · Posted 04-04-2022 03:08 PM

Try this

data have;
input year gvkey id;
datalines;
2010 1000 1
2011 1000 1
2011 2000 1
2011 1000 2
2014 3000 2
2017 3000 2
2014 2000 3
2014 4000 3
2016 2000 3
;

data want;
   do _N_ = 1 by 1 until (last.id);
      set have;
	  by id;
      array r {999} _temporary_;
	  indicator = gvkey in r;
	  r[_N_] = gvkey;
	  output;
   end;

   call missing(of r[*]);
run;

Result:

year  gvkey  id  indicator 
2010  1000   1   0 
2011  1000   1   1 
2011  2000   1   0 
2011  1000   2   0 
2014  3000   2   0 
2017  3000   2   1 
2014  2000   3   0 
2014  4000   3   0 
2016  2000   3   1

The DATA to DATA Step Macro
Blog: SASnrd

View solution in original post

PeterClemmensen · Posted 04-04-2022 03:08 PM

Try this

data have;
input year gvkey id;
datalines;
2010 1000 1
2011 1000 1
2011 2000 1
2011 1000 2
2014 3000 2
2017 3000 2
2014 2000 3
2014 4000 3
2016 2000 3
;

data want;
   do _N_ = 1 by 1 until (last.id);
      set have;
	  by id;
      array r {999} _temporary_;
	  indicator = gvkey in r;
	  r[_N_] = gvkey;
	  output;
   end;

   call missing(of r[*]);
run;

Result:

year  gvkey  id  indicator 
2010  1000   1   0 
2011  1000   1   1 
2011  2000   1   0 
2011  1000   2   0 
2014  3000   2   0 
2017  3000   2   1 
2014  2000   3   0 
2014  4000   3   0 
2016  2000   3   1

The DATA to DATA Step Macro
Blog: SASnrd

mkeintz · Posted 04-04-2022 06:29 PM

Assuming the data are sorted by ID/YEAR, then you can just keep a running list of GVKEYs encountered for each ID:

data have;
input year gvkey id;
datalines;
2010 1000 1
2011 1000 1
2011 2000 1
2011 1000 2
2014 3000 2
2017 3000 2
2014 2000 3
2014 4000 3
2016 2000 3
;
data want;
  set have;
  by id ;
  array _gvkeys {20} _temporary_;
  if first.id then call missing(of _gvkeys{*});
  indicator=whichn(gvkey,of _gvkeys{*})>0;
  if indicator=0 then _gvkeys{n(of _gvkeys{*})+1}=gvkey;
run;

This program assumes no ID has more the 20 observations. If you expect more then increase the size of the array _GVKEYS.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Tom · Posted 04-05-2022 12:25 AM

Should be simple in PROC SQL.

First let's convert your listing into a dataset (why did you list the variables in backwards order?).

data have;
  length id year gvkey 8;
  input year gvkey id;
cards;
2010	1000	1
2011	1000	1
2011	2000	1
2011	1000	2
2014	3000	2
2017	3000	2
2014	2000	3
2014	4000	3
2016	2000	3
;

Now just make the indicator to flag the values of GVKEY that occur after the first time it occurs.

proc sql;
create table want as
  select *,(year > min(year)) as indicator
  from have
  group id,gvkey
  order by id,year,gvkey
;
quit;

Results

Obs    id    year    gvkey    indicator

 1      1    2010     1000        0
 2      1    2011     1000        1
 3      1    2011     2000        0
 4      2    2011     1000        0
 5      2    2014     3000        0
 6      2    2017     3000        1
 7      3    2014     2000        0
 8      3    2014     4000        0
 9      3    2016     2000        1

Ksharp · Posted 04-05-2022 06:44 AM

Should be more efficient in Hash Table and keep original order.

data have;
  length id year gvkey 8;
  input year gvkey id;
cards;
2010	1000	1
2011	1000	1
2011	2000	1
2011	1000	2
2014	3000	2
2017	3000	2
2014	2000	3
2014	4000	3
2016	2000	3
;
proc sql noprint;
select min(year) into :year from have;
quit;
data want;
 if _n_=1 then do;
  declare hash h(dataset:'have');
  h.definekey('year','gvkey','id');
  h.definedone();
 end;
set have; 
indicator=0;
do year1=&year. to year-1; 
 if h.check(key:year1,key:gvkey,key:id)=0 then do;indicator=1;leave;end;
end;
drop year1;
run;

Find the observations in the current period exist in the previous period

Re: Find the observations in the current period exist in the previous period

Re: Find the observations in the current period exist in the previous period

Re: Find the observations in the current period exist in the previous period

Re: Find the observations in the current period exist in the previous period

Re: Find the observations in the current period exist in the previous period

Find the observations in the current period exist in the previous period

Re: Find the observations in the current period exist in the previous period

Re: Find the observations in the current period exist in the previous period

Re: Find the observations in the current period exist in the previous period

Re: Find the observations in the current period exist in the previous period

Re: Find the observations in the current period exist in the previous period

Click image to register for webinar

Classroom Training Available!