DATA Step, Macro, Functions and more

Identifying first value across consecutive rows

Reply
Frequent Contributor
Posts: 138

Identifying first value across consecutive rows

Hi,

 

I have data at the person-month level, so each row=1 person in one month. It has ID, month/year, and two categorical variables. What I want is to collapse to the person-year level as follows:

  1. I want the value of categ_var1 as of the earliest month that each person has a value (not missing) for categ_var1. 

  2. If the person ever has categ_var2='abc', I want an indicator saying that they had categ_var2='abc' in that year.

 

The data are like this:

ID   mth_year  categ_var1   categ_var2

1     201301            ' '                abc

1     201302            ' '                ghi

1     201303           def              ghi

2     201301           xyz              jkl    

2     201302           def              jkl

 

So the yearly dataset would be like this:

ID   first_categ_var1  categ_var2_abc_ind

1           def                              1

2           xyz                              0

 

Any help is much appreciated.

 

 

 

Super Contributor
Posts: 408

Re: Identifying first value across consecutive rows

Hi @Walternate,

 

I tried SQL but reconsidered that old faithfull the datastep is well suited for this. Have a go at this one:

 

data have;
input id mnth_year @10 categ_var1 $3. @14 categ_var2 $3.;
cards;
1 201301     abc
1 201302     ghi
1 201303 def ghi
2 201301 xyz jkl
2 201302 def jkl
;

proc sort data=have;
   by id mnth_year;
run;

data want;
   length first_categ_var1 $3 categ_var2_abc_ind $1;
   keep id first_categ_var1 categ_var2_abc_ind;
   set have;
   by id mnth_year;
   retain first_categ_var1 categ_var2_abc_ind;
   if first.id then do;
      first_categ_var1='';
      categ_var2_abc_ind = '0';
   end;
   if first_categ_var1='' and categ_var1 ne '' then first_categ_var1=categ_var1;
   if categ_var2 = 'abc' then categ_var2_abc_ind = '1';
   if last.id and first_categ_var1 ne '' then output;
   put _all_;
run;

Hope this helps,

- Jan.

Learner
Posts: 1

Re: Identifying first value across consecutive rows

data a;

input id mth_year categ_var1 $ categ_var2 $;

cards;

1 201301 . abc

1 201302 . ghi

1 201303 def ghi

2 201301 xyz jkl

2 201302 def jkl

;

run;

proc sql;

create table b as select min(mth_year) as yr, * from a where categ_var1 is not null group by id having calculated yr = mth_year;

quit;

proc sql;

create table c as select distinct id, categ_var2 from a where categ_var2 = 'abc';

quit;

proc sql;

create table d as select b.id, b.categ_var1, case when c.id is null then 0 else 1 end as categ_var2_id from b left join c on b.id = c.id ;

quit;

Super User
Posts: 9,681

Re: Identifying first value across consecutive rows


data have;
input id mnth_year @10 categ_var1 $3. @14 categ_var2 $3.;
cards;
1 201301     abc
1 201302     ghi
1 201303 def ghi
2 201301 xyz jkl
2 201302 def jkl
;
run;
data want;
categ_var2_abc_ind=0;
do until(last.id);
 set have;
 by id;
 if not missing(categ_var1) and not found then do;
  first_categ_var1=categ_var1; 
  found=1;
 end;
 if categ_var2='abc' then categ_var2_abc_ind=1;
end;

drop categ_var1 categ_var2 found;
run;

Ask a Question
Discussion stats
  • 3 replies
  • 235 views
  • 0 likes
  • 4 in conversation