Identifying first value across consecutive rows

Frequent Contributor
Posts: 138

Identifying first value across consecutive rows

Hi,

I have data at the person-month level, so each row=1 person in one month. It has ID, month/year, and two categorical variables. What I want is to collapse to the person-year level as follows:

1. I want the value of categ_var1 as of the earliest month that each person has a value (not missing) for categ_var1.

2. If the person ever has categ_var2='abc', I want an indicator saying that they had categ_var2='abc' in that year.

The data are like this:

ID   mth_year  categ_var1   categ_var2

1     201301            ' '                abc

1     201302            ' '                ghi

1     201303           def              ghi

2     201301           xyz              jkl

2     201302           def              jkl

So the yearly dataset would be like this:

ID   first_categ_var1  categ_var2_abc_ind

1           def                              1

2           xyz                              0

Any help is much appreciated.

Valued Guide
Posts: 533

Re: Identifying first value across consecutive rows

Hi @Walternate,

I tried SQL but reconsidered that old faithfull the datastep is well suited for this. Have a go at this one:

``````data have;
input id mnth_year @10 categ_var1 \$3. @14 categ_var2 \$3.;
cards;
1 201301     abc
1 201302     ghi
1 201303 def ghi
2 201301 xyz jkl
2 201302 def jkl
;

proc sort data=have;
by id mnth_year;
run;

data want;
length first_categ_var1 \$3 categ_var2_abc_ind \$1;
keep id first_categ_var1 categ_var2_abc_ind;
set have;
by id mnth_year;
retain first_categ_var1 categ_var2_abc_ind;
if first.id then do;
first_categ_var1='';
categ_var2_abc_ind = '0';
end;
if first_categ_var1='' and categ_var1 ne '' then first_categ_var1=categ_var1;
if categ_var2 = 'abc' then categ_var2_abc_ind = '1';
if last.id and first_categ_var1 ne '' then output;
put _all_;
run;
``````

Hope this helps,

- Jan.

Learner
Posts: 1

Re: Identifying first value across consecutive rows

data a;

input id mth_year categ_var1 \$ categ_var2 \$;

cards;

1 201301 . abc

1 201302 . ghi

1 201303 def ghi

2 201301 xyz jkl

2 201302 def jkl

;

run;

proc sql;

create table b as select min(mth_year) as yr, * from a where categ_var1 is not null group by id having calculated yr = mth_year;

quit;

proc sql;

create table c as select distinct id, categ_var2 from a where categ_var2 = 'abc';

quit;

proc sql;

create table d as select b.id, b.categ_var1, case when c.id is null then 0 else 1 end as categ_var2_id from b left join c on b.id = c.id ;

quit;

Super User
Posts: 10,770

Re: Identifying first value across consecutive rows

```
data have;
input id mnth_year @10 categ_var1 \$3. @14 categ_var2 \$3.;
cards;
1 201301     abc
1 201302     ghi
1 201303 def ghi
2 201301 xyz jkl
2 201302 def jkl
;
run;
data want;
categ_var2_abc_ind=0;
do until(last.id);
set have;
by id;
first_categ_var1=categ_var1;
found=1;
end;
if categ_var2='abc' then categ_var2_abc_ind=1;
end;

drop categ_var1 categ_var2 found;
run;

```
Discussion stats
• 3 replies
• 268 views
• 0 likes
• 4 in conversation