Hi,
I have data at the person-month level, so each row=1 person in one month. It has ID, month/year, and two categorical variables. What I want is to collapse to the person-year level as follows:
1. I want the value of categ_var1 as of the earliest month that each person has a value (not missing) for categ_var1.
2. If the person ever has categ_var2='abc', I want an indicator saying that they had categ_var2='abc' in that year.
The data are like this:
ID mth_year categ_var1 categ_var2
1 201301 ' ' abc
1 201302 ' ' ghi
1 201303 def ghi
2 201301 xyz jkl
2 201302 def jkl
So the yearly dataset would be like this:
ID first_categ_var1 categ_var2_abc_ind
1 def 1
2 xyz 0
Any help is much appreciated.
Hi @Walternate,
I tried SQL but reconsidered that old faithfull the datastep is well suited for this. Have a go at this one:
data have;
input id mnth_year @10 categ_var1 $3. @14 categ_var2 $3.;
cards;
1 201301 abc
1 201302 ghi
1 201303 def ghi
2 201301 xyz jkl
2 201302 def jkl
;
proc sort data=have;
by id mnth_year;
run;
data want;
length first_categ_var1 $3 categ_var2_abc_ind $1;
keep id first_categ_var1 categ_var2_abc_ind;
set have;
by id mnth_year;
retain first_categ_var1 categ_var2_abc_ind;
if first.id then do;
first_categ_var1='';
categ_var2_abc_ind = '0';
end;
if first_categ_var1='' and categ_var1 ne '' then first_categ_var1=categ_var1;
if categ_var2 = 'abc' then categ_var2_abc_ind = '1';
if last.id and first_categ_var1 ne '' then output;
put _all_;
run;
Hope this helps,
- Jan.
data a;
input id mth_year categ_var1 $ categ_var2 $;
cards;
1 201301 . abc
1 201302 . ghi
1 201303 def ghi
2 201301 xyz jkl
2 201302 def jkl
;
run;
proc sql;
create table b as select min(mth_year) as yr, * from a where categ_var1 is not null group by id having calculated yr = mth_year;
quit;
proc sql;
create table c as select distinct id, categ_var2 from a where categ_var2 = 'abc';
quit;
proc sql;
create table d as select b.id, b.categ_var1, case when c.id is null then 0 else 1 end as categ_var2_id from b left join c on b.id = c.id ;
quit;
data have; input id mnth_year @10 categ_var1 $3. @14 categ_var2 $3.; cards; 1 201301 abc 1 201302 ghi 1 201303 def ghi 2 201301 xyz jkl 2 201302 def jkl ; run; data want; categ_var2_abc_ind=0; do until(last.id); set have; by id; if not missing(categ_var1) and not found then do; first_categ_var1=categ_var1; found=1; end; if categ_var2='abc' then categ_var2_abc_ind=1; end; drop categ_var1 categ_var2 found; run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.