Hi,
I have data at the person-month level, so each row=1 person in one month. It has ID, month/year, and two categorical variables. What I want is to collapse to the person-year level as follows:
1. I want the value of categ_var1 as of the earliest month that each person has a value (not missing) for categ_var1.
2. If the person ever has categ_var2='abc', I want an indicator saying that they had categ_var2='abc' in that year.
The data are like this:
ID mth_year categ_var1 categ_var2
1 201301 ' ' abc
1 201302 ' ' ghi
1 201303 def ghi
2 201301 xyz jkl
2 201302 def jkl
So the yearly dataset would be like this:
ID first_categ_var1 categ_var2_abc_ind
1 def 1
2 xyz 0
Any help is much appreciated.
Hi @Walternate,
I tried SQL but reconsidered that old faithfull the datastep is well suited for this. Have a go at this one:
data have;
input id mnth_year @10 categ_var1 $3. @14 categ_var2 $3.;
cards;
1 201301 abc
1 201302 ghi
1 201303 def ghi
2 201301 xyz jkl
2 201302 def jkl
;
proc sort data=have;
by id mnth_year;
run;
data want;
length first_categ_var1 $3 categ_var2_abc_ind $1;
keep id first_categ_var1 categ_var2_abc_ind;
set have;
by id mnth_year;
retain first_categ_var1 categ_var2_abc_ind;
if first.id then do;
first_categ_var1='';
categ_var2_abc_ind = '0';
end;
if first_categ_var1='' and categ_var1 ne '' then first_categ_var1=categ_var1;
if categ_var2 = 'abc' then categ_var2_abc_ind = '1';
if last.id and first_categ_var1 ne '' then output;
put _all_;
run;
Hope this helps,
- Jan.
data a;
input id mth_year categ_var1 $ categ_var2 $;
cards;
1 201301 . abc
1 201302 . ghi
1 201303 def ghi
2 201301 xyz jkl
2 201302 def jkl
;
run;
proc sql;
create table b as select min(mth_year) as yr, * from a where categ_var1 is not null group by id having calculated yr = mth_year;
quit;
proc sql;
create table c as select distinct id, categ_var2 from a where categ_var2 = 'abc';
quit;
proc sql;
create table d as select b.id, b.categ_var1, case when c.id is null then 0 else 1 end as categ_var2_id from b left join c on b.id = c.id ;
quit;
data have; input id mnth_year @10 categ_var1 $3. @14 categ_var2 $3.; cards; 1 201301 abc 1 201302 ghi 1 201303 def ghi 2 201301 xyz jkl 2 201302 def jkl ; run; data want; categ_var2_abc_ind=0; do until(last.id); set have; by id; if not missing(categ_var1) and not found then do; first_categ_var1=categ_var1; found=1; end; if categ_var2='abc' then categ_var2_abc_ind=1; end; drop categ_var1 categ_var2 found; run;
Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.
Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.