Hi,
I have a dataset with multiple rows per person, which I am reducing to one row per person based on values of other variables. There's an ID variable, two categorical variables, and a date variable. Categ_1 can differ across the same ID, but categ_2 will always have the same value for the same ID (or more accurately, it can be missing, but there won't be one person who has two separate non-missing values for this variable).
ID categ_1 categ_2 date
1 a 1/2/2015
1 b x 2/14/2015
2 c y 10/9/2013
2 c y 5/4/2012
Basically, my logic for which record to keep for each person is this:
1. If any of their values of categ_1=a, that record should be kept
2. If none of their values of categ_1=a, then the record with the most recent date should be kept
So all I did was sort by categ_1 and date (descending), then output the top record.
The problem is that none of the records where categ_1=a has a value for categ_2. Since categ_2 will always be the same within each ID number, I want to add a step where I fill categ_2 using other row(s) of that ID number before outputting. The way I have it now, the top record for ID1 will output, which is right, but I want it to have categ_2=x before it outputs to the one person/one row dataset.
Any help is much appreciated.
How about something like this:
/* order your input set */
proc sql;
create table part_output as
select *
from input
order by id, categ_1, categ_2, date desc;
/* select records using by processing and a retained flag (twrot) */
data myoutput;
set part_output;
retain twrot;
by id;
if first.id and categ_1 = 'a' then do;
twrot = 0;
output;
end;
else if twrot = 1 then do;
twrot = 0;
output;
end;
if last.id then do;
twrot = 1;
end;
run;
UPDATE statement is useful for this type of problem.
data have ;
input ID categ_1 $ categ_2 $ date :mmddyy10. ;
format date yymmdd10.;
cards;
1 a . 1/2/2015
1 b x 2/14/2015
2 c y 10/9/2013
2 c y 5/4/2012
;;;;
proc sql ;
create view middle as
select *,(categ_1 = 'a') as isa
from have
order by id, isa , date
;
quit;
data want ;
update middle (obs=0) middle ;
by id ;
drop isa ;
run;
proc print; run;
Obs ID categ_1 categ_2 date
1 1 a x 2015-01-02
2 2 c y 2013-10-09
data have ;
input ID categ_1 $ categ_2 $ date :mmddyy10. ;
format date yymmdd10.;
cards;
1 a . 1/2/2015
1 b x 2/14/2015
2 c y 10/9/2013
2 c y 5/4/2012
;;;;
run;
proc sort data=have;by id date;run;
data want;
do until(last.id);
set have;
by id;
if not missing(categ_2) then _categ_2=categ_2;
if not found then do;
_categ_1=categ_1;
_date=date;
end;
if categ_1='a' then do;
_categ_1 =categ_1 ;
_date=date;
found=1;
end;
end;
keep id _:;
format _date mmddyy10.;
run;
Hello All,
I have file ( attached) .I have data like in first table and I would like to have data looks like second table. If anybody help me code for this I really appreciate it.
Thanks
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.