Hi,
I have a dataset with multiple rows per person, which I am reducing to one row per person based on values of other variables. There's an ID variable, two categorical variables, and a date variable. Categ_1 can differ across the same ID, but categ_2 will always have the same value for the same ID (or more accurately, it can be missing, but there won't be one person who has two separate non-missing values for this variable).
ID categ_1 categ_2 date
1 a 1/2/2015
1 b x 2/14/2015
2 c y 10/9/2013
2 c y 5/4/2012
Basically, my logic for which record to keep for each person is this:
1. If any of their values of categ_1=a, that record should be kept
2. If none of their values of categ_1=a, then the record with the most recent date should be kept
So all I did was sort by categ_1 and date (descending), then output the top record.
The problem is that none of the records where categ_1=a has a value for categ_2. Since categ_2 will always be the same within each ID number, I want to add a step where I fill categ_2 using other row(s) of that ID number before outputting. The way I have it now, the top record for ID1 will output, which is right, but I want it to have categ_2=x before it outputs to the one person/one row dataset.
Any help is much appreciated.
How about something like this:
/* order your input set */
proc sql;
create table part_output as
select *
from input
order by id, categ_1, categ_2, date desc;
/* select records using by processing and a retained flag (twrot) */
data myoutput;
set part_output;
retain twrot;
by id;
if first.id and categ_1 = 'a' then do;
twrot = 0;
output;
end;
else if twrot = 1 then do;
twrot = 0;
output;
end;
if last.id then do;
twrot = 1;
end;
run;
UPDATE statement is useful for this type of problem.
data have ;
input ID categ_1 $ categ_2 $ date :mmddyy10. ;
format date yymmdd10.;
cards;
1 a . 1/2/2015
1 b x 2/14/2015
2 c y 10/9/2013
2 c y 5/4/2012
;;;;
proc sql ;
create view middle as
select *,(categ_1 = 'a') as isa
from have
order by id, isa , date
;
quit;
data want ;
update middle (obs=0) middle ;
by id ;
drop isa ;
run;
proc print; run;
Obs ID categ_1 categ_2 date
1 1 a x 2015-01-02
2 2 c y 2013-10-09
data have ;
input ID categ_1 $ categ_2 $ date :mmddyy10. ;
format date yymmdd10.;
cards;
1 a . 1/2/2015
1 b x 2/14/2015
2 c y 10/9/2013
2 c y 5/4/2012
;;;;
run;
proc sort data=have;by id date;run;
data want;
do until(last.id);
set have;
by id;
if not missing(categ_2) then _categ_2=categ_2;
if not found then do;
_categ_1=categ_1;
_date=date;
end;
if categ_1='a' then do;
_categ_1 =categ_1 ;
_date=date;
found=1;
end;
end;
keep id _:;
format _date mmddyy10.;
run;
Hello All,
I have file ( attached) .I have data like in first table and I would like to have data looks like second table. If anybody help me code for this I really appreciate it.
Thanks
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.