Hi,
I have a dataset with multiple rows per person, which I am reducing to one row per person based on values of other variables. There's an ID variable, two categorical variables, and a date variable. Categ_1 can differ across the same ID, but categ_2 will always have the same value for the same ID (or more accurately, it can be missing, but there won't be one person who has two separate non-missing values for this variable).
ID categ_1 categ_2 date
1 a 1/2/2015
1 b x 2/14/2015
2 c y 10/9/2013
2 c y 5/4/2012
Basically, my logic for which record to keep for each person is this:
1. If any of their values of categ_1=a, that record should be kept
2. If none of their values of categ_1=a, then the record with the most recent date should be kept
So all I did was sort by categ_1 and date (descending), then output the top record.
The problem is that none of the records where categ_1=a has a value for categ_2. Since categ_2 will always be the same within each ID number, I want to add a step where I fill categ_2 using other row(s) of that ID number before outputting. The way I have it now, the top record for ID1 will output, which is right, but I want it to have categ_2=x before it outputs to the one person/one row dataset.
Any help is much appreciated.
How about something like this:
/* order your input set */
proc sql;
create table part_output as
select *
from input
order by id, categ_1, categ_2, date desc;
/* select records using by processing and a retained flag (twrot) */
data myoutput;
set part_output;
retain twrot;
by id;
if first.id and categ_1 = 'a' then do;
twrot = 0;
output;
end;
else if twrot = 1 then do;
twrot = 0;
output;
end;
if last.id then do;
twrot = 1;
end;
run;
UPDATE statement is useful for this type of problem.
data have ;
input ID categ_1 $ categ_2 $ date :mmddyy10. ;
format date yymmdd10.;
cards;
1 a . 1/2/2015
1 b x 2/14/2015
2 c y 10/9/2013
2 c y 5/4/2012
;;;;
proc sql ;
create view middle as
select *,(categ_1 = 'a') as isa
from have
order by id, isa , date
;
quit;
data want ;
update middle (obs=0) middle ;
by id ;
drop isa ;
run;
proc print; run;
Obs ID categ_1 categ_2 date
1 1 a x 2015-01-02
2 2 c y 2013-10-09
data have ;
input ID categ_1 $ categ_2 $ date :mmddyy10. ;
format date yymmdd10.;
cards;
1 a . 1/2/2015
1 b x 2/14/2015
2 c y 10/9/2013
2 c y 5/4/2012
;;;;
run;
proc sort data=have;by id date;run;
data want;
do until(last.id);
set have;
by id;
if not missing(categ_2) then _categ_2=categ_2;
if not found then do;
_categ_1=categ_1;
_date=date;
end;
if categ_1='a' then do;
_categ_1 =categ_1 ;
_date=date;
found=1;
end;
end;
keep id _:;
format _date mmddyy10.;
run;
Hello All,
I have file ( attached) .I have data like in first table and I would like to have data looks like second table. If anybody help me code for this I really appreciate it.
Thanks
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.