I have a data set like this
ID | vr | bd | fx | gh |
A11 | 1/2/1999 | . | 6/4/2012 | 8/1/2021 |
A12 | 3/5/2021 | . | . | . |
A13 | 3/5/2000 | . | . | 4/17/1988 |
A14 | 10/24/1956 | 12/2/1967 | 1/6/2001 | . |
A15 | 7/1/1996 | 11/7/1990 | . | 7/21/2001 |
I wanted a table like this. Varaibles are selected based dates values- earliest date will fill slit1 and latest date fill slot 4. Missing dates are ignored.
ID | vr | bd | fx | gh | yz | slot1 | slot2 | slot3 | slot4 |
A11 | 1/2/1999 | . | 8/1/2021 | 9/23/2019 | vr | gh | yz | ||
A12 | 3/5/2021 | . | . | . | 3/5/2000 | yz | vr | ||
A13 | 3/5/2000 | . | . | 4/17/1988 | . | gh | vr | ||
A14 | 10/24/1956 | 12/2/1967 | 1/6/2001 | . | . | fx | bd | vr | |
A15 | 7/1/1996 | 11/7/1990 | . | 7/21/2001 | 6/10/2000 | bd | vr | yz | gh |
data have; infile cards expandtabs; input ID $ ( vr bd fx gh) (:mmddyy12.); format vr bd fx gh mmddyy10.; cards; A11 1/2/1999 . 6/4/2012 8/1/2021 A12 3/5/2021 . . . A13 3/5/2000 . . 4/17/1988 A14 10/24/1956 12/2/1967 1/6/2001 . A15 7/1/1996 11/7/1990 . 7/21/2001 ; data want; if _n_=1 then do; length date 8 vname $ 40; declare hash h(multidata:'y',ordered:'y'); declare hiter hi('h'); h.definekey('date'); h.definedata('vname'); h.definedone(); end; set have; array x{*} vr bd fx gh; array y{*} $ 40 slot1-slot4; do i=1 to dim(x); date=x{i};vname=vname(x{i}); if not missing(date) then h.add(); end; do i=1 by 1 while(hi.next()=0); y{i}=vname; end; h.clear(); drop i date vname; run; proc print;run;
If someone has test in all five then all the slots will be filled. I am sorry I forget to include slot5 in the output sample
What @Reeza said. Transpose to long, sort by date and assign a number sequence, then transpose back to wide.
@alagiejatta wrote:
If someone has test in all five then all the slots will be filled. I am sorry I forget to include slot5 in the output sample
So, @Reeza asked "What are you planning to do with this later on?" I don't see an answer. After you re-arrange the data, what analysis or reporting are you going to do? Knowing this information can really help devise the best method of handling this data.
The long data set as suggested by @Reeza will do that, and should be much easier to program.
@alagiejatta wrote:
So the final product is to show a sequence of event. I will submit a file the shows which event occur first, second, third etc by ID. For A11 the final answer is VR-GH-YZ
So you actually do not need the wide layout, you can use the long dataset after sorting.
Convert it to a long structure as I suggested, possibly it started out that way?
Either way, look at the options here:
https://gist.github.com/statgeek/d583cfa992bf56da51d435165b07e96a
See here the whole code, from converting to the long layout to a report presenting the data in wide form:
data have;
infile datalines dlm="09"x dsd truncover;
input ID $ (vr bd fx gh) (:mmddyy10.);
format vr bd fx gh yymmdd10.;
datalines;
A11 1/2/1999 . 6/4/2012 8/1/2021
A12 3/5/2021 . . .
A13 3/5/2000 . . 4/17/1988
A14 10/24/1956 12/2/1967 1/6/2001 .
A15 7/1/1996 11/7/1990 . 7/21/2001
;
proc transpose
data=have
out=long (
rename=(col1=date)
where=(date ne .)
)
;
by ID;
var vr--gh;
run;
proc sort data=long;
by id date;
run;
data want;
set long;
by id;
if first.id
then ct = 1;
else ct + 1;
slot = cats("slot",ct);
drop ct;
run;
proc report data=want;
column id _name_,slot n;
define id / group;
define _name_ / "" display;
define n / noprint;
define slot / "" across;
run;
data have; infile cards expandtabs; input ID $ ( vr bd fx gh) (:mmddyy12.); format vr bd fx gh mmddyy10.; cards; A11 1/2/1999 . 6/4/2012 8/1/2021 A12 3/5/2021 . . . A13 3/5/2000 . . 4/17/1988 A14 10/24/1956 12/2/1967 1/6/2001 . A15 7/1/1996 11/7/1990 . 7/21/2001 ; data want; if _n_=1 then do; length date 8 vname $ 40; declare hash h(multidata:'y',ordered:'y'); declare hiter hi('h'); h.definekey('date'); h.definedata('vname'); h.definedone(); end; set have; array x{*} vr bd fx gh; array y{*} $ 40 slot1-slot4; do i=1 to dim(x); date=x{i};vname=vname(x{i}); if not missing(date) then h.add(); end; do i=1 by 1 while(hi.next()=0); y{i}=vname; end; h.clear(); drop i date vname; run; proc print;run;
Hello @alagiejatta,
Is it possible that duplicate dates occur for the same ID as in my sample data below for ID A99?
data have; input ID $ (vr bd fx gh yz)(:mmddyy.); format vr--yz mmddyy10.; cards; A11 1/2/1999 . 6/4/2012 8/1/2021 9/23/2019 A12 3/5/2021 . . . 3/5/2000 A13 3/5/2000 . . 4/17/1988 . A14 10/24/1956 12/2/1967 1/6/2001 . . A15 7/1/1996 11/7/1990 . 7/21/2001 6/10/2000 A99 9/23/2019 3/5/2000 4/17/1988 3/5/2000 1/2/1999 ;
If not, you could populate the "slot" variables like this:
data want;
set have;
array _d[*] vr--yz;
array slot[5] $32;
do _n_=1 to n(of _d[*]);
slot[_n_]=vname(_d[whichn(smallest(_n_, of _d[*]), of _d[*])]);
end;
run;
(Only the name of the first of two or more variables containing the same date would occur in the "slots:" slot3=slot4='bd' for ID A99 in the example above.)
That said, I agree with the others who recommend a "long" data structure.
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.