Solved: Re: Select var based on sequence of date

alagiejatta · Posted 10-04-2021 04:36 PM

I have a data set like this

ID	vr	bd	fx	gh
A11	1/2/1999	.	6/4/2012	8/1/2021
A12	3/5/2021	.	.	.
A13	3/5/2000	.	.	4/17/1988
A14	10/24/1956	12/2/1967	1/6/2001	.
A15	7/1/1996	11/7/1990	.	7/21/2001

I wanted a table like this. Varaibles are selected based dates values- earliest date will fill slit1 and latest date fill slot 4. Missing dates are ignored.

ID	vr	bd	fx	gh	yz	slot1	slot2	slot3	slot4
A11	1/2/1999	.		8/1/2021	9/23/2019	vr	gh	yz
A12	3/5/2021	.	.	.	3/5/2000	yz	vr
A13	3/5/2000	.	.	4/17/1988	.	gh	vr
A14	10/24/1956	12/2/1967	1/6/2001	.	.	fx	bd	vr
A15	7/1/1996	11/7/1990	.	7/21/2001	6/10/2000	bd	vr	yz	gh

Ksharp · Posted 10-05-2021 03:12 AM

data have;
infile cards expandtabs;
input ID	$ ( vr              	bd               	 fx                  	gh) (:mmddyy12.); 
format vr bd fx gh mmddyy10.;
cards;                 
A11	1/2/1999	.	 6/4/2012       	8/1/2021
A12	3/5/2021	.	.	.
A13	3/5/2000	.	.	4/17/1988
A14	10/24/1956	12/2/1967	1/6/2001	.
A15	7/1/1996	11/7/1990   	.	7/21/2001
;


data want;
if _n_=1 then do;
 length date 8 vname $ 40;
 declare hash h(multidata:'y',ordered:'y');
 declare hiter hi('h');
 h.definekey('date');
 h.definedata('vname');
 h.definedone();
end;
 set have;
 array x{*}  vr bd  fx gh;
 array y{*} $ 40 slot1-slot4;
 do i=1 to dim(x);
   date=x{i};vname=vname(x{i});
   if not missing(date) then h.add();
 end;
 do i=1 by 1 while(hi.next()=0);
   y{i}=vname;
 end;
 h.clear();
 drop i date vname;
run;

proc print;run;

View solution in original post

Reeza · Posted 10-04-2021 04:41 PM

What happens if someone has a test in all 5 categories?
Do you need exactly 5 or does this need to scale?
What are you planning to do with this later on?

I'm leaning very strongly to suggesting whatever you're doing is likely easier using a long format such as:

ID TEST DATE
A11 VR 1/2/1999
A11 GH 8/1/2021
A12 VR 3/5/2021
A13 VR 3/5/2000
....

alagiejatta · Posted 10-04-2021 04:43 PM

If someone has test in all five then all the slots will be filled. I am sorry I forget to include slot5 in the output sample

Kurt_Bremser · Posted 10-04-2021 05:01 PM

What @Reeza said. Transpose to long, sort by date and assign a number sequence, then transpose back to wide.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

PaigeMiller · Posted 10-04-2021 05:02 PM

@alagiejatta wrote:

If someone has test in all five then all the slots will be filled. I am sorry I forget to include slot5 in the output sample

So, @Reeza asked "What are you planning to do with this later on?" I don't see an answer. After you re-arrange the data, what analysis or reporting are you going to do? Knowing this information can really help devise the best method of handling this data.

--
Paige Miller

alagiejatta · Posted 10-04-2021 05:21 PM

So the final product is to show a sequence of event. I will submit a file the shows which event occur first, second, third etc by ID. For A11 the final answer is VR-GH-YZ

PaigeMiller · Posted 10-04-2021 05:24 PM

The long data set as suggested by @Reeza will do that, and should be much easier to program.

--
Paige Miller

Kurt_Bremser · Posted 10-04-2021 05:24 PM

@alagiejatta wrote:
So the final product is to show a sequence of event. I will submit a file the shows which event occur first, second, third etc by ID. For A11 the final answer is VR-GH-YZ

So you actually do not need the wide layout, you can use the long dataset after sorting.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Reeza · Posted 10-04-2021 06:14 PM

Convert it to a long structure as I suggested, possibly it started out that way?

Either way, look at the options here:

https://gist.github.com/statgeek/d583cfa992bf56da51d435165b07e96a

Kurt_Bremser · Posted 10-05-2021 03:08 AM

See here the whole code, from converting to the long layout to a report presenting the data in wide form:

data have;
infile datalines dlm="09"x dsd truncover;
input ID $ (vr bd fx gh) (:mmddyy10.);
format vr bd fx gh yymmdd10.;
datalines;
A11	1/2/1999	.	 6/4/2012       	8/1/2021
A12	3/5/2021	.	.	.
A13	3/5/2000	.	.	4/17/1988
A14	10/24/1956	12/2/1967	1/6/2001	.
A15	7/1/1996	11/7/1990   	.	7/21/2001
;

proc transpose
  data=have
  out=long (
   rename=(col1=date)
   where=(date ne .)
  )
;
by ID;
var vr--gh;
run;

proc sort data=long;
by id date;
run;

data want;
set long;
by id;
if first.id
then ct = 1;
else ct + 1;
slot = cats("slot",ct);
drop ct;
run;

proc report data=want;
column id _name_,slot n;
define id / group;
define _name_ / "" display;
define n / noprint;
define slot / "" across;
run;

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Ksharp · Posted 10-05-2021 03:12 AM

data have;
infile cards expandtabs;
input ID	$ ( vr              	bd               	 fx                  	gh) (:mmddyy12.); 
format vr bd fx gh mmddyy10.;
cards;                 
A11	1/2/1999	.	 6/4/2012       	8/1/2021
A12	3/5/2021	.	.	.
A13	3/5/2000	.	.	4/17/1988
A14	10/24/1956	12/2/1967	1/6/2001	.
A15	7/1/1996	11/7/1990   	.	7/21/2001
;


data want;
if _n_=1 then do;
 length date 8 vname $ 40;
 declare hash h(multidata:'y',ordered:'y');
 declare hiter hi('h');
 h.definekey('date');
 h.definedata('vname');
 h.definedone();
end;
 set have;
 array x{*}  vr bd  fx gh;
 array y{*} $ 40 slot1-slot4;
 do i=1 to dim(x);
   date=x{i};vname=vname(x{i});
   if not missing(date) then h.add();
 end;
 do i=1 by 1 while(hi.next()=0);
   y{i}=vname;
 end;
 h.clear();
 drop i date vname;
run;

proc print;run;

FreelanceReinh · Posted 10-05-2021 03:50 AM

Hello @alagiejatta,

Is it possible that duplicate dates occur for the same ID as in my sample data below for ID A99?

data have;
input ID $ (vr bd fx gh yz)(:mmddyy.);
format vr--yz mmddyy10.;
cards;
A11 1/2/1999 . 6/4/2012 8/1/2021 9/23/2019
A12 3/5/2021 . . . 3/5/2000
A13 3/5/2000 . . 4/17/1988 .
A14 10/24/1956 12/2/1967 1/6/2001 . .
A15 7/1/1996 11/7/1990 . 7/21/2001 6/10/2000
A99 9/23/2019 3/5/2000 4/17/1988 3/5/2000 1/2/1999
;

If not, you could populate the "slot" variables like this:

data want;
set have;
array _d[*] vr--yz;
array slot[5] $32;
do _n_=1 to n(of _d[*]);
  slot[_n_]=vname(_d[whichn(smallest(_n_, of _d[*]), of _d[*])]);
end;
run;

(Only the name of the first of two or more variables containing the same date would occur in the "slots:" slot3=slot4='bd' for ID A99 in the example above.)

That said, I agree with the others who recommend a "long" data structure.

Registration is open

SAS Training: Just a Click Away