BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
alagiejatta
Calcite | Level 5

I have a data set like this

ID  vr              bd                fx                  gh                  
A111/2/1999. 6/4/2012       8/1/2021
A123/5/2021...
A133/5/2000..4/17/1988
A1410/24/195612/2/19671/6/2001.
A157/1/199611/7/1990   .7/21/2001

 

I wanted a table like this. Varaibles are selected based dates values- earliest date will fill slit1 and latest date fill slot 4. Missing dates are ignored.

ID  vr              bd                fx                  gh                  yzslot1slot2slot3slot4
A111/2/1999. 8/1/20219/23/2019vrghyz 
A123/5/2021...3/5/2000yzvr  
A133/5/2000..4/17/1988.ghvr  
A1410/24/195612/2/19671/6/2001..fxbdvr 
A157/1/199611/7/1990   .7/21/20016/10/2000bdvryzgh

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User
data have;
infile cards expandtabs;
input ID	$ ( vr              	bd               	 fx                  	gh) (:mmddyy12.); 
format vr bd fx gh mmddyy10.;
cards;                 
A11	1/2/1999	.	 6/4/2012       	8/1/2021
A12	3/5/2021	.	.	.
A13	3/5/2000	.	.	4/17/1988
A14	10/24/1956	12/2/1967	1/6/2001	.
A15	7/1/1996	11/7/1990   	.	7/21/2001
;


data want;
if _n_=1 then do;
 length date 8 vname $ 40;
 declare hash h(multidata:'y',ordered:'y');
 declare hiter hi('h');
 h.definekey('date');
 h.definedata('vname');
 h.definedone();
end;
 set have;
 array x{*}  vr bd  fx gh;
 array y{*} $ 40 slot1-slot4;
 do i=1 to dim(x);
   date=x{i};vname=vname(x{i});
   if not missing(date) then h.add();
 end;
 do i=1 by 1 while(hi.next()=0);
   y{i}=vname;
 end;
 h.clear();
 drop i date vname;
run;

proc print;run;

View solution in original post

11 REPLIES 11
Reeza
Super User
What happens if someone has a test in all 5 categories?
Do you need exactly 5 or does this need to scale?
What are you planning to do with this later on?

I'm leaning very strongly to suggesting whatever you're doing is likely easier using a long format such as:

ID TEST DATE
A11 VR 1/2/1999
A11 GH 8/1/2021
A12 VR 3/5/2021
A13 VR 3/5/2000
....
alagiejatta
Calcite | Level 5

If someone has test in all five then all the slots will be filled. I am sorry I forget to include slot5 in the output sample

PaigeMiller
Diamond | Level 26

@alagiejatta wrote:

If someone has test in all five then all the slots will be filled. I am sorry I forget to include slot5 in the output sample


So, @Reeza asked "What are you planning to do with this later on?" I don't see an answer. After you re-arrange the data, what analysis or reporting are you going to do? Knowing this information can really help devise the best method of handling this data.

--
Paige Miller
alagiejatta
Calcite | Level 5
So the final product is to show a sequence of event. I will submit a file the shows which event occur first, second, third etc by ID. For A11 the final answer is VR-GH-YZ
PaigeMiller
Diamond | Level 26

The long data set as suggested by @Reeza will do that, and should be much easier to program.

--
Paige Miller
Kurt_Bremser
Super User

@alagiejatta wrote:
So the final product is to show a sequence of event. I will submit a file the shows which event occur first, second, third etc by ID. For A11 the final answer is VR-GH-YZ

So you actually do not need the wide layout, you can use the long dataset after sorting.

Reeza
Super User

Convert it to a long structure as I suggested, possibly it started out that way?

 

Either way, look at the options here:

https://gist.github.com/statgeek/d583cfa992bf56da51d435165b07e96a

Kurt_Bremser
Super User

See here the whole code, from converting to the long layout to a report presenting the data in wide form:

data have;
infile datalines dlm="09"x dsd truncover;
input ID $ (vr bd fx gh) (:mmddyy10.);
format vr bd fx gh yymmdd10.;
datalines;
A11	1/2/1999	.	 6/4/2012       	8/1/2021
A12	3/5/2021	.	.	.
A13	3/5/2000	.	.	4/17/1988
A14	10/24/1956	12/2/1967	1/6/2001	.
A15	7/1/1996	11/7/1990   	.	7/21/2001
;

proc transpose
  data=have
  out=long (
   rename=(col1=date)
   where=(date ne .)
  )
;
by ID;
var vr--gh;
run;

proc sort data=long;
by id date;
run;

data want;
set long;
by id;
if first.id
then ct = 1;
else ct + 1;
slot = cats("slot",ct);
drop ct;
run;

proc report data=want;
column id _name_,slot n;
define id / group;
define _name_ / "" display;
define n / noprint;
define slot / "" across;
run;
Ksharp
Super User
data have;
infile cards expandtabs;
input ID	$ ( vr              	bd               	 fx                  	gh) (:mmddyy12.); 
format vr bd fx gh mmddyy10.;
cards;                 
A11	1/2/1999	.	 6/4/2012       	8/1/2021
A12	3/5/2021	.	.	.
A13	3/5/2000	.	.	4/17/1988
A14	10/24/1956	12/2/1967	1/6/2001	.
A15	7/1/1996	11/7/1990   	.	7/21/2001
;


data want;
if _n_=1 then do;
 length date 8 vname $ 40;
 declare hash h(multidata:'y',ordered:'y');
 declare hiter hi('h');
 h.definekey('date');
 h.definedata('vname');
 h.definedone();
end;
 set have;
 array x{*}  vr bd  fx gh;
 array y{*} $ 40 slot1-slot4;
 do i=1 to dim(x);
   date=x{i};vname=vname(x{i});
   if not missing(date) then h.add();
 end;
 do i=1 by 1 while(hi.next()=0);
   y{i}=vname;
 end;
 h.clear();
 drop i date vname;
run;

proc print;run;
FreelanceReinh
Jade | Level 19

Hello @alagiejatta,

 

Is it possible that duplicate dates occur for the same ID as in my sample data below for ID A99?

data have;
input ID $ (vr bd fx gh yz)(:mmddyy.);
format vr--yz mmddyy10.;
cards;
A11 1/2/1999 . 6/4/2012 8/1/2021 9/23/2019
A12 3/5/2021 . . . 3/5/2000
A13 3/5/2000 . . 4/17/1988 .
A14 10/24/1956 12/2/1967 1/6/2001 . .
A15 7/1/1996 11/7/1990 . 7/21/2001 6/10/2000
A99 9/23/2019 3/5/2000 4/17/1988 3/5/2000 1/2/1999
;

If not, you could populate the "slot" variables like this:

data want;
set have;
array _d[*] vr--yz;
array slot[5] $32;
do _n_=1 to n(of _d[*]);
  slot[_n_]=vname(_d[whichn(smallest(_n_, of _d[*]), of _d[*])]);
end;
run;

(Only the name of the first of two or more variables containing the same date would occur in the "slots:" slot3=slot4='bd' for ID A99 in the example above.)

 

That said, I agree with the others who recommend a "long" data structure.

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 11 replies
  • 2203 views
  • 0 likes
  • 6 in conversation