I've posted what I have and what I want in a table below. Here's the code I'm currently using. Wondering if there is a way to only print the vaccine_x variable on the first line of the id. Then I plan to only keep first.id. I can create new variables if needed. Thanks in advance.
Current code:
data want;
set have;
format vaccine_1 vaccine_2 vaccine_3 vaccine_4 vaccine_5 mmddyy10.;
if vaccine_sequence=1 then vaccine_1 = vaccine_admin_date;
else vaccine_1=.;
if vaccine_sequence=2 then vaccine_2 = vaccine_admin_date;
else vaccine_2=.;
if vaccine_sequence=3 then vaccine_3 = vaccine_admin_date;
else vaccine_3=.;
if vaccine_sequence=4 then vaccine_4 = vaccine_admin_date;
else vaccine_4=.;
if vaccine_sequence=5 then vaccine_5 = vaccine_admin_date;
else vaccine_5=.; proc print;
HAVE | ||||||
id | vaccine_admin_date | vaccine_sequence | vaccine_1 | vaccine_2 | vaccine_3 | vaccine_4 |
1 | 7/1/2024 | 1 | 7/1/2024 | |||
1 | 8/1/2024 | 2 | 8/1/2024 | |||
1 | 9/1/2024 | 3 | 9/1/2024 | |||
1 | 10/1/2024 | 4 | 10/1/2024 | |||
2 | 1/2/2024 | 1 | 1/2/2024 | |||
2 | 2/2/2024 | 2 | 2/2/2024 | |||
2 | 3/2/2024 | 3 | 3/2/2024 | |||
2 | 4/2/2024 | 4 | 4/2/2024 | |||
WANT | ||||||
id | vaccine_admin_date | vaccine_sequence | vaccine_1 | vaccine_2 | vaccine_3 | vaccine_4 |
1 | 7/1/2024 | 1 | 7/1/2024 | 8/1/2024 | 9/1/2024 | 10/1/2024 |
1 | 8/1/2024 | 2 | ||||
1 | 9/1/2024 | 3 | ||||
1 | 10/1/2024 | 4 | ||||
2 | 1/2/2024 | 1 | 1/2/2024 | 2/2/2024 | 3/2/2024 | 4/2/2024 |
2 | 2/2/2024 | 2 | ||||
2 | 3/2/2024 | 3 | ||||
2 | 4/2/2024 | 4 |
proc sql noprint;
select max(nvax) into :maxvax trimmed from (select id, count(*) as nvax from HAVE group by id);
quit;
proc sort data=want; by id vax_admin_date; run;
data want;
set have;
by id;
length vax1-vax&maxvax 4 n 3;
format vax1-vax&maxvax date9.;
array V {*} vax1-vax&maxvax;
retain V n;
if first.id then do;
call missing(of V[*]);
n=0;
end;
n+1;
V[n]=vaccine_admin_date;
if last.id then output;
keep id vax1-vax&maxvax;
run;
It sounds like you HAVE a lovely vertical dataset. That is what I would normally like to have:
data have ;
input id vaccine_admin_date mmddyy10. vaccine_sequence ;
format vaccine_admin_date mmddyy10. ;
cards ;
1 7/1/2024 1
1 8/1/2024 2
1 9/1/2024 3
1 10/1/2024 4
2 1/2/2024 1
2 2/2/2024 2
2 3/2/2024 3
2 4/2/2024 4
;
If you want to transpose it into a wide dataset with one record per ID, you could use PROC TRANPOSE, e.g.:
proc transpose data=have out=want prefix=Vaccine_;
var vaccine_admin_date ;
by id ;
run ;
proc print ;
run ;
which will output:
Obs id _NAME_ Vaccine_1 Vaccine_2 Vaccine_3 Vaccine_4 1 1 vaccine_admin_date 07/01/2024 08/01/2024 09/01/2024 10/01/2024 2 2 vaccine_admin_date 01/02/2024 02/02/2024 03/02/2024 04/02/2024
Usually it's easiest to work with the vertical dataset. Sometimes it's helpful to transpose into the wide dataset. But he WANT dataset you showed, with extra rows with blank records for Vaccine_1-Vaccine_4, does not look very useful to me. Are you sure you really want that?
As a data structure (a table) your Want is sub-optimal and will be hard to work with. It's most of the time much better to organise the data in a long and narrow structure.
Such data structures can then get used with report procs like Proc Report:
And here the code for above:
data have;
input id $ vaccine_admin_date :ddmmyy10. vaccine_sequence;
format vaccine_admin_date ddmmyy10.;
datalines;
1 07/01/2024 1
1 08/01/2024 2
1 09/01/2024 3
1 10/01/2024 4
2 01/02/2024 1
2 02/02/2024 2
2 03/02/2024 3
2 04/02/2024 4
;
run;
proc report data=have;
columns id vaccine_sequence, vaccine_admin_date;
define id / group;
define vaccine_sequence / group across;
run;
The values in your vaccine_1 - vaccine_5 variables all have the exact same values are are in the vaccine_admin_date variable.
Is that always the case? If not what would it mean when the two dates did no agree?
How did they get that way?
Why do you want them that way?
Seem like a silly thing to do. But here goes.
First let's re-create your strange dataset. (I suspect something like this was the cause of the strange layout).
data have;
input id vaccine_admin_date :mmddyy. vaccine_sequence;
array vaccine_ [4];
format vaccine_1-vaccine_4 vaccine_admin_date yymmdd10.;
vaccine_[vaccine_sequence]=vaccine_admin_date;
cards;
1 7/1/2024 1
1 8/1/2024 2
1 9/1/2024 3
1 10/1/2024 4
2 1/2/2024 1
2 2/2/2024 2
2 3/2/2024 3
2 4/2/2024 4
;
Now we can use the UPDATE statement to collapse the VACCINE_1 to VACCINE_4 variables into one observation.
data wide;
update have(obs=0 keep=id vaccine_1-vaccine_4)
have(keep=id vaccine_1-vaccine_4)
;
by id;
run;
And then merge it back with the original dataset.
data want;
merge have wide;
by id;
if not first.id then call missing(of vaccine_1-vaccine_4);
run;
Here's a variation on Tom's program:
data wide;
update have(obs=0 keep=id)
have(keep=id vaccine_1-vaccine_4 vaccine_sequence)
;
by id;
run;
It gives you a wide form to the data, with all the useful information. Its not the form that you asked for, but it does contain everything you might need.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.