Hi,
I have a large dataset that is kind of poorly set up. It looks kind of like this:
Pain_location | Start_date | Patient |
Head | 2022-03-05 | Maddie |
Abdomen | 2022-04-20 | Jeff, Chris, Ali, Zara |
Head | 2022-04-22 | John, Peter |
Back | 2022-06-01 | Elizabeth |
Back | 2022-06-15 | Frank, Jane, Alice |
Hip | 2022-06-15 | Betty |
I want to change it to a more vertical form based on the last column, with one row per patient, so that it looks like the table below. I have looked around but cannot figure it out. Can you please help me out? Thanks,
Pain_location | Start_date | Patient |
Head | 2022-03-05 | Maddie |
Abdomen | 2022-04-20 | Jeff |
Abdomen | 2022-04-20 | Chris |
Abdomen | 2022-04-20 | Ali |
Abdomen | 2022-04-20 | Zara |
Head | 2022-04-22 | John |
Head | 2022-04-22 | Peter |
Back | 2022-06-01 | Elizabeth |
Back | 2022-06-15 | Frank |
Back | 2022-06-15 | Jane |
Back | 2022-06-15 | Alice |
Hip | 2022-06-15 | Betty |
Here is one way creating a new variable:
data want;
  set have;
  length newpatient $ 10; /* large enough to hold the longest name */
  do i = 1 to countw(patient, ',');
    newpatient = strip(scan(patient, i, ','));
    output;
  end;
  drop i; /* may want to drop Patient as well */
run;
The logic requires a new variable. If you really want the final variable to be named Patient, one option is to rename the existing variable as it is read in, treat that renamed variable as the "old" one, and use Patient in place of Newpatient.
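A minimal sketch of that renaming option, assuming the same HAVE data set as above. The name patient_list and the length of 32 are placeholders chosen for illustration, not something from the original post, so adjust both to your data:

```sas
data want;
  /* free up the name Patient by renaming the incoming list variable */
  /* (patient_list is a hypothetical name)                           */
  set have(rename=(patient=patient_list));
  length patient $ 32; /* assumed wide enough for the longest single name */
  do i = 1 to countw(patient_list, ',');
    patient = strip(scan(patient_list, i, ','));
    output;
  end;
  drop i patient_list;
run;
```

Note that a row whose list is blank produces no output rows here, because COUNTW returns 0 and the loop never runs.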
Very awesome. Thank you 🙂
Hi @Primavera
You already have a solution from @ballardw, but let me share one more, just for fun:
data have;
infile CARDS dlm="|";
input Pain_location :$12. Start_date yymmdd10. Patient :$200.;
format Start_date yymmdd10.;
cards;
Head|2022-03-05|Maddie
Abdomen|2022-04-20|Jeff,Chris,Ali,Zara
Head|2022-04-22|John,Peter
Back|2022-06-01|Elizabeth
Back|2022-06-15|Frank,Jane,Alice
Hip|2022-06-15|Betty
;
run;
proc print;
run;
filename f TEMP;
data _null_; file f; put; run;
data want;
  set have;
  infile f dlm="," truncover; /* dummy file, required so INPUT can do the split */
  /* split Patient into variables n1 to n5 (at most 5 names per row) */
  input @@;
  _infile_ = patient;
  input @1 (n1-n5) (: $ 32.) @@;
  /* loop over n1-n5 to output values */
  array n $ n1-n5;
  do over n;
    if n ne " " then
      do;
        newPatient = n;
        output;
      end;
    else leave;
  end;
  /* clean up */
  drop n1-n5 patient;
  rename newPatient=patient;
run;
proc print;
run;
filename f clear;
Bart
And one more with PROC TRANSPOSE (assuming the data set is sorted by Start_date and Pain_location):
filename f TEMP;
data _null_; file f; put; run;
data want2;
  set have;
  infile f dlm="," truncover; /* dummy file, required so INPUT can do the split */
  /* split Patient into variables n1 to n5 (at most 5 names per row) */
  input @@;
  _infile_ = patient;
  input @1 (n1-n5) (: $ 32.) @@;
  output;
  drop patient;
run;
proc transpose
  data=want2
  out=want2(drop=_:
            rename=(col1=patient)
            where=(patient is not null)
           );
  by Start_date Pain_location;
  var n:;
run;
proc print;
run;
filename f clear;
Bart