I am trying to separate out a column as it currently has multiple effects listed in the individual column like so:
I use the scan function and create different column names for the various symptoms.
From there, I would like to arrange them alphabetically but have the missing values be last (i.e. have the Symptom1 column be acidosis for all those who have that symptom) so that I can use the proc freq function to determine how many people total had each individual symptom.
I tried using call sortc but this had all the missing values first (1 of the clinical effects had 23 different symptoms, but many only had 1 or 2 listed) and then would start the symptoms in alphabetical order so that the "acidosis" was in different numbers instead of always being symptom1 like I would like it to be.
Any suggestions?
Try to first separate distinct values into a single column, then sort it and transpose.
eg;
data have;
infile cards truncover;
input @1 id @3 symptoms $ 3-100;
cards;
1 Dizziness/vertigo (NR),Vomitting (NR),Confusion (R)
2 Headache (R),Muscle weakness (R),Acidosis
3 Confusion (R)
4 Vomitting (NR),Headache (R)
;
proc print;run;
data have1;
set have;
num= countc(symptoms, ',');
do i= 1 to num+1;
Symptom= strip(scan(symptoms, i, ','));
output;
end;
drop num i;
run;
proc print; run;
proc sort data= have1;
by id symptom;
proc print; run;
proc transpose data=have1 out=want (drop=_name_) prefix=Symptom;
by id;
var symptom;
proc print;run;
You have a character variable containing a comma-separated list. You want to extract and alphabetize the list elements. You can do this in a single data step.
Say no list is expected to have more than 25 members, then you can manage with a 25-element arrays of character variables:
data have;
infile cards truncover;
input @1 id @3 symptoms $ 3-100;
cards;
1 Dizziness/vertigo (NR),Vomitting (NR),Confusion (R)
2 Headache (R),Muscle weakness (R),Acidosis
3 Confusion (R)
4 Vomitting (NR),Headache (R)
run;
data want (drop=_s _t);
set have;
array symptom {25} $30 ;
do _s=1 to dim(symptom) until (_s=countw(symptoms,','));
symptom{_s}=strip(scan(symptoms,_s, ','));
end;
call sortc(of symptom{*});
length _alphabetized_symptoms $800; *For temporary sorted CSV list of symptoms *;
_alphabetized_symptoms=catx(',',of symptom{*});
call missing(of symptom{*});
do _t=1 to _s;
symptom{_t}=scan(_alphabetized_symptoms,_t,',');
end;
run;
A possible alternative to above, which shifts the sorted values to the left, is to add elements that will always sort higher than actual data, namely assign unneeded elements with hexadecimal 'FF' values. After sorting reset them to blank.
data want (drop=_s _t);
set have;
array symptom {25} $30 ;
do _s=1 to dim(symptom) until (_s=countw(symptoms,','));
symptom{_s}=strip(scan(symptoms,_s, ','));
end;
if _s<dim(symptom) then do _t=_s+1 to dim(symptom);
symptom{_t}='FF'x;
end;
call sortc(of symptom{*});
if _s<dim(symptom) then do _t=_s+1 to dim(symptom);
symptom{_t}=' ';
end;
run;
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.