Hi all,
I have a diagnosis_combine variable that contains multiple diagnoses separated by " | " . I need to create the variable "want" to scan over the diagnosis_combine to see if all diagnoses are the same or different as below:
Diagnosis_combine | want |
Ehlers-Danlos syndrome | same |
Storage Pool Disease | Storage Pool Disease | Factor I, hereditary | Storage Pool Disease | Storage Pool Disease | different |
Factor I, hereditary | Factor I, hereditary | Factor I, hereditary | Factor I, hereditary | same |
vWD, type 2A | Factor I, hereditary | different |
Factor I, hereditary | Factor I, hereditary | Factor I, hereditary | Factor I, hereditary | Factor I, hereditary | Factor I, hereditary | same |
Factor XIII, hereditary | Factor VIII, hereditary | same |
vWD, type 1 | Storage Pool Disease |
|
I would highly appreciate if you can help with the best SAS procedures to do it without having to separate diagnosis_combine into multiple columns and compare all columns together.
Best,
Le
Is there a known maximum number of diagnoses in that column?
Very likely a solution will involve creating additional variables and if there is not a known maximum then that will be a problem.
With an array of variables it is not that hard to compare all of the values.
One example:
data have; infile datalines truncover; input Diagnosis_combine $100.; datalines; Ehlers-Danlos syndrome Storage Pool Disease | Storage Pool Disease | Factor I, hereditary | Storage Pool Disease | Storage Pool Disease Factor I, hereditary | Factor I, hereditary | Factor I, hereditary | Factor I, hereditary vWD, type 2A | Factor I, hereditary Factor I, hereditary | Factor I, hereditary | Factor I, hereditary | Factor I, hereditary | Factor I, hereditary | Factor I, hereditary Factor XIII, hereditary | Factor VIII, hereditary vWD, type 1 | Storage Pool Disease ; data want; set have; array d (10) $ 25; do i=1 to countw(Diagnosis_combine,'|'); d[i] = strip(scan(Diagnosis_combine,i,'|')); end; length want $ 9; want='same'; do i=1 to (countw(Diagnosis_combine,'|')-1); do j=(i+1) to (countw(Diagnosis_combine,'|')); if d[i] ne d[j] then do; want='different'; leave; end; end; end; /* after verifying things are working then uncomment the following line*/ /* drop i j d1-d10;*/ run;
The data step is to have something to work with.
The array size should be the largest known (or expected) number of elements. Replace 10 with that number or "guess", you know your data better than we do.
The first do loop pulls each value into a separate variable. The strip is make strings more consistent. It is likely that your actual data may not have a leading space for the value in the first position but does, if your paste is accurate, for those in other positions which would make them "different".
The nested Do i and Do j compares the first value with the "remaining". In the case of only one value the first limit means no comparison is actually attempted.
The LEAVE instruction says to leave the loops when the first not equal value is found.
Drop the variables you don't need when done.
Is there a known maximum number of diagnoses in that column?
Very likely a solution will involve creating additional variables and if there is not a known maximum then that will be a problem.
With an array of variables it is not that hard to compare all of the values.
One example:
data have; infile datalines truncover; input Diagnosis_combine $100.; datalines; Ehlers-Danlos syndrome Storage Pool Disease | Storage Pool Disease | Factor I, hereditary | Storage Pool Disease | Storage Pool Disease Factor I, hereditary | Factor I, hereditary | Factor I, hereditary | Factor I, hereditary vWD, type 2A | Factor I, hereditary Factor I, hereditary | Factor I, hereditary | Factor I, hereditary | Factor I, hereditary | Factor I, hereditary | Factor I, hereditary Factor XIII, hereditary | Factor VIII, hereditary vWD, type 1 | Storage Pool Disease ; data want; set have; array d (10) $ 25; do i=1 to countw(Diagnosis_combine,'|'); d[i] = strip(scan(Diagnosis_combine,i,'|')); end; length want $ 9; want='same'; do i=1 to (countw(Diagnosis_combine,'|')-1); do j=(i+1) to (countw(Diagnosis_combine,'|')); if d[i] ne d[j] then do; want='different'; leave; end; end; end; /* after verifying things are working then uncomment the following line*/ /* drop i j d1-d10;*/ run;
The data step is to have something to work with.
The array size should be the largest known (or expected) number of elements. Replace 10 with that number or "guess", you know your data better than we do.
The first do loop pulls each value into a separate variable. The strip is make strings more consistent. It is likely that your actual data may not have a leading space for the value in the first position but does, if your paste is accurate, for those in other positions which would make them "different".
The nested Do i and Do j compares the first value with the "remaining". In the case of only one value the first limit means no comparison is actually attempted.
The LEAVE instruction says to leave the loops when the first not equal value is found.
Drop the variables you don't need when done.
data have;
infile datalines truncover;
input Diagnosis_combine $100.;
datalines;
Ehlers-Danlos syndrome
Storage Pool Disease | Storage Pool Disease | Factor I, hereditary | Storage Pool Disease | Storage Pool Disease
Factor I, hereditary | Factor I, hereditary | Factor I, hereditary | Factor I, hereditary
vWD, type 2A | Factor I, hereditary
Factor I, hereditary | Factor I, hereditary | Factor I, hereditary | Factor I, hereditary | Factor I, hereditary | Factor I, hereditary
Factor XIII, hereditary | Factor VIII, hereditary
vWD, type 1 | Storage Pool Disease
;
data want;
set have;
want='different';
if missing(compress(tranwrd(Diagnosis_combine,scan(Diagnosis_combine,1,'|'),''),'|')) then want='same';
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.