BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
binhle50
Obsidian | Level 7

Hi all,

I have a diagnosis_combine variable that contains multiple diagnoses separated by " | " . I need to create the variable "want" to scan over the diagnosis_combine to see if all diagnoses are the same or different as below:

Diagnosis_combine want
Ehlers-Danlos syndrome same
Storage Pool Disease | Storage Pool Disease | Factor I, hereditary | Storage Pool Disease | Storage Pool Disease different
Factor I, hereditary | Factor I, hereditary | Factor I, hereditary | Factor I, hereditary same
vWD, type 2A | Factor I, hereditary different
Factor I, hereditary | Factor I, hereditary | Factor I, hereditary | Factor I, hereditary | Factor I, hereditary | Factor I, hereditary same
Factor XIII, hereditary | Factor VIII, hereditary same
vWD, type 1 |  Storage Pool Disease

 

I would highly appreciate if you can help with the best SAS procedures to do it without having to separate diagnosis_combine into multiple columns and compare all columns together.

 

Best,

 

Le

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

Is there a known maximum number of diagnoses in that column?

 

Very likely a solution will involve creating additional variables and if there is not a known maximum then that will be a problem.

With an array of variables it is not that hard to compare all of the values.

 

One example:

data have;
   infile datalines truncover;
   input Diagnosis_combine $100.;
datalines;
Ehlers-Danlos syndrome 	
Storage Pool Disease | Storage Pool Disease | Factor I, hereditary | Storage Pool Disease | Storage Pool Disease 	
Factor I, hereditary | Factor I, hereditary | Factor I, hereditary | Factor I, hereditary 	
vWD, type 2A | Factor I, hereditary 	
Factor I, hereditary | Factor I, hereditary | Factor I, hereditary | Factor I, hereditary | Factor I, hereditary | Factor I, hereditary 	
Factor XIII, hereditary | Factor VIII, hereditary 	
vWD, type 1 |  Storage Pool Disease 	
;

data want;
   set have;
   array d (10) $ 25;
   do i=1 to countw(Diagnosis_combine,'|');
      d[i] = strip(scan(Diagnosis_combine,i,'|'));
   end;
   length want $ 9;
   want='same';
   do i=1 to (countw(Diagnosis_combine,'|')-1);
      do j=(i+1) to (countw(Diagnosis_combine,'|'));
         if d[i] ne d[j] then do;
             want='different';
             leave;
         end;
      end;
   end;
   /* after verifying things are working then uncomment 
   the following line*/
/*   drop i j d1-d10;*/
run;

The data step is to have something to work with.

The array size should be the largest known (or expected) number of elements. Replace 10 with that number or "guess", you know your data better than we do.

The first do loop pulls each value into a separate variable. The strip is make strings more consistent. It is likely that your actual data may not have a leading space for the value in the first position but does, if your paste is accurate, for those in other positions which would make them "different".

The nested Do i and Do j compares the first value with the "remaining". In the case of only one value the first limit means no comparison is actually attempted.

The LEAVE instruction says to leave the loops when the first not equal value is found.

Drop the variables you don't need when done.

 

View solution in original post

4 REPLIES 4
ballardw
Super User

Is there a known maximum number of diagnoses in that column?

 

Very likely a solution will involve creating additional variables and if there is not a known maximum then that will be a problem.

With an array of variables it is not that hard to compare all of the values.

 

One example:

data have;
   infile datalines truncover;
   input Diagnosis_combine $100.;
datalines;
Ehlers-Danlos syndrome 	
Storage Pool Disease | Storage Pool Disease | Factor I, hereditary | Storage Pool Disease | Storage Pool Disease 	
Factor I, hereditary | Factor I, hereditary | Factor I, hereditary | Factor I, hereditary 	
vWD, type 2A | Factor I, hereditary 	
Factor I, hereditary | Factor I, hereditary | Factor I, hereditary | Factor I, hereditary | Factor I, hereditary | Factor I, hereditary 	
Factor XIII, hereditary | Factor VIII, hereditary 	
vWD, type 1 |  Storage Pool Disease 	
;

data want;
   set have;
   array d (10) $ 25;
   do i=1 to countw(Diagnosis_combine,'|');
      d[i] = strip(scan(Diagnosis_combine,i,'|'));
   end;
   length want $ 9;
   want='same';
   do i=1 to (countw(Diagnosis_combine,'|')-1);
      do j=(i+1) to (countw(Diagnosis_combine,'|'));
         if d[i] ne d[j] then do;
             want='different';
             leave;
         end;
      end;
   end;
   /* after verifying things are working then uncomment 
   the following line*/
/*   drop i j d1-d10;*/
run;

The data step is to have something to work with.

The array size should be the largest known (or expected) number of elements. Replace 10 with that number or "guess", you know your data better than we do.

The first do loop pulls each value into a separate variable. The strip is make strings more consistent. It is likely that your actual data may not have a leading space for the value in the first position but does, if your paste is accurate, for those in other positions which would make them "different".

The nested Do i and Do j compares the first value with the "remaining". In the case of only one value the first limit means no comparison is actually attempted.

The LEAVE instruction says to leave the loops when the first not equal value is found.

Drop the variables you don't need when done.

 

binhle50
Obsidian | Level 7
Thanks so much! That is exactly what I am looking for. I replaced 10 with the largest known number from my data and it works as I expected. Thanks so much for your help!

Best,
Le
Ksharp
Super User
data have;
   infile datalines truncover;
   input Diagnosis_combine $100.;
datalines;
Ehlers-Danlos syndrome  
Storage Pool Disease | Storage Pool Disease | Factor I, hereditary | Storage Pool Disease | Storage Pool Disease  
Factor I, hereditary | Factor I, hereditary | Factor I, hereditary | Factor I, hereditary  
vWD, type 2A | Factor I, hereditary  
Factor I, hereditary | Factor I, hereditary | Factor I, hereditary | Factor I, hereditary | Factor I, hereditary | Factor I, hereditary  
Factor XIII, hereditary | Factor VIII, hereditary  
vWD, type 1 |  Storage Pool Disease  
;
data want;
 set have;
 want='different';
 if missing(compress(tranwrd(Diagnosis_combine,scan(Diagnosis_combine,1,'|'),''),'|')) then want='same';
run;
binhle50
Obsidian | Level 7
Thanks so much!!!
Yes it works great and it should be the most simple method.
Best,
Le

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 602 views
  • 2 likes
  • 3 in conversation