BookmarkSubscribeRSS Feed
sks521
Quartz | Level 8

Hi folks,

 

I got some variables on say for example diagnosis which can be coded as 45, 451, 459, 458. So all these observations have 45 in common. I want to capture all observations which has values starting with 45 without writing an entire list of codes.

 

How do I do that please?

 

Thanks

27 REPLIES 27
Reeza
Super User
Is it numeric or character variables?

If character use a colon operator with the equal sign to check the first two characters only.


if diag_code =: '45';
PeterClemmensen
Tourmaline | Level 20
data have;
x=45;  output;
x=451; output;
x=459; output;
x=458; output;
x=75;  output;
x=951; output;
x=729; output;
x=458; output;
run;

data want;
	set have;
	if substr(left(x),1,2)='45';
run;
hashman
Ammonite | Level 13

@PeterClemmensen :

Frankly, I have an aversion to implicit data type conversions, so I'd rather do:

 

data have ;                                    
  do x = 45, 451, 459, 458, 75, 951, 729, 458 ;
    output ;                                   
  end ;                                        
run ;                                          
                                               
data want ;                                    
  set have ;                                   
  if put (x, D.-L) =: "45" ;                   
run ;                                          

Or, without any type conversions, explicit or implicit:

 

 

data want ;                                       
  set have ;                                      
  if int (x / 10 ** (int (log10 (x)) - 1)) = 45 ;
run ;                                             

That having been said, codes of this nature are always of the character type (usually $5), as they may contain alphas. Hence, it would be apter to model the sample data set accordingly, which in turn makes the subsetting process much simpler: 

data have ;                                      
  do _n_ = 45, 451, 459, 458, 75, 951, 729, 458 ;
    x = put (_n_, 5.-L) ;                        
    output ;                                     
  end ;                                          
run ;                                            
                                                 
data want ;                                      
  set have (where = (x =: "45")) ;                                 
run ;                                            

Kind regards

Paul D.

 

ballardw
Super User

If you have multiple variables you  may want to consider an array and testing each value, such as with @Reeza's approach and if none of the variables match, then delete the record. Ease may depend on number of variables involved:

 

One way with some very limited character data:

data example;
   informat d1-d3 $5.;
   input d1 d2 d3;
   array d d1-d3;
   flag=0;
   do i= 1 to dim(d);
      flag= (d[i] =: '45');
      if flag then leave;
   end;
   if flag=0 then delete;
   drop i flag;

datalines;
123 456 789
123 234 642
450 124 444
123 124 459
;
run;

A question that my need consideration:

    Do you have values such as '45123', longer than the 3 characters that you showed that show NOT be considered as in the wanted  values?

 

sks521
Quartz | Level 8

Yes, I am looking for 

 

'251', 'J441','J450', 'J45', 'J451','J458', 'J459', 'J46', 'J46X', 'R062', 'J44.1', 'J441-', 'J44x'

 

in a number of variables named as 'diag_01 to diag_06'.

 

How do I write that in syntax please?

 

Thanks

hashman
Ammonite | Level 13

@sks521:

 

Essentially the same way @ballardw has already suggested:

data want (drop = _:) ;
  set have ;
  array d diag: ;
  do over d ;
    if d in ('251', 'J441','J450', 'J45', 'J451','J458', 'J459', 'J46', 'J46X', 'R062', 'J44.1', 'J441-', 'J44x') then do ;
      _FOUND = 1 ;
      leave ;
end ; end ; if _FOUND ; * if you want only the records where you have a match ; run ;

Kind regards

Paul D.  

sks521
Quartz | Level 8

But this code does involve listing all the values and in this specific case I don't know the exact values, as some clinicians would have coded a diagnosis as '45', another as '45X' and another as '45.0', so I don't want to worry about what comes after '45'. I only want to tell SAS to pick any value starting with 45 regardless of 'X', '.', '-' etc. etc.

 

How do I write that please?

 

Thanks

S

sks521
Quartz | Level 8

Also wanted to ask if SAS would take '05 17R' as '0517R' or do I have to specify space between 05 and 17R.

 

Thanks

hashman
Ammonite | Level 13

@sks521:

Sorry, but I'm confused now. 

You said that you wanted to pick the values that start with 45.

Yet you also gave a sample of codes, none of which starts with 45 but may start with a letter (for instance, J) followed by 45:

 

'251', 'J441','J450', 'J45', 'J451','J458', 'J459', 'J46', 'J46X', 'R062', 'J44.1', 'J441-', 'J44x'

 

Which makes me conjecture that you want make your SAS code react to the codes whose first two digits are 45, regardless of whether they are preceded by a non-digit or not. In other words, if my conjecture held water, you'd like SAS to react to the following selection from the above list:

 

'J450', 'J45', 'J451','J458', 'J459' 

 

Is that correct? If so, your condition should begin with the first digit in any code, rather than at character #1. In other words, given the data sample HAVE below, you'd want to select records #1, 2, and 5:

 

data have ;                                        
  input (diag1-diag3) (:$5.) ;                     
  cards ;                                          
251  J441  450XY                                   
J45  J451  J458                                    
J469 J46   J46X                                    
R062 J44.1 J441                                    
J469 45XX  J46X                                    
run ;                                              
                                                   
data want (drop = _select) ;                      
  set have ;                                       
  array dd diag: ;                                 
  do over dd ;                                     
    if substr (dd, anydigit (dd)) =: "45" then do ;
      _select = 1 ;                                
      leave ;                                      
    end ;                                          
  end ;                                            
  if _select ;                                     
run ;                                              

Instead of the expression:

substr (dd, anydigit (dd))

you can also use:

compress (dd, , "kd")

Kind regards

Paul D.

 

 

 

 

 

  

sks521
Quartz | Level 8

Hi Paul,

 

Sorry about the confusion. I was just giving an example as got multiple diagnoses that I would want to pick up using this code. Here is the description;

 

For example

 

Gastroenteritis is entered as 

262, 263, 262248, K52, P783, A00, A02, A03, A04, A05, A06, A07, A08, A09, A084, A080, A083, A059, A090, A099

 

and asthma is coded as;

251, 251228, J441, J450, J45, J451, J458, J459, J46, J46X, R062

 

And I would, say for example, want to pick any codes that have 26, A0 for gastroenteritis and J44, J45, 251 for asthma.

 

How would I write it please?

 

Thanks

S

sks521
Quartz | Level 8

And also, how do we write this array if we wanted to drop observations with diagnoses codes (given in variables diag_01 to diag_09) including values like J21, J0, J38, J37 regardless of values/digits/characters before and after them.

 

Thanks

S

hashman
Ammonite | Level 13

@sks521:

You'd need the corresponding number of FIND expressions conjugated by ORs. But in such a case, it's simpler, easier, and less verbose to use the PRXMATCH function - you can judge by comparing the line where it's used with the commented out line with the FINDs below (both lines yield the same result):

data have ;                                                                                 
  input (diag1-diag3) (:$6.) ;                                                              
  cards ;                                                                                   
251  J441   450XY                                                                           
1J21  J451  J458                                                                            
J469 J46    J46X                                                                            
R062 J0XX   J441                                                                            
J455 45XX   J46X                                                                            
262  263    62J38                                                                           
K52  P783   A00                                                                             
A02  A0J373 A04                                                                             
run ;                                                                                       
                                                                                            
data want (drop = _:) ;                                                                     
  set have ;                                                                                
  array dd diag: ;                                                                          
  do over dd ;                                                                              
    if prxmatch ("m/J21|J0|J38|J37/oi", dd) then do ;                                       
  * if find (dd, "J21") or find (dd, "J0") or find (dd, "J38") or find (dd, "J37") then do ;
      _exclude = 1 ;                                                                        
      leave ;                                                                               
    end ;                                                                                   
  end ;                                                                                     
  if not _exclude ;                                                                         
run ;                                                                                       

Kind regards

Paul D.

 

ChrisNZ
Tourmaline | Level 20

Or :

data HAVE ;                                                                                 
  input (DIAG1-DIAG3) (:$6.) ;                                                              
  cards ;                                                                                   
251  J441   450XY                                                                           
1J21  J451  J458                                                                            
J469 J46    J46X                                                                            
R062 J0XX   J441                                                                            
J455 45XX   J46X                                                                            
262  263    62J38                                                                           
K52  P783   A00                                                                             
A02  A0J373 A04                                                                             
run;                                                                                       
                                                                                            
data WANT;                                                                     
  set HAVE;                                                                                
  FLAG_GASTRO=prxmatch ("m/\b26|\bA0/oi"        , catx(' ',of DIAG:)) > 0;                                       
  FLAG_ASTHMA=prxmatch ("m/\bJ44|\bJ45|\b251/oi", catx(' ',of DIAG:)) > 0;                                       
run;           

DIAG1 DIAG2 DIAG3 FLAG_GASTRO FLAG_ASTHMA
251 J441 450XY 0 1
1J21 J451 J458 0 1
J469 J46 J46X 0 0
R062 J0XX J441 0 1
J455 45XX J46X 0 1
262 263 62J38 1 0
K52 P783 A00 1 0
A02 A0J373 A04 1 0

\b  means the word begins there

hashman
Ammonite | Level 13

@ChrisNZ

Very clever! 

Committed to my storage.

 

Kind regards

Paul D.

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 27 replies
  • 2567 views
  • 3 likes
  • 6 in conversation