Hi folks,
I got some variables on say for example diagnosis which can be coded as 45, 451, 459, 458. So all these observations have 45 in common. I want to capture all observations which has values starting with 45 without writing an entire list of codes.
How do I do that please?
Thanks
data have;
x=45; output;
x=451; output;
x=459; output;
x=458; output;
x=75; output;
x=951; output;
x=729; output;
x=458; output;
run;
data want;
set have;
if substr(left(x),1,2)='45';
run;
Frankly, I have an aversion to implicit data type conversions, so I'd rather do:
data have ;
do x = 45, 451, 459, 458, 75, 951, 729, 458 ;
output ;
end ;
run ;
data want ;
set have ;
if put (x, D.-L) =: "45" ;
run ;
Or, without any type conversions, explicit or implicit:
data want ;
set have ;
if int (x / 10 ** (int (log10 (x)) - 1)) = 45 ;
run ;
That having been said, codes of this nature are always of the character type (usually $5), as they may contain alphas. Hence, it would be apter to model the sample data set accordingly, which in turn makes the subsetting process much simpler:
data have ;
do _n_ = 45, 451, 459, 458, 75, 951, 729, 458 ;
x = put (_n_, 5.-L) ;
output ;
end ;
run ;
data want ;
set have (where = (x =: "45")) ;
run ;
Kind regards
Paul D.
If you have multiple variables you may want to consider an array and testing each value, such as with @Reeza's approach and if none of the variables match, then delete the record. Ease may depend on number of variables involved:
One way with some very limited character data:
data example; informat d1-d3 $5.; input d1 d2 d3; array d d1-d3; flag=0; do i= 1 to dim(d); flag= (d[i] =: '45'); if flag then leave; end; if flag=0 then delete; drop i flag; datalines; 123 456 789 123 234 642 450 124 444 123 124 459 ; run;
A question that my need consideration:
Do you have values such as '45123', longer than the 3 characters that you showed that show NOT be considered as in the wanted values?
Yes, I am looking for
'251', 'J441','J450', 'J45', 'J451','J458', 'J459', 'J46', 'J46X', 'R062', 'J44.1', 'J441-', 'J44x'
in a number of variables named as 'diag_01 to diag_06'.
How do I write that in syntax please?
Thanks
Essentially the same way @ballardw has already suggested:
data want (drop = _:) ;
set have ;
array d diag: ;
do over d ;
if d in ('251', 'J441','J450', 'J45', 'J451','J458', 'J459', 'J46', 'J46X', 'R062', 'J44.1', 'J441-', 'J44x') then do ;
_FOUND = 1 ;
leave ;
end ;
end ;
if _FOUND ; * if you want only the records where you have a match ;
run ;
Kind regards
Paul D.
But this code does involve listing all the values and in this specific case I don't know the exact values, as some clinicians would have coded a diagnosis as '45', another as '45X' and another as '45.0', so I don't want to worry about what comes after '45'. I only want to tell SAS to pick any value starting with 45 regardless of 'X', '.', '-' etc. etc.
How do I write that please?
Thanks
S
Also wanted to ask if SAS would take '05 17R' as '0517R' or do I have to specify space between 05 and 17R.
Thanks
Sorry, but I'm confused now.
You said that you wanted to pick the values that start with 45.
Yet you also gave a sample of codes, none of which starts with 45 but may start with a letter (for instance, J) followed by 45:
'251', 'J441','J450', 'J45', 'J451','J458', 'J459', 'J46', 'J46X', 'R062', 'J44.1', 'J441-', 'J44x'
Which makes me conjecture that you want make your SAS code react to the codes whose first two digits are 45, regardless of whether they are preceded by a non-digit or not. In other words, if my conjecture held water, you'd like SAS to react to the following selection from the above list:
'J450', 'J45', 'J451','J458', 'J459'
Is that correct? If so, your condition should begin with the first digit in any code, rather than at character #1. In other words, given the data sample HAVE below, you'd want to select records #1, 2, and 5:
data have ;
input (diag1-diag3) (:$5.) ;
cards ;
251 J441 450XY
J45 J451 J458
J469 J46 J46X
R062 J44.1 J441
J469 45XX J46X
run ;
data want (drop = _select) ;
set have ;
array dd diag: ;
do over dd ;
if substr (dd, anydigit (dd)) =: "45" then do ;
_select = 1 ;
leave ;
end ;
end ;
if _select ;
run ;
Instead of the expression:
substr (dd, anydigit (dd))
you can also use:
compress (dd, , "kd")
Kind regards
Paul D.
Hi Paul,
Sorry about the confusion. I was just giving an example as got multiple diagnoses that I would want to pick up using this code. Here is the description;
For example
Gastroenteritis is entered as
262, 263, 262248, K52, P783, A00, A02, A03, A04, A05, A06, A07, A08, A09, A084, A080, A083, A059, A090, A099
and asthma is coded as;
251, 251228, J441, J450, J45, J451, J458, J459, J46, J46X, R062
And I would, say for example, want to pick any codes that have 26, A0 for gastroenteritis and J44, J45, 251 for asthma.
How would I write it please?
Thanks
S
And also, how do we write this array if we wanted to drop observations with diagnoses codes (given in variables diag_01 to diag_09) including values like J21, J0, J38, J37 regardless of values/digits/characters before and after them.
Thanks
S
You'd need the corresponding number of FIND expressions conjugated by ORs. But in such a case, it's simpler, easier, and less verbose to use the PRXMATCH function - you can judge by comparing the line where it's used with the commented out line with the FINDs below (both lines yield the same result):
data have ; input (diag1-diag3) (:$6.) ; cards ; 251 J441 450XY 1J21 J451 J458 J469 J46 J46X R062 J0XX J441 J455 45XX J46X 262 263 62J38 K52 P783 A00 A02 A0J373 A04 run ; data want (drop = _:) ; set have ; array dd diag: ; do over dd ; if prxmatch ("m/J21|J0|J38|J37/oi", dd) then do ; * if find (dd, "J21") or find (dd, "J0") or find (dd, "J38") or find (dd, "J37") then do ; _exclude = 1 ; leave ; end ; end ; if not _exclude ; run ;
Kind regards
Paul D.
Or :
data HAVE ;
input (DIAG1-DIAG3) (:$6.) ;
cards ;
251 J441 450XY
1J21 J451 J458
J469 J46 J46X
R062 J0XX J441
J455 45XX J46X
262 263 62J38
K52 P783 A00
A02 A0J373 A04
run;
data WANT;
set HAVE;
FLAG_GASTRO=prxmatch ("m/\b26|\bA0/oi" , catx(' ',of DIAG:)) > 0;
FLAG_ASTHMA=prxmatch ("m/\bJ44|\bJ45|\b251/oi", catx(' ',of DIAG:)) > 0;
run;
DIAG1 | DIAG2 | DIAG3 | FLAG_GASTRO | FLAG_ASTHMA |
---|---|---|---|---|
251 | J441 | 450XY | 0 | 1 |
1J21 | J451 | J458 | 0 | 1 |
J469 | J46 | J46X | 0 | 0 |
R062 | J0XX | J441 | 0 | 1 |
J455 | 45XX | J46X | 0 | 1 |
262 | 263 | 62J38 | 1 | 0 |
K52 | P783 | A00 | 1 | 0 |
A02 | A0J373 | A04 | 1 | 0 |
\b means the word begins there
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.