I will like to search for any of these alphanumeric codes ( J45.10, J45.20,J45.30) in a column titled (Diagnoses) which has text including different alphanumeric codes, and then assign their existence to categorical variable (AI). But my codes return errors.
For example, a cell in column "diagnoses" have text such as "J45.10 throat ache, uncomplicated ; S81.011A Laceration without foreign body, right knee, initial encounter ; S50.311A Abrasion of right elbow, initial encounter "
Here is my code:
Data want;
if diagnoses in ('J45.10', 'J45.20', 'J45.30') then AI=1;
else AI=0;
run;
Thanks for your help.
Unfortunately using the IN operator here won't work since SAS interprets this statement:
if diagnoses in ('J45.10', 'J45.20', 'J45.30') then AI=1;
as:
if diagnoses ='J45.10' or diagnoses ='J45.20' or diagnoses ='J45.30' then AI=1;
Based on the example you provided for the values of Diagnoses, none of these conditions would ever result to True since they are much longer strings.
What you could use instead is the FIND function which would allow you to search for a substring of characters within the variable Diagnoses like so:
if find(diagnoses,"J45.10",'i')>0 then AI=1;
Since the FIND fuction will return the starting position of the substring within the string, you're looking for a value >0 to confirm whether it is present.
@antonbcristina your suggestion returned missing for all observations. The code did not work.
J45 codes are in a range J45.10 - J45.90 and I would like to find them in each cell within column "diagnoses" and output them as AI=1.
Thanks.
You can use the IN operator with the colon modifier to search for a list of substrings like J45.10 in your string stored in variable Diagnoses. You will need to list all the substrings explicitly.
Alternatively use the findw() function and search for J45. This of course will only work if there aren't any cases where you have J45 in the string but don't want to return true - let's say you wouldn't want true (AI=1) for J45.991
If below two coding alternatives don't return what you need then please provide sample data that include cases where things aren't working yet (=amend below data have step), explain the required logic and provide the desired outcome when using the sample data.
data have;
infile datalines truncover;
input diagnoses $200.;
datalines4;
J45.10 throat ache, uncomplicated ; S81.011A Laceration without foreign body, right knee, initial encounter ; S50.311A Abrasion of right elbow, initial encounter
X45.10 throat ache, uncomplicated ; S81.011A Laceration without foreign body, right knee, initial encounter ; S50.311A Abrasion of right elbow, initial encounter
;;;;
data want;
set have;
if diagnoses in: ('J45.10', 'J45.20', 'J45.30') then AI=1;
else AI=0;
run;
proc print data=want(keep=ai);
run;
data want2;
set have;
A1= findw(diagnoses,'J45');
run;
proc print data=want2;
run;
This found the variables but ascribed zero to all of them
data want;
set have;
if diagnoses in: ('J45.10', 'J45.20', 'J45.30') then AI=1;
else AI=0;
run;
I switched it with AI =0 if any of the variables are found (see below) and it correctly ascribed 1 to all of them. It's weird! I want to ascribe AI=1 in any cell it finds 'J45.10', 'J45.20', 'J45.30' and AI=0 otherwise
I have over 14,000 observations with a column 'diagnoses' which has text information beyond these alphanumeric variables: 'J45.10', 'J45.20', 'J45.30' and I wish to find where this exist in each diagnoses cell. Thanks
data want;
set have;
if diagnoses in: ('J45.10', 'J45.20', 'J45.30') then AI=0;
else AI=1;
run;
@femiajumobi1 wrote:
This found the variables but ascribed zero to all of them
data want; set have; if diagnoses in: ('J45.10', 'J45.20', 'J45.30') then AI=1; else AI=0; run;
I switched it with AI =0 if any of the variables are found (see below) and it correctly ascribed 1 to all of them. It's weird! I want to ascribe AI=1 in any cell it finds 'J45.10', 'J45.20', 'J45.30' and AI=0 otherwise
I can't replicate what you describe. Please provide representative sample data!
Just extend below HAVE table with additional sample data where things aren't working as expected and show us the desired result.
Below code return AI=1 for strings where one of the substring in the IN clause are found.
data have;
infile datalines truncover;
input diagnoses $200.;
datalines4;
J45.10 throat ache, uncomplicated ; S81.011A Laceration without foreign body, right knee, initial encounter ; S50.311A Abrasion of right elbow, initial encounter
X45.10 throat ache, uncomplicated ; S81.011A Laceration without foreign body, right knee, initial encounter ; S50.311A Abrasion of right elbow, initial encounter
;;;;
data want;
set have;
if diagnoses in: ('J45.10', 'J45.20', 'J45.30') then AI=1;
else AI=0;
run;
proc print data=want(keep=ai);
run;
I have over 14,000 observations with a column 'diagnoses' which has text information beyond these alphanumeric variables: 'J45.10', 'J45.20', 'J45.30' and I wish to find
If the number of ICD codes you want to search for becomes too big then some approach using a lookup table might be better suited. But we can't know if you don't tell us.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.