Hello SAS Community,
I am working on a Medicaid dataset. I have the patient ID's and several diagnosis codes for those patients. I have the following data-
data have;
input ID Diagnosis1 $ Diagnosis2 $ Diagnosis3 $ Diagnosis4 $;
datalines;
6816494 6910 6869 147 V061
6816515 463 2893 741 V061
6816515 462 78499 V061 V053
6819880 741 78791 77081 7730
;
run;
I want data that will specify the patients who were diagnosed with some specific codes (i.e., 740, 741..., etc.). So, I am looking for the following data-
data want;
input ID Diagnosis1 $ Diagnosis2 $ Diagnosis3 $ Diagnosis4 $ Patient;
datalines;
6816494 6910 6869 147 V061 0
6816515 463 2893 741 V061 1
6816515 462 78499 V061 V053 0
6819880 741 78791 77081 7730 1
;
run;
Note: the example I gave has 4 diagnoses, but my original dataset has 8 diagnoses.
Thank you so much. Your assistance with the code is appreciated.
You might give this a try:
data want;
set have;
sdiag0=pdiag; /*change your PDIAG variable to be consistent with the SDIAG variables*/
array DiagVar sdiag0-sdiag8;
flag = 0;
do over DiagVar;
if DiagVar in: ('740' '741' '742' '743' '744' '745' '746' '747' '748' '749' '750' '751' '752' '753' '754' '755' '756' '740' '757' '758') then flag = 1;
end;
run;
Like this?
FLAG = prxmatch('/740|741/',catx('|',of DIAGNOSIS1-DIAGNOSIS8));
I'd like to show a useful technique with the IN operator which I will use to address the problem you presented. The solution presented by @ChrisNZ is quite succinct, and I don't intend in any way to argue that my solution is somehow "better;" it's just another approach and a useful technique.
%LET Diagnoses = 4;
%LET Searches = 2;
DATA Want;
SET Have;
ARRAY Diagnoses [&Diagnoses] $5 Diagnosis1 - Diagnosis&Diagnoses;
ARRAY Search_For [&Searches] $5 _temporary_ ('740' '741');
Patient = 0;
DO i = 1 TO &Searches;
IF Search_For[i] IN Diagnoses THEN
Patient = 1;
END;
RUN;
The IN operator here is used to check the entire Diagnoses array for a particular value (here, either 740 or 741).
Jim
My statement becomes
FLAG = prxmatch('/74\d|75[0-8]/',catx('|',of SDIAG1-SDIAG8, PDIAG ));
or
FLAG = prxmatch('/74[0-9]|75[0-8]/',catx('|',of SDIAG1-SDIAG8, PDIAG ));
with your new requirements.
Also see an alternative syntax for an IN match, by @novinosrin.
Just add a test:
FLAG = prxmatch('/74\d|75[0-8]/',catx('|',of SDIAG1-SDIAG8, PDIAG )) > 0;
You might give this a try:
data want;
set have;
sdiag0=pdiag; /*change your PDIAG variable to be consistent with the SDIAG variables*/
array DiagVar sdiag0-sdiag8;
flag = 0;
do over DiagVar;
if DiagVar in: ('740' '741' '742' '743' '744' '745' '746' '747' '748' '749' '750' '751' '752' '753' '754' '755' '756' '740' '757' '758') then flag = 1;
end;
run;
Your modified code looks pretty good, but there are a couple of small things that are going to hold you back. You defined the array as
%LET PDIAG = 1;
ARRAY PDIAG [&PDIAG] $10 SDIAG1 - SDIAG8 $10;
IF Search_For[i] IN PDIAG and SDIAG1-SDIAG8 THEN
You don't want to code "PDIAG and SDIAG1 - SDIAG8". This is what we created the array for. When we created the array Diagnoses, Diagnoses was assigned as "ARRAY Diagnoses [&Diagnoses] $5 PDiag SDiag1 - SDiag8;" So, when you refer to the array name, you're already referring to "PDiag SDiag1 - SDiag8", so you don't need to code it all over again. Just code the name of the array.Try adjusting your program for the five things I've listed above and re-running.
If you get stuck, I've coded everything up below so you have a working solution that you can compare against. In the sample data I created, the odd numbered rows should derive Patient = 0, and the even numbered rows should derive Patient = 1.
I added one new thing to the code: i = &Searches;
Any time we find a condition that causes Patient to be set to 1, there's really no need to check any further. We already know that Patient = 1. So, I set the index to the maximum number, which causes the DO loop to stop further searches on the current record. With only two search elements per record in the original version, there wasn't a lot of savings to be gained by not performing some of the searches, but now with 19 searches per second, you can save some time by avoiding unnecessary searches.
Jim
DATA Have;
INFILE DATALINES4;
INPUT
ID
PDiag $
SDiag1 $
SDiag2 $
SDiag3 $
SDiag4 $
SDiag5 $
SDiag6 $
SDiag7 $
SDiag8 $
;
DATALINES4;
6816494 6910 6869 147 V061 6910 6869 759 V061 147
6816515 463 2893 741 V061 463 2893 1741 V061 2741
6816515 462 78499 V061 V053 462 78499 717 V053 V061
6819880 791 78791 77081 7730 708 78791 77081 7730 744
6819881 721 78791 77081 7730 799 78791 77081 7730 77081
6819882 711 78791 77081 7730 758 78791 77081 7730 77081
6819883 701 78791 77081 7730 058 78791 77081 7730 77081
6819884 703 78791 77081 7730 7581 78791 77081 755 77081
6819885 714 78791 77081 7730 1758 78791 77081 7730 77081
6819886 241 78791 77081 7730 888 78791 77081 7730 740
;;;;
RUN;
%LET Diagnoses = 9;
%LET Searches = 19;
DATA Want;
DROP i;
SET Have;
ARRAY Diagnoses [&Diagnoses] $5 PDiag SDiag1 - SDiag8;
ARRAY Search_For [&Searches] $5 _temporary_ ('740' '741' '742' '743' '744'
'745' '746' '747' '748' '749'
'750' '751' '752' '753' '754'
'755' '756' '757' '758');
Patient = 0;
DO i = 1 TO &Searches;
IF Search_For[i] IN Diagnoses THEN
DO;
Patient = 1;
i = &Searches;
END;
END;
RUN;
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.