BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
ftahsin
Obsidian | Level 7

Hello SAS Community,

 

I am working on a Medicaid dataset. I have the patient ID's and several diagnosis codes for those patients. I have the following data- 

data have;
input     ID    Diagnosis1 $  Diagnosis2 $  Diagnosis3 $  Diagnosis4 $;
datalines;
6816494         6910             6869          147           V061
6816515         463              2893          741           V061
6816515         462              78499         V061          V053
6819880         741              78791         77081         7730
;
run;

I want data that will specify the patients who were diagnosed with some specific codes (i.e., 740, 741..., etc.). So, I am looking for the following data- 

data want;
input     ID    Diagnosis1 $  Diagnosis2  $  Diagnosis3 $  Diagnosis4 $  Patient;
datalines;
6816494         6910             6869          147           V061            0
6816515         463              2893          741           V061            1
6816515         462              78499         V061          V053            0
6819880         741              78791         77081         7730            1
;
run;

Note: the example I gave has 4 diagnoses, but my original dataset has 8 diagnoses.

 

Thank you so much. Your assistance with the code is appreciated.

 

1 ACCEPTED SOLUTION

Accepted Solutions
eabc0351
Quartz | Level 8

You might give this a try:

data want; 
set have;
sdiag0=pdiag; /*change your PDIAG variable to be consistent with the SDIAG variables*/
array DiagVar sdiag0-sdiag8;
flag = 0;
do over DiagVar;
if DiagVar in: ('740' '741' '742' '743' '744' '745' '746' '747' '748' '749' '750' '751' '752' '753' '754' '755' '756' '740' '757' '758') then flag = 1;
end;
run;

View solution in original post

10 REPLIES 10
ChrisNZ
Tourmaline | Level 20

Like this?

FLAG = prxmatch('/740|741/',catx('|',of DIAGNOSIS1-DIAGNOSIS8));

jimbarbour
Meteorite | Level 14

@ftahsin,

 

I'd like to show a useful technique with the IN operator which I will use to address the problem you presented.  The solution presented by @ChrisNZ is quite succinct, and I don't intend in any way to argue that my solution is somehow "better;" it's just another approach and a useful technique.

%LET	Diagnoses	=	4;
%LET	Searches	=	2;

DATA	Want;
	SET	Have;
	ARRAY	Diagnoses	[&Diagnoses]	$5	Diagnosis1 - Diagnosis&Diagnoses;
	ARRAY	Search_For	[&Searches]		$5	_temporary_	('740' '741');

Patient = 0; DO i = 1 TO &Searches; IF Search_For[i] IN Diagnoses THEN Patient = 1; END; RUN;

The IN operator here is used to check the entire Diagnoses array for a particular value (here, either 740 or 741).

 

Jim

ftahsin
Obsidian | Level 7
Hello JIM,
Thanks a lot for your response. I tried to run your code by modifying it a bit according to my variables. I actually have 9 diagnoses named PDIAG and SDIAG1 - SDIAG8 (PDIAG = primary diagnosis, SDIAG = secondary diagnosis). And the number of my searches will be 19 (740 - 758). So, I tried the following code:

%LET PDIAG = 1;
%LET SDIAG = 8;
%LET Searches = 19;

DATA want;
SET have;
ARRAY PDIAG [&PDIAG] $10 SDIAG1 - SDIAG8 $10;
ARRAY Search_For [&Searches] $10 _temporary_ ('740' '741' '742' '743' '744' '745' '746' '747' '748' '749' '750' '751' '752' '753' '754' '755' '756' '740' '757' '758');

DO i = 1 TO &Searches;
IF Search_For[i] IN PDIAG and SDIAG1-SDIAG8 THEN
Patient = 1;
ELSE
Patient = 0;
END;
RUN;

Unfortunately, it didn't work. Can you please tell me what did I do wrong? It will be a great help!
ChrisNZ
Tourmaline | Level 20

My statement becomes 

FLAG = prxmatch('/74\d|75[0-8]/',catx('|',of SDIAG1-SDIAG8, PDIAG ));

or

FLAG = prxmatch('/74[0-9]|75[0-8]/',catx('|',of SDIAG1-SDIAG8, PDIAG ));

with your new requirements.

 

Also see an alternative syntax for an IN match, by @novinosrin.

 

ftahsin
Obsidian | Level 7
Hello ChrisNZ,
Thanks a lot for your reply. I tried your code and it runs. But I got values like this-

Flag
0
0
18
18
7
40
0
0
1
Where it should be only 0 or 1 based on the presence of any of the diagnosis codes. Can you please tell me how I can fix this?

Thanks again,
Farah
ChrisNZ
Tourmaline | Level 20

Just add a test:

FLAG = prxmatch('/74\d|75[0-8]/',catx('|',of SDIAG1-SDIAG8, PDIAG )) > 0;

eabc0351
Quartz | Level 8

You might give this a try:

data want; 
set have;
sdiag0=pdiag; /*change your PDIAG variable to be consistent with the SDIAG variables*/
array DiagVar sdiag0-sdiag8;
flag = 0;
do over DiagVar;
if DiagVar in: ('740' '741' '742' '743' '744' '745' '746' '747' '748' '749' '750' '751' '752' '753' '754' '755' '756' '740' '757' '758') then flag = 1;
end;
run;
ftahsin
Obsidian | Level 7
Hello,
Thank a lot for your help! It worked!
jimbarbour
Meteorite | Level 14

@ftahsin

 

Your modified code looks pretty good, but there are a couple of small things that are going to hold you back.  You defined the array as

%LET PDIAG = 1;
ARRAY PDIAG [&PDIAG] $10 SDIAG1 - SDIAG8 $10;
  1. You can't use the same name for your array as for the variable itself.  In other words, you can't have both a PDIAG array and a PDIAG variable.  Instead of PDIAG for the diagnoses array name, let's use the name Diagnoses.
  2. If you define your array as Diagnoses[&PDIAG], you're only going to get 1 occurrence in the array (the macro variable PDIAG resolves to 1).  Rather than 1, you have 9 diagnoses (one primary and eight secondary), so you need to define your array as Diagnoses[9].
  3. If you count, you actually have twenty, not nineteen occurrences in your Search_For array.  The value '740' is repeated twice.  You need to remove the second '740' or you'll lose the last occurrence in the array.
  4. In your IF statement, you coded:
    IF Search_For[i] IN PDIAG and SDIAG1-SDIAG8 THEN
    You don't want to code "PDIAG and SDIAG1 - SDIAG8".  This is what we created the array for.  When we created the array Diagnoses, Diagnoses was assigned as "ARRAY Diagnoses [&Diagnoses] $5 PDiag SDiag1 - SDiag8;"  So, when you refer to the array name, you're already referring to "PDiag SDiag1 - SDiag8", so you don't need to code it all over again.  Just code the name of the array.
  5. When you set your Patient value to 0, the best way to do that is before you go through the Search_For loop.  If you set the patient value to 0 as part of an ELSE, you might overlay a value of 1 determined in an earlier iteration. I originally coded it that way and caught the bug later and corrected it.  You grabbed my code before I corrected the bug, so this one's not your fault.  I have now corrected the bug, and you should too.

Try adjusting your program for the five things I've listed above and re-running.

 

If you get stuck, I've coded everything up below so you have a working solution that you can compare against.  In the sample data I created, the odd numbered rows should derive Patient = 0, and the even numbered rows should derive Patient = 1.  

 

I added one new thing to the code:  i = &Searches;  Any time we find a condition that causes Patient to be set to 1, there's really no need to check any further.  We already know that Patient = 1.  So, I set the index to the maximum number, which causes the DO loop to stop further searches on the current record.  With only two search elements per record in the original version, there wasn't a lot of savings to be gained by not performing some of the searches, but now with 19 searches per second, you can save some time by avoiding unnecessary searches.

 

Jim

 

DATA	Have;
	INFILE	DATALINES4;
	INPUT
		ID
		PDiag  $
		SDiag1 $
		SDiag2 $
		SDiag3 $
		SDiag4 $
		SDiag5 $
		SDiag6 $
		SDiag7 $
		SDiag8 $
		;
DATALINES4;
6816494 6910 6869  147   V061 6910 6869  759   V061 147
6816515 463  2893  741   V061 463  2893  1741  V061 2741
6816515 462  78499 V061  V053 462  78499 717   V053 V061
6819880 791  78791 77081 7730 708  78791 77081 7730 744
6819881 721  78791 77081 7730 799  78791 77081 7730 77081
6819882 711  78791 77081 7730 758  78791 77081 7730 77081
6819883 701  78791 77081 7730 058  78791 77081 7730 77081
6819884 703  78791 77081 7730 7581 78791 77081 755  77081
6819885 714  78791 77081 7730 1758 78791 77081 7730 77081
6819886 241  78791 77081 7730 888  78791 77081 7730 740
;;;;
RUN;

%LET	Diagnoses	=	9;
%LET	Searches	=	19;

DATA	Want;
	DROP	i;
	SET	Have;
	ARRAY	Diagnoses	[&Diagnoses]	$5	PDiag SDiag1 - SDiag8;
	ARRAY	Search_For	[&Searches]		$5	_temporary_	('740' '741' '742' '743' '744' 
														 '745' '746' '747' '748' '749' 
														 '750' '751' '752' '753' '754' 
														 '755' '756' '757' '758');

	Patient				=	0;

	DO	i			=	1	TO	&Searches;
		IF	Search_For[i]	IN	Diagnoses	THEN
			DO;
				Patient	=	1;
				i		=	&Searches;
			END;
	END;
RUN;

 

ftahsin
Obsidian | Level 7
Hello Jim,

Thanks a lot for explaining these to me. I am a beginner in SAS, so the points you made are very informative to me. I will keep them in mind whenever I work with these types of codes.

I tried your code and it ran. For some reason, I was getting 0 for all my patients, although manually I can see a considerable number of people have those specific diagnoses codes. But I fixed it, so no issues! Thanks again for your help.

-Farah

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 10 replies
  • 2213 views
  • 10 likes
  • 4 in conversation