Solved: Trouble Understanding SAS Question and Process to reach Solution

K_Wils15 · Posted 12-08-2019 01:41 PM

Here is the Problem:

Assume you have a SAS data set (called CLAIMS) that contains pharmacy claims. Each claim should be uniquely identified by the patient's identification number (ID_NUM) and a transaction number (Claim_Num). (E.g., patient 001 should not have more than one record with Claim_Num = 123). You want to be able to inspect any claims for which there are possible duplicate records. Write the SAS code to separate the observations into 2 new SAS data sets. One data set will hold all the observations where the combination of ID_NUM and Claim_Num are unduplicated (that is, there is only one claim with a particular ID_NUM and Claim_Num combination), and a second data set that will hold all the records (including the first) where there are duplicates of Claim_Num for a given ID_NUM. (HINT: you will need to use first. and last. to do this).

Here is what I have:

proc sort data=CLAIMS;
by ID_NUM CLAIM_NUM;
run;

data CLAIMS1 Claims2;
set CLAIMS;
by ID_NUM CLAIM_NUM;
if first.ID_NUM = 1;
Output Claims1;
if first.ID_NUM = 1 and last.ID_NUM = 1;
Output Claims2;
run;

I know this is wrong because I tried a sample data set and there was no output for claims2. I think the error is first.id_num being = to 1 in both cases but I just can't think of how to properly separate the two sets of data. If anyone can help me figure out this problem, it would be greatly appreciated. Thank you.

ed_sas_member · Posted 12-08-2019 01:55 PM

Hi @K_Wils15

I think the problem comes from the by variable -> you need to take into account first/last.CLAIM_NUM rather than first/last.ID_NUM.

data CLAIMS;
	input ID_NUM $ CLAIM_NUM;
	cards;
001 1
001 123
001 123
002 234
002 234
002 234
002 999
;
run;

proc sort data=CLAIMS;
       by ID_NUM CLAIM_NUM;
run;

data CLAIMS_NO_DUP CLAIMS2_WITH_DUP;
	set CLAIMS;
	by ID_NUM CLAIM_NUM;
	if first.CLAIM_NUM = 1 and last.CLAIM_NUM = 1 then Output CLAIMS_NO_DUP;
    else output CLAIMS2_WITH_DUP;
run;

View solution in original post

ed_sas_member · Posted 12-08-2019 01:55 PM

Hi @K_Wils15

I think the problem comes from the by variable -> you need to take into account first/last.CLAIM_NUM rather than first/last.ID_NUM.

data CLAIMS;
	input ID_NUM $ CLAIM_NUM;
	cards;
001 1
001 123
001 123
002 234
002 234
002 234
002 999
;
run;

proc sort data=CLAIMS;
       by ID_NUM CLAIM_NUM;
run;

data CLAIMS_NO_DUP CLAIMS2_WITH_DUP;
	set CLAIMS;
	by ID_NUM CLAIM_NUM;
	if first.CLAIM_NUM = 1 and last.CLAIM_NUM = 1 then Output CLAIMS_NO_DUP;
    else output CLAIMS2_WITH_DUP;
run;

SASKiwi · Posted 12-08-2019 01:59 PM

This should be closer to the right answer:

data CLAIMS1 Claims2;
       set CLAIMS;
       by ID_NUM CLAIM_NUM;
       if first.CLAIM_NUM and last.CLAIM_NUM then Output Claims1;
       else Output Claims2;
run;

You need to use CLAIM_NUM for your FIRST. and LAST. processing and by checking both you are outputting unique claims by ID_NUM. All other rows must then be not unique by claim.

Trouble Understanding SAS Question and Process to reach Solution

Re: Trouble Understanding SAS Question and Process to reach Solution

Re: Trouble Understanding SAS Question and Process to reach Solution

Re: Trouble Understanding SAS Question and Process to reach Solution

Trouble Understanding SAS Question and Process to reach Solution

Re: Trouble Understanding SAS Question and Process to reach Solution

Re: Trouble Understanding SAS Question and Process to reach Solution

Re: Trouble Understanding SAS Question and Process to reach Solution

SAS Innovate 2025: Call for Content