BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
K_Wils15
Obsidian | Level 7

Here is the Problem: 

  1. Assume you have a SAS data set (called CLAIMS) that contains pharmacy claims. Each claim should be uniquely identified by the patient's identification number (ID_NUM) and a transaction number (Claim_Num). (E.g., patient 001 should not have more than one record with Claim_Num = 123).  You want to be able to inspect any claims for which there are possible duplicate records.  Write the SAS code to separate the observations into 2 new SAS data sets.  One data set will hold all the observations where the combination of ID_NUM and Claim_Num are unduplicated (that is, there is only one claim with a particular ID_NUM and Claim_Num combination), and a second data set that will hold all the records (including the first) where there are duplicates of Claim_Num for a given ID_NUM.  (HINT: you will need to use first. and last. to do this).

Here is what I have:

 

proc sort data=CLAIMS;
       by ID_NUM CLAIM_NUM;
run;

 

data CLAIMS1 Claims2;
       set CLAIMS;
       by ID_NUM CLAIM_NUM;
       if first.ID_NUM = 1;
Output Claims1;
       if first.ID_NUM = 1 and last.ID_NUM = 1;
Output Claims2;
run;

 

I know this is wrong because I tried  a sample data set and there was no output for claims2. I think the error is first.id_num being = to 1 in both cases but I just can't think of how to properly separate the two sets of data. If anyone can help me figure out this problem, it would be greatly appreciated. Thank you. 

1 ACCEPTED SOLUTION

Accepted Solutions
ed_sas_member
Meteorite | Level 14

Hi @K_Wils15 

 

I think the problem comes from the by variable -> you need to take into account first/last.CLAIM_NUM rather than  first/last.ID_NUM.

 

data CLAIMS;
	input ID_NUM $ CLAIM_NUM;
	cards;
001 1
001 123
001 123
002 234
002 234
002 234
002 999
;
run;

proc sort data=CLAIMS;
       by ID_NUM CLAIM_NUM;
run;

data CLAIMS_NO_DUP CLAIMS2_WITH_DUP;
	set CLAIMS;
	by ID_NUM CLAIM_NUM;
	if first.CLAIM_NUM = 1 and last.CLAIM_NUM = 1 then Output CLAIMS_NO_DUP;
    else output CLAIMS2_WITH_DUP;
run;

View solution in original post

2 REPLIES 2
ed_sas_member
Meteorite | Level 14

Hi @K_Wils15 

 

I think the problem comes from the by variable -> you need to take into account first/last.CLAIM_NUM rather than  first/last.ID_NUM.

 

data CLAIMS;
	input ID_NUM $ CLAIM_NUM;
	cards;
001 1
001 123
001 123
002 234
002 234
002 234
002 999
;
run;

proc sort data=CLAIMS;
       by ID_NUM CLAIM_NUM;
run;

data CLAIMS_NO_DUP CLAIMS2_WITH_DUP;
	set CLAIMS;
	by ID_NUM CLAIM_NUM;
	if first.CLAIM_NUM = 1 and last.CLAIM_NUM = 1 then Output CLAIMS_NO_DUP;
    else output CLAIMS2_WITH_DUP;
run;
SASKiwi
PROC Star

This should be closer to the right answer:

data CLAIMS1 Claims2;
       set CLAIMS;
       by ID_NUM CLAIM_NUM;
       if first.CLAIM_NUM and last.CLAIM_NUM then Output Claims1;
       else Output Claims2;
run;

You need to use CLAIM_NUM for your FIRST. and LAST. processing and by checking both you are outputting unique claims by ID_NUM. All other rows must then be not unique by claim.