BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
K_Wils15
Obsidian | Level 7

Here is the Problem: 

  1. Assume you have a SAS data set (called CLAIMS) that contains pharmacy claims. Each claim should be uniquely identified by the patient's identification number (ID_NUM) and a transaction number (Claim_Num). (E.g., patient 001 should not have more than one record with Claim_Num = 123).  You want to be able to inspect any claims for which there are possible duplicate records.  Write the SAS code to separate the observations into 2 new SAS data sets.  One data set will hold all the observations where the combination of ID_NUM and Claim_Num are unduplicated (that is, there is only one claim with a particular ID_NUM and Claim_Num combination), and a second data set that will hold all the records (including the first) where there are duplicates of Claim_Num for a given ID_NUM.  (HINT: you will need to use first. and last. to do this).

Here is what I have:

 

proc sort data=CLAIMS;
       by ID_NUM CLAIM_NUM;
run;

 

data CLAIMS1 Claims2;
       set CLAIMS;
       by ID_NUM CLAIM_NUM;
       if first.ID_NUM = 1;
Output Claims1;
       if first.ID_NUM = 1 and last.ID_NUM = 1;
Output Claims2;
run;

 

I know this is wrong because I tried  a sample data set and there was no output for claims2. I think the error is first.id_num being = to 1 in both cases but I just can't think of how to properly separate the two sets of data. If anyone can help me figure out this problem, it would be greatly appreciated. Thank you. 

1 ACCEPTED SOLUTION

Accepted Solutions
ed_sas_member
Meteorite | Level 14

Hi @K_Wils15 

 

I think the problem comes from the by variable -> you need to take into account first/last.CLAIM_NUM rather than  first/last.ID_NUM.

 

data CLAIMS;
	input ID_NUM $ CLAIM_NUM;
	cards;
001 1
001 123
001 123
002 234
002 234
002 234
002 999
;
run;

proc sort data=CLAIMS;
       by ID_NUM CLAIM_NUM;
run;

data CLAIMS_NO_DUP CLAIMS2_WITH_DUP;
	set CLAIMS;
	by ID_NUM CLAIM_NUM;
	if first.CLAIM_NUM = 1 and last.CLAIM_NUM = 1 then Output CLAIMS_NO_DUP;
    else output CLAIMS2_WITH_DUP;
run;

View solution in original post

2 REPLIES 2
ed_sas_member
Meteorite | Level 14

Hi @K_Wils15 

 

I think the problem comes from the by variable -> you need to take into account first/last.CLAIM_NUM rather than  first/last.ID_NUM.

 

data CLAIMS;
	input ID_NUM $ CLAIM_NUM;
	cards;
001 1
001 123
001 123
002 234
002 234
002 234
002 999
;
run;

proc sort data=CLAIMS;
       by ID_NUM CLAIM_NUM;
run;

data CLAIMS_NO_DUP CLAIMS2_WITH_DUP;
	set CLAIMS;
	by ID_NUM CLAIM_NUM;
	if first.CLAIM_NUM = 1 and last.CLAIM_NUM = 1 then Output CLAIMS_NO_DUP;
    else output CLAIMS2_WITH_DUP;
run;
SASKiwi
PROC Star

This should be closer to the right answer:

data CLAIMS1 Claims2;
       set CLAIMS;
       by ID_NUM CLAIM_NUM;
       if first.CLAIM_NUM and last.CLAIM_NUM then Output Claims1;
       else Output Claims2;
run;

You need to use CLAIM_NUM for your FIRST. and LAST. processing and by checking both you are outputting unique claims by ID_NUM. All other rows must then be not unique by claim.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 2 replies
  • 355 views
  • 1 like
  • 3 in conversation