Hi,
I am trying to write a program that is looking at a table with an id, date, provider number, and procedure number.
I want to write a program that identifies when there are rows that have the same (or duplicate) id AND date AND provider number, then output them into a new table.
Any idea how that may be possible?
It looks like you're referring to the alias have in your join condition, though your join uses the aliases fake_partB and c.
proc sql; 87 create table want as select fake_partB.*,c.count 88 from fake_partB left join _counts_ c 89 on have.bene=c.bene and have.srvc_dt=c.srvc_dt and have.prvdr_npi=c.prvdr_npi; ERROR: Unresolved reference to table/correlation name have. ERROR: Unresolved reference to table/correlation name have. ERROR: Unresolved reference to table/correlation name have. ERROR: Expression using equals (=) has components that are of different data types. ERROR: Expression using equals (=) has components that are of different data types. ERROR: Expression using equals (=) has components that are of different data types. NOTE: PROC SQL set option NOEXEC and will continue to check the syntax of statements. 90 quit;
If you use have as the alias for fake_partb, that should work:
from fake_partB have left join _counts_ c
@justicetobias wrote:
Hi,
I am trying to write a program that is looking at a table with an id, date, provider number, and procedure number.
I want to write a program that identifies when there are rows that have the same id, date, and provider number and output them into a new table.
It would definitely help if you provided us a portion of your data (as working SAS data step code).
What should we do if it is not a duplicate? What about triplicates?
Why do you need them in a separate data set? That's usually not necessary and causes extra work.
Here is code (untested) that identifies duplicates, triplicates, etc.
proc freq data=have;
    table id*date*provider_number/noprint list out=_counts_;
run;
proc sql;
    create table want as select have.*,c.count
        from have left join _counts_ c
        on have.id=c.id and have.date=c.date and have.provider_number=c.provider_number;
quit;
Than you so much for your help.
A portion of the data would be:
ID Procedure Provider Date
| 845 | 1002 | 4444 | 12-Dec-19 | 
| 261 | 1005 | 4441 | 13-Dec-19 | 
| 326 | 1006 | 4445 | 12-Dec-19 | 
| 232 | 1004 | 4444 | 12-Dec-19 | 
| 252 | 1009 | 4449 | 13-Dec-19 | 
| 948 | 1003 | 4446 | 13-Dec-19 | 
| 675 | 1005 | 4447 | 13-Dec-19 | 
| 154 | 1001 | 4441 | 12-Dec-19 | 
| 852 | 1008 | 4447 | 12-Dec-19 | 
| 954 | 1006 | 4440 | 12-Dec-19 | 
| 232 | 1005 | 4444 | 12-Dec-19 | 
| 232 | 1005 | 4445 | 12-Dec-19 | 
| 154 | 1001 | 4441 | 13-Dec-19 | 
| 948 | 1002 | 4446 | 13-Dec-19 | 
| 675 | 1006 | 4447 | 13-Dec-19 | 
| 261 | 1006 | 4441 | 12-Dec-19 | 
I don't need a new output table, but the only rows I want are ones that have and type of repeat (duplicate, triplicate, etc.) with id, provider, and date.
The code you provider was super helpful and identified the correct rows! But instead of an output that told me the rows, I would like to only have those rows left in the table.
So when you go to use this table next, for example in PROC MEANS or whatever, you can use something like this
proc means data=want(where=(count>1));Thank you! do you know why these error messages would appear?
Show us the log for these steps, the code, the warning messages, the error messages (in other words, don't chop anything out) by copying the log as text (not a picture) and pasting it into the window that appears when you click on the </> icon.
... by copying the log as text (not a picture) and pasting it into the window that appears when you click on the </> icon.
You forgot that step.
1          OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
 72         
 73         proc import datafile  =  '/home/u49170471/sasuser.v94/fake_partB.xlsx'
 74          out  =  fake_partB
 75          dbms  =  xlsx
 76          replace
 77          ;
 78         run;
 
 NOTE: The import data set has 16 observations and 4 variables.
 NOTE: WORK.FAKE_PARTB data set was successfully created.
 NOTE: PROCEDURE IMPORT used (Total process time):
       real time           0.00 seconds
       user cpu time       0.01 seconds
       system cpu time     0.00 seconds
       memory              2742.68k
       OS Memory           32168.00k
       Timestamp           06/17/2020 03:01:19 PM
       Step Count                        32  Switch Count  4
       Page Faults                       0
       Page Reclaims                     666
       Page Swaps                        0
       Voluntary Context Switches        30
       Involuntary Context Switches      0
       Block Input Operations            0
       Block Output Operations           264
       
 
 79         
 80         
 81         
 82         proc freq data=fake_partB;
 83             table bene*srvc_dt*prvdr_npi/noprint list out=_counts_;
 84         run;
 
 NOTE: There were 16 observations read from the data set WORK.FAKE_PARTB.
 NOTE: The data set WORK._COUNTS_ has 13 observations and 5 variables.
 NOTE: PROCEDURE FREQ used (Total process time):
       real time           0.00 seconds
       user cpu time       0.00 seconds
       system cpu time     0.00 seconds
       memory              1965.90k
       OS Memory           31676.00k
       Timestamp           06/17/2020 03:01:19 PM
       Step Count                        33  Switch Count  10
       Page Faults                       0
       Page Reclaims                     418
       Page Swaps                        0
       Voluntary Context Switches        45
       Involuntary Context Switches      0
       Block Input Operations            0
       Block Output Operations           1320
       
 
 85         
 86         proc sql;
 87             create table want as select fake_partB.*,c.count
 88                 from fake_partB left join _counts_ c
 89                 on have.bene=c.bene and have.srvc_dt=c.srvc_dt and have.prvdr_npi=c.prvdr_npi;
 ERROR: Unresolved reference to table/correlation name have.
 ERROR: Unresolved reference to table/correlation name have.
 ERROR: Unresolved reference to table/correlation name have.
 ERROR: Expression using equals (=) has components that are of different data types.
 ERROR: Expression using equals (=) has components that are of different data types.
 ERROR: Expression using equals (=) has components that are of different data types.
 NOTE: PROC SQL set option NOEXEC and will continue to check the syntax of statements.
 90         quit;
 NOTE: The SAS System stopped processing this step because of errors.
 NOTE: PROCEDURE SQL used (Total process time):
       real time           0.00 seconds
       user cpu time       0.00 seconds
       system cpu time     0.00 seconds
       memory              5862.12k
       OS Memory           35760.00k
       Timestamp           06/17/2020 03:01:19 PM
       Step Count                        34  Switch Count  0
       Page Faults                       0
       Page Reclaims                     89
       Page Swaps                        0
       Voluntary Context Switches        0
       Involuntary Context Switches      0
       Block Input Operations            0
       Block Output Operations           0
       
 91         
 92         
 93         
 94         
 95         OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
 107        
 
User: u49170471My apologies. Super new. Still figuring things out.
It looks like you're referring to the alias have in your join condition, though your join uses the aliases fake_partB and c.
proc sql; 87 create table want as select fake_partB.*,c.count 88 from fake_partB left join _counts_ c 89 on have.bene=c.bene and have.srvc_dt=c.srvc_dt and have.prvdr_npi=c.prvdr_npi; ERROR: Unresolved reference to table/correlation name have. ERROR: Unresolved reference to table/correlation name have. ERROR: Unresolved reference to table/correlation name have. ERROR: Expression using equals (=) has components that are of different data types. ERROR: Expression using equals (=) has components that are of different data types. ERROR: Expression using equals (=) has components that are of different data types. NOTE: PROC SQL set option NOEXEC and will continue to check the syntax of statements. 90 quit;
If you use have as the alias for fake_partb, that should work:
from fake_partB have left join _counts_ c
Yes, @mklangley hits the nail on the head. You have to change all occurrences of HAVE to FAKE_PARTB
proc sort data = have; 
by id date provider_number;
data want;
set have;
by id date provider_number;
if first.id and last.id then do;
 end;
else do;
 output;
end;
run;
My bad the solution will be like
proc sort data = have; 
by id date provider_number;
data want;
set have;
by id date provider_number;
if first.provider_number and last.provider_number then do;
 end;
else do;
 output;
end;
run;It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.
