Hi,
I am trying to write a program that is looking at a table with an id, date, provider number, and procedure number.
I want to write a program that identifies when there are rows that have the same (or duplicate) id AND date AND provider number, then output them into a new table.
Any idea how that may be possible?
It looks like you're referring to the alias have in your join condition, though your join uses the aliases fake_partB and c.
proc sql; 87 create table want as select fake_partB.*,c.count 88 from fake_partB left join _counts_ c 89 on have.bene=c.bene and have.srvc_dt=c.srvc_dt and have.prvdr_npi=c.prvdr_npi; ERROR: Unresolved reference to table/correlation name have. ERROR: Unresolved reference to table/correlation name have. ERROR: Unresolved reference to table/correlation name have. ERROR: Expression using equals (=) has components that are of different data types. ERROR: Expression using equals (=) has components that are of different data types. ERROR: Expression using equals (=) has components that are of different data types. NOTE: PROC SQL set option NOEXEC and will continue to check the syntax of statements. 90 quit;
If you use have as the alias for fake_partb, that should work:
from fake_partB have left join _counts_ c
@justicetobias wrote:
Hi,
I am trying to write a program that is looking at a table with an id, date, provider number, and procedure number.
I want to write a program that identifies when there are rows that have the same id, date, and provider number and output them into a new table.
It would definitely help if you provided us a portion of your data (as working SAS data step code).
What should we do if it is not a duplicate? What about triplicates?
Why do you need them in a separate data set? That's usually not necessary and causes extra work.
Here is code (untested) that identifies duplicates, triplicates, etc.
proc freq data=have;
table id*date*provider_number/noprint list out=_counts_;
run;
proc sql;
create table want as select have.*,c.count
from have left join _counts_ c
on have.id=c.id and have.date=c.date and have.provider_number=c.provider_number;
quit;
Than you so much for your help.
A portion of the data would be:
ID Procedure Provider Date
845 | 1002 | 4444 | 12-Dec-19 |
261 | 1005 | 4441 | 13-Dec-19 |
326 | 1006 | 4445 | 12-Dec-19 |
232 | 1004 | 4444 | 12-Dec-19 |
252 | 1009 | 4449 | 13-Dec-19 |
948 | 1003 | 4446 | 13-Dec-19 |
675 | 1005 | 4447 | 13-Dec-19 |
154 | 1001 | 4441 | 12-Dec-19 |
852 | 1008 | 4447 | 12-Dec-19 |
954 | 1006 | 4440 | 12-Dec-19 |
232 | 1005 | 4444 | 12-Dec-19 |
232 | 1005 | 4445 | 12-Dec-19 |
154 | 1001 | 4441 | 13-Dec-19 |
948 | 1002 | 4446 | 13-Dec-19 |
675 | 1006 | 4447 | 13-Dec-19 |
261 | 1006 | 4441 | 12-Dec-19 |
I don't need a new output table, but the only rows I want are ones that have and type of repeat (duplicate, triplicate, etc.) with id, provider, and date.
The code you provider was super helpful and identified the correct rows! But instead of an output that told me the rows, I would like to only have those rows left in the table.
So when you go to use this table next, for example in PROC MEANS or whatever, you can use something like this
proc means data=want(where=(count>1));
Thank you! do you know why these error messages would appear?
Show us the log for these steps, the code, the warning messages, the error messages (in other words, don't chop anything out) by copying the log as text (not a picture) and pasting it into the window that appears when you click on the </> icon.
... by copying the log as text (not a picture) and pasting it into the window that appears when you click on the </> icon.
You forgot that step.
1 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK; 72 73 proc import datafile = '/home/u49170471/sasuser.v94/fake_partB.xlsx' 74 out = fake_partB 75 dbms = xlsx 76 replace 77 ; 78 run; NOTE: The import data set has 16 observations and 4 variables. NOTE: WORK.FAKE_PARTB data set was successfully created. NOTE: PROCEDURE IMPORT used (Total process time): real time 0.00 seconds user cpu time 0.01 seconds system cpu time 0.00 seconds memory 2742.68k OS Memory 32168.00k Timestamp 06/17/2020 03:01:19 PM Step Count 32 Switch Count 4 Page Faults 0 Page Reclaims 666 Page Swaps 0 Voluntary Context Switches 30 Involuntary Context Switches 0 Block Input Operations 0 Block Output Operations 264 79 80 81 82 proc freq data=fake_partB; 83 table bene*srvc_dt*prvdr_npi/noprint list out=_counts_; 84 run; NOTE: There were 16 observations read from the data set WORK.FAKE_PARTB. NOTE: The data set WORK._COUNTS_ has 13 observations and 5 variables. NOTE: PROCEDURE FREQ used (Total process time): real time 0.00 seconds user cpu time 0.00 seconds system cpu time 0.00 seconds memory 1965.90k OS Memory 31676.00k Timestamp 06/17/2020 03:01:19 PM Step Count 33 Switch Count 10 Page Faults 0 Page Reclaims 418 Page Swaps 0 Voluntary Context Switches 45 Involuntary Context Switches 0 Block Input Operations 0 Block Output Operations 1320 85 86 proc sql; 87 create table want as select fake_partB.*,c.count 88 from fake_partB left join _counts_ c 89 on have.bene=c.bene and have.srvc_dt=c.srvc_dt and have.prvdr_npi=c.prvdr_npi; ERROR: Unresolved reference to table/correlation name have. ERROR: Unresolved reference to table/correlation name have. ERROR: Unresolved reference to table/correlation name have. ERROR: Expression using equals (=) has components that are of different data types. ERROR: Expression using equals (=) has components that are of different data types. ERROR: Expression using equals (=) has components that are of different data types. NOTE: PROC SQL set option NOEXEC and will continue to check the syntax of statements. 90 quit; NOTE: The SAS System stopped processing this step because of errors. NOTE: PROCEDURE SQL used (Total process time): real time 0.00 seconds user cpu time 0.00 seconds system cpu time 0.00 seconds memory 5862.12k OS Memory 35760.00k Timestamp 06/17/2020 03:01:19 PM Step Count 34 Switch Count 0 Page Faults 0 Page Reclaims 89 Page Swaps 0 Voluntary Context Switches 0 Involuntary Context Switches 0 Block Input Operations 0 Block Output Operations 0 91 92 93 94 95 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK; 107 User: u49170471
My apologies. Super new. Still figuring things out.
It looks like you're referring to the alias have in your join condition, though your join uses the aliases fake_partB and c.
proc sql; 87 create table want as select fake_partB.*,c.count 88 from fake_partB left join _counts_ c 89 on have.bene=c.bene and have.srvc_dt=c.srvc_dt and have.prvdr_npi=c.prvdr_npi; ERROR: Unresolved reference to table/correlation name have. ERROR: Unresolved reference to table/correlation name have. ERROR: Unresolved reference to table/correlation name have. ERROR: Expression using equals (=) has components that are of different data types. ERROR: Expression using equals (=) has components that are of different data types. ERROR: Expression using equals (=) has components that are of different data types. NOTE: PROC SQL set option NOEXEC and will continue to check the syntax of statements. 90 quit;
If you use have as the alias for fake_partb, that should work:
from fake_partB have left join _counts_ c
Yes, @mklangley hits the nail on the head. You have to change all occurrences of HAVE to FAKE_PARTB
proc sort data = have;
by id date provider_number;
data want;
set have;
by id date provider_number;
if first.id and last.id then do;
end;
else do;
output;
end;
run;
My bad the solution will be like
proc sort data = have;
by id date provider_number;
data want;
set have;
by id date provider_number;
if first.provider_number and last.provider_number then do;
end;
else do;
output;
end;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.