- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I am comparing two datasets, say A and B.
My SAS code is :
proc compare base=N_402612(Keep=receipt ) compare=N_389132(Keep=receipt ) out=check ; run;
LOG:
23 GOPTIONS ACCESSIBLE; 24 24 ! proc compare base=N_402612(Keep=receipt ) compare=N_389132(Keep=receipt ) out=check ; 25 26 run; NOTE: There were 402612 observations read from the data set WORK.N_402612. NOTE: There were 389132 observations read from the data set WORK.N_389132. NOTE: The data set WORK.CHECK has 389132 observations and 3 variables.
Results:
Observation Summary Observation Base Compare First Obs 1 1 First Unequal 9 9 Last Unequal 389132 389132 Last Match 389132 389132 Last Obs 402612 . Number of Observations in Common: 389132. Number of Observations in WORK.N_402612 but not in WORK.N_389132: 13480. Total Number of Observations Read from WORK.N_402612: 402612. Total Number of Observations Read from WORK.N_389132: 389132. Number of Observations with Some Compared Variables Unequal: 389124. Number of Observations with All Compared Variables Equal: 8. Values Comparison Summary Number of Variables Compared with All Observations Equal: 0. Number of Variables Compared with Some Observations Unequal: 1. Total Number of Values which Compare Unequal: 389124. All Variables Compared have Unequal Values Variable Type Len Label Ndif MaxDif Receipt CHAR 13 Recipt Number389124
In the output dataset Check, I want to keep only n=13480 records in my output dataset check.
Number of Observations in WORK.N_402612 but not in WORK.N_389132: 13480.
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Well, very good responses recieved from Reeza, ChrisNZ, Ron_Fehd_macro_maven, and Quentin.
Each of them shed very intereting insights and comments.
It was found that my dataset A has n=xxxx,xxxx number of UNIQUE records not observed in dataset B. Similarly, dataset B has n=xxx,xxx number of Unique records not observed in dataset A.
So, I learned that Proc Compare should not be employed, but data step or SQL is a best solution. Case closed.
Thanks to all who have responded.
Girish Patel
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
An SQL query to do that is likely easier than trying to get the numbers out of PROC COMPARE. Otherwise look at the OUTDIFF data set.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thanks for your insight and suggested solution. Proc SQL and Data Merge doesn't give me correct/exact counts. I need to
create a SAS dataset of n=13,480 records as outlined by Proc Compare results.
Number of Observations in WORK.N_402612 but not in WORK.N_389132: 13480.
I hope somebody will shed some lights and offers desired solution.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
You want to only keep 10,000 records? Which ones?
How does proc compare know?
As @Reeza said, you probably need a key (an ID statement in proc compare, or another join like a proc sql) to give some sense to your comparison.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Let's try another way.
What don't you just look at the first 13,480 records? or the the last 13,480 records?
Why not? How do *you* know these are not the ones you are after?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
these papers describes macros that may have hints to what you want to do.
http://www.sascommunity.org/wiki/Macro_ComparWS_Compare_with_summary
http://www.sascommunity.org/wiki/Macro_Extract
Ron Fehd comparatively, a maven
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Is there any way to isolate only n=13,480 records?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Of course, but you need to define how, what are the rules that define what to be extracted. You haven't commented on the fact that the PROC COMPARE is wrong and that you don't know the difference. You don't know how many are in one and not the other, because you can have duplicates. If each data set has no duplicate records then you may be ok.
If you have no duplicates in either data set this approach may work:
proc sort data=have1; by ID_VAR;
proc sort data=have2; by ID_VAR;
data both /*includes records from both datasets*/
in1 /*records only in have1*/
in2 /*records only in have2*/;
merge have1 (in=a)
have2 (in=b);
by ID_VAR;
if a and b then output both;
if a and not b then output in1;
if b and not a then output in2;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Interesting. There is a LISTBASEOBS options which will list the records in the base dataset only, but I don't see an OUTBASEOBS option that would output those records. So I'm going to go out on a limb and say no, PROC COMPARE can't do it.
But do note the point made above by many folks. Right now you are doing a one-to-one sequential comparison (first record of each dataset is compared, then second record of each, etc.) When you compare N_402612 to N_389132, the 13,480 records reported as being in N_402612 are simply the last 13,480 records from N_402612 .
You are doing the equivalent of:
proc compare base=sashelp.class compare=sashelp.class(obs=10) ;
run;
Which will report a difference of 9 records.
If you really want the last 13,480 records from N_402612 you can get them like:
data want;
set N_402612(firstobs=%eval(402612-13479));
run;
Next up: Troy Martin Hughes presents Calling Open-Source Python Functions within SAS PROC FCMP: A Google Maps API Geocoding Adventure on Wednesday April 23.
Register now at https://www.basug.org/events.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Or maybe you just want:
proc sql;
select * from TAB1
except
select * from TAB2;
quit;
from the paucity of your explanations, it is hard to guess what you want.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Well, very good responses recieved from Reeza, ChrisNZ, Ron_Fehd_macro_maven, and Quentin.
Each of them shed very intereting insights and comments.
It was found that my dataset A has n=xxxx,xxxx number of UNIQUE records not observed in dataset B. Similarly, dataset B has n=xxx,xxx number of Unique records not observed in dataset A.
So, I learned that Proc Compare should not be employed, but data step or SQL is a best solution. Case closed.
Thanks to all who have responded.
Girish Patel
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@GPatel wrote:
So, I learned that Proc Compare should not be employed, but data step or SQL is a best solution. Case closed.
That's not correct. You can use PROC COMPARE, but you were not using it correctly. You misinterpreted the results because you didn't have a BY or ID or VAR/WITH statements.