BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
GPatel
Pyrite | Level 9

I am comparing two datasets, say A and B.

My SAS code is :

 

 

proc compare base=N_402612(Keep=receipt ) compare=N_389132(Keep=receipt ) out=check ;
run;

LOG:

 

 

23         GOPTIONS ACCESSIBLE;
24         	
24       !  proc compare base=N_402612(Keep=receipt ) compare=N_389132(Keep=receipt ) out=check  ;
25         
26         	run;

NOTE: There were 402612 observations read from the data set WORK.N_402612.
NOTE: There were 389132 observations read from the data set WORK.N_389132.
NOTE: The data set WORK.CHECK has 389132 observations and 3 variables.

Results:

 

 

                                                                                                                                    
                                                                                                                                    
Observation Summary                                                                                                                 
                                                                                                                                    
Observation      Base  Compare                                                                                                      
                                                                                                                                    
First Obs           1        1                                                                                                      
First Unequal       9        9                                                                                                      
Last  Unequal  389132   389132                                                                                                      
Last  Match    389132   389132                                                                                                      
Last  Obs      402612        .                                                                                                      
                                                                                                                                    
Number of Observations in Common: 389132.                                                                                           
Number of Observations in WORK.N_402612 but not in WORK.N_389132: 13480.                                                            
Total Number of Observations Read from WORK.N_402612: 402612.                                                                       
Total Number of Observations Read from WORK.N_389132: 389132.                                                                       
                                                                                                                                    
Number of Observations with Some Compared Variables Unequal: 389124.                                                                
Number of Observations with All Compared Variables Equal: 8.                                                                        
                                                                                                                                    
                                                                                                                                    
Values Comparison Summary                                                                                                           
                                                                                                                                    
Number of Variables Compared with All Observations Equal: 0.                                                                        
Number of Variables Compared with Some Observations Unequal: 1.                                                                     
Total Number of Values which Compare Unequal: 389124.                                                                               
                                                                                                                                    
                                                                                                                                    
All Variables Compared have Unequal Values                                                                                          
                                                                                                                                    
Variable  Type  Len   Label          Ndif   MaxDif                                                                                  
                                                                                                                                    
Receipt   CHAR   13   Recipt Number389124                                                                                           
                                                                                                                                    
 

In the output dataset Check, I want to keep only n=13480 records in my output dataset check. 

Number of Observations in WORK.N_402612 but not in WORK.N_389132: 13480.                                                            							

 

1 ACCEPTED SOLUTION

Accepted Solutions
GPatel
Pyrite | Level 9

Well, very good responses  recieved from Reeza, ChrisNZ, Ron_Fehd_macro_maven, and  Quentin.

 

Each of them shed very intereting insights and comments.

 

It was found that my dataset  A has n=xxxx,xxxx number of UNIQUE records not observed in dataset B. Similarly, dataset B has n=xxx,xxx number of Unique records not observed in dataset A. 

 

So, I learned that Proc Compare should not be employed, but data step or SQL is a best solution. Case closed.

 

Thanks to all who have responded.

 

Girish Patel

 

View solution in original post

13 REPLIES 13
Reeza
Super User

An SQL query to do that is likely easier than trying to get the numbers out of PROC COMPARE. Otherwise look at the OUTDIFF data set.

Reeza
Super User
Also, note that value is the difference, it did a line by line comparison when I suspect you're trying to check for a specific variable. You need to add that option to proc compare, I think it's a VAR statement or ID statement.
GPatel
Pyrite | Level 9

 

Thanks for your insight and suggested solution. Proc SQL and Data Merge doesn't give me correct/exact counts. I need to 

create a SAS dataset of n=13,480 records as outlined by Proc Compare results.

 

Number of Observations in WORK.N_402612 but not in WORK.N_389132: 13480.

  I hope somebody will shed some lights and offers desired solution.

Reeza
Super User
Your PROC COMPARE is wrong, so expecting that number in the first place is wrong.
ChrisNZ
Tourmaline | Level 20

You want to only keep 10,000 records? Which ones?

How does proc compare know?

As @Reeza said, you probably need a key (an ID statement in proc compare, or another join like a proc sql) to give some sense to your comparison.

ChrisNZ
Tourmaline | Level 20

Let's try another way.

What don't you just look at the first 13,480 records? or the the last 13,480 records?

Why not? How do *you* know these are not the ones you are after?

Ron_MacroMaven
Lapis Lazuli | Level 10

these papers describes macros that may have hints to what you want to do.

 

http://www.sascommunity.org/wiki/Macro_ComparWS_Compare_with_summary

 

http://www.sascommunity.org/wiki/Macro_Extract

 

Ron Fehd  comparatively, a maven

GPatel
Pyrite | Level 9
I did try it. No luck.
Is there any way to isolate only n=13,480 records?
Reeza
Super User

Of course, but you need to define how, what are the rules that define what to be extracted. You haven't commented on the fact that the PROC COMPARE is wrong and that you don't know the difference. You don't know how many are in one and not the other, because you can have duplicates. If each data set has no duplicate records then you may be ok. 

 

If you have no duplicates in either data set this approach may work:

proc sort data=have1; by ID_VAR;
proc sort data=have2; by ID_VAR;

data both /*includes records from both datasets*/
        in1 /*records only in have1*/
       in2 /*records only in have2*/;
merge have1 (in=a)
           have2 (in=b);

by ID_VAR;

if a and b then output both;
if a and not b then output in1;
if b and not a then output in2;

run;



Quentin
Super User

Interesting.  There is a LISTBASEOBS options which will list the records in the base dataset only, but I don't see an OUTBASEOBS option that would output those records.  So I'm going to go out on a limb and say no, PROC COMPARE can't do it.

 

But do note the point made above by many folks.  Right now you are doing a one-to-one sequential comparison (first record of each dataset is compared, then second record of each, etc.)  When you compare N_402612 to N_389132, the 13,480 records reported as being in N_402612 are simply the last 13,480 records from N_402612 .

 

You are doing the equivalent of:

 

proc compare base=sashelp.class compare=sashelp.class(obs=10) ;
run;

 

Which will report a difference of 9 records.

 

If you really want the last 13,480 records from N_402612 you can get them like:

 

data want;
  set N_402612(firstobs=%eval(402612-13479));
run;

 

BASUG is hosting free webinars Next up: Jane Eslinger presenting PROC REPORT and the ODS EXCEL destination on Mar 27 at noon ET. Register now at the Boston Area SAS Users Group event page: https://www.basug.org/events.
ChrisNZ
Tourmaline | Level 20

Or maybe you just want:

proc sql;
select * from TAB1 
except 
select * from TAB2;
quit;

from the paucity of your explanations, it is hard to guess what you want.

 

 

GPatel
Pyrite | Level 9

Well, very good responses  recieved from Reeza, ChrisNZ, Ron_Fehd_macro_maven, and  Quentin.

 

Each of them shed very intereting insights and comments.

 

It was found that my dataset  A has n=xxxx,xxxx number of UNIQUE records not observed in dataset B. Similarly, dataset B has n=xxx,xxx number of Unique records not observed in dataset A. 

 

So, I learned that Proc Compare should not be employed, but data step or SQL is a best solution. Case closed.

 

Thanks to all who have responded.

 

Girish Patel

 

Reeza
Super User

@GPatel wrote:

 

 

So, I learned that Proc Compare should not be employed, but data step or SQL is a best solution. Case closed.

 

 


That's not correct. You can use PROC COMPARE, but you were not using it correctly. You misinterpreted the results because you didn't have a BY or ID or VAR/WITH statements.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 13 replies
  • 2732 views
  • 3 likes
  • 5 in conversation