DATA Step, Macro, Functions and more

Proc Compare

Accepted Solution Solved
Reply
Contributor
Posts: 69
Accepted Solution

Proc Compare

I am comparing two datasets, say A and B.

My SAS code is :

 

 

proc compare base=N_402612(Keep=receipt ) compare=N_389132(Keep=receipt ) out=check ;
run;

LOG:

 

 

23         GOPTIONS ACCESSIBLE;
24         	
24       !  proc compare base=N_402612(Keep=receipt ) compare=N_389132(Keep=receipt ) out=check  ;
25         
26         	run;

NOTE: There were 402612 observations read from the data set WORK.N_402612.
NOTE: There were 389132 observations read from the data set WORK.N_389132.
NOTE: The data set WORK.CHECK has 389132 observations and 3 variables.

Results:

 

 

                                                                                                                                    
                                                                                                                                    
Observation Summary                                                                                                                 
                                                                                                                                    
Observation      Base  Compare                                                                                                      
                                                                                                                                    
First Obs           1        1                                                                                                      
First Unequal       9        9                                                                                                      
Last  Unequal  389132   389132                                                                                                      
Last  Match    389132   389132                                                                                                      
Last  Obs      402612        .                                                                                                      
                                                                                                                                    
Number of Observations in Common: 389132.                                                                                           
Number of Observations in WORK.N_402612 but not in WORK.N_389132: 13480.                                                            
Total Number of Observations Read from WORK.N_402612: 402612.                                                                       
Total Number of Observations Read from WORK.N_389132: 389132.                                                                       
                                                                                                                                    
Number of Observations with Some Compared Variables Unequal: 389124.                                                                
Number of Observations with All Compared Variables Equal: 8.                                                                        
                                                                                                                                    
                                                                                                                                    
Values Comparison Summary                                                                                                           
                                                                                                                                    
Number of Variables Compared with All Observations Equal: 0.                                                                        
Number of Variables Compared with Some Observations Unequal: 1.                                                                     
Total Number of Values which Compare Unequal: 389124.                                                                               
                                                                                                                                    
                                                                                                                                    
All Variables Compared have Unequal Values                                                                                          
                                                                                                                                    
Variable  Type  Len   Label          Ndif   MaxDif                                                                                  
                                                                                                                                    
Receipt   CHAR   13   Recipt Number389124                                                                                           
                                                                                                                                    
 

In the output dataset Check, I want to keep only n=13480 records in my output dataset check. 

Number of Observations in WORK.N_402612 but not in WORK.N_389132: 13480.                                                            							

 


Accepted Solutions
Solution
‎08-25-2017 06:34 AM
Contributor
Posts: 69

Re: Proc Compare

Well, very good responses  recieved from Reeza, ChrisNZ, Ron_Fehd_macro_maven, and  Quentin.

 

Each of them shed very intereting insights and comments.

 

It was found that my dataset  A has n=xxxx,xxxx number of UNIQUE records not observed in dataset B. Similarly, dataset B has n=xxx,xxx number of Unique records not observed in dataset A. 

 

So, I learned that Proc Compare should not be employed, but data step or SQL is a best solution. Case closed.

 

Thanks to all who have responded.

 

Girish Patel

 

View solution in original post


All Replies
Super User
Posts: 19,878

Re: Proc Compare

An SQL query to do that is likely easier than trying to get the numbers out of PROC COMPARE. Otherwise look at the OUTDIFF data set.

Super User
Posts: 19,878

Re: Proc Compare

Also, note that value is the difference, it did a line by line comparison when I suspect you're trying to check for a specific variable. You need to add that option to proc compare, I think it's a VAR statement or ID statement.
Contributor
Posts: 69

Re: Proc Compare

 

Thanks for your insight and suggested solution. Proc SQL and Data Merge doesn't give me correct/exact counts. I need to 

create a SAS dataset of n=13,480 records as outlined by Proc Compare results.

 

Number of Observations in WORK.N_402612 but not in WORK.N_389132: 13480.

  I hope somebody will shed some lights and offers desired solution.

Super User
Posts: 19,878

Re: Proc Compare

Your PROC COMPARE is wrong, so expecting that number in the first place is wrong.
PROC Star
Posts: 1,760

Re: Proc Compare

You want to only keep 10,000 records? Which ones?

How does proc compare know?

As @Reeza said, you probably need a key (an ID statement in proc compare, or another join like a proc sql) to give some sense to your comparison.

PROC Star
Posts: 1,760

Re: Proc Compare

Let's try another way.

What don't you just look at the first 13,480 records? or the the last 13,480 records?

Why not? How do *you* know these are not the ones you are after?

Regular Contributor
Posts: 227

Re: Proc Compare

these papers describes macros that may have hints to what you want to do.

 

http://www.sascommunity.org/wiki/Macro_ComparWS_Compare_with_summary

 

http://www.sascommunity.org/wiki/Macro_Extract

 

Ron Fehd  comparatively, a maven

Contributor
Posts: 69

Re: Proc Compare

I did try it. No luck.
Is there any way to isolate only n=13,480 records?
Super User
Posts: 19,878

Re: Proc Compare

Of course, but you need to define how, what are the rules that define what to be extracted. You haven't commented on the fact that the PROC COMPARE is wrong and that you don't know the difference. You don't know how many are in one and not the other, because you can have duplicates. If each data set has no duplicate records then you may be ok. 

 

If you have no duplicates in either data set this approach may work:

proc sort data=have1; by ID_VAR;
proc sort data=have2; by ID_VAR;

data both /*includes records from both datasets*/
        in1 /*records only in have1*/
       in2 /*records only in have2*/;
merge have1 (in=a)
           have2 (in=b);

by ID_VAR;

if a and b then output both;
if a and not b then output in1;
if b and not a then output in2;

run;



PROC Star
Posts: 1,325

Re: Proc Compare

[ Edited ]

Interesting.  There is a LISTBASEOBS options which will list the records in the base dataset only, but I don't see an OUTBASEOBS option that would output those records.  So I'm going to go out on a limb and say no, PROC COMPARE can't do it.

 

But do note the point made above by many folks.  Right now you are doing a one-to-one sequential comparison (first record of each dataset is compared, then second record of each, etc.)  When you compare N_402612 to N_389132, the 13,480 records reported as being in N_402612 are simply the last 13,480 records from N_402612 .

 

You are doing the equivalent of:

 

proc compare base=sashelp.class compare=sashelp.class(obs=10) ;
run;

 

Which will report a difference of 9 records.

 

If you really want the last 13,480 records from N_402612 you can get them like:

 

data want;
  set N_402612(firstobs=%eval(402612-13479));
run;

 

PROC Star
Posts: 1,760

Re: Proc Compare

Or maybe you just want:

proc sql;
select * from TAB1 
except 
select * from TAB2;
quit;

from the paucity of your explanations, it is hard to guess what you want.

 

 

Solution
‎08-25-2017 06:34 AM
Contributor
Posts: 69

Re: Proc Compare

Well, very good responses  recieved from Reeza, ChrisNZ, Ron_Fehd_macro_maven, and  Quentin.

 

Each of them shed very intereting insights and comments.

 

It was found that my dataset  A has n=xxxx,xxxx number of UNIQUE records not observed in dataset B. Similarly, dataset B has n=xxx,xxx number of Unique records not observed in dataset A. 

 

So, I learned that Proc Compare should not be employed, but data step or SQL is a best solution. Case closed.

 

Thanks to all who have responded.

 

Girish Patel

 

Super User
Posts: 19,878

Re: Proc Compare


GPatel wrote:

 

 

So, I learned that Proc Compare should not be employed, but data step or SQL is a best solution. Case closed.

 

 


That's not correct. You can use PROC COMPARE, but you were not using it correctly. You misinterpreted the results because you didn't have a BY or ID or VAR/WITH statements.

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 13 replies
  • 274 views
  • 3 likes
  • 5 in conversation