BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
David_Billa
Rhodochrosite | Level 12

I would like to compare two datasets and find the non matching records. In below example, both the datasets has same Observation and I believe only order of the records should be differ in both the datasets. When I run the following code, it says 4096 records are differ in both the datasets.

 

I would like to know whether proc compare will consider the order of the record also to be identical and I wanted to the best method to compare two datasets irrespective of order of the record. If there is a difference, I want to see the records which differs.

 

proc compare base=test compare=test1;
run;

%put &sysinfo.;

Log:

 

26         proc compare base=test compare=test1;
27         run;

NOTE: There were 5985917 observations read from the data set WORK.TEST.
NOTE: There were 5985917 observations read from the data set WORK.TEST1.
NOTE: At least one W.D format was too small for the number to be printed. The decimal may be shifted by the "BEST" format.
NOTE: PROCEDURE COMPARE used (Total process time):
      real time           28.39 seconds
      cpu time            27.54 seconds
      

28         
29         %put &sysinfo.;
SYMBOLGEN:  Macro variable SYSINFO resolves to 4096
1 ACCEPTED SOLUTION

Accepted Solutions
HIRO2020
Calcite | Level 5

How about the following

 

data a;
input ID var1 var2;
datalines;
001 1 2
002 2 2
003 1 2
004 1 2
006 1 3
;
run;

data b;
input ID var1 var2;
datalines;
001 1 2
002 1 2
003 1 2
004 1 3
007 2 2
;
run;

proc compare base=a comp=b noprint OUTNOEQUAL OUT=COMP ;
by ID;
run;

View solution in original post

5 REPLIES 5
PeterClemmensen
Tourmaline | Level 20

Yes, PROC COMPARE considers observation order. Sort your two data sets by the same variable and compare again. In my book, PROC COMPARE is still the best tool for the job.

 

See this small example

 

data a;
input var1 var2;
datalines;
1 2
3 4
;

data b;
input var1 var2;
datalines;
3 4
1 2
;

proc compare base=a comp=b;
run;
David_Billa
Rhodochrosite | Level 12
OK, how and where can I difference records in the output?
HIRO2020
Calcite | Level 5

How about the following

 

data a;
input ID var1 var2;
datalines;
001 1 2
002 2 2
003 1 2
004 1 2
006 1 3
;
run;

data b;
input ID var1 var2;
datalines;
001 1 2
002 1 2
003 1 2
004 1 3
007 2 2
;
run;

proc compare base=a comp=b noprint OUTNOEQUAL OUT=COMP ;
by ID;
run;
David_Billa
Rhodochrosite | Level 12

I'm unable to understand the data in the Output dataset COMP.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 2160 views
  • 2 likes
  • 4 in conversation