BookmarkSubscribeRSS Feed
Sandeep77
Lapis Lazuli | Level 10

Hi Experts,

I am trying to compare two excel files to find the different values. I have imported the files (File name Existing and Revised). I did a proc compare step but it shows 50 records and says the max limit has reached. I want the output to highlight only those rows in a table which has differences. The DM9_consumer_ID is same in both the files. Here is a sample dataset of one of the file. The headers are also same. I also tried to do a merge to find a difference but it did not produced any output, shows only the header. Could you please check and help the best way to compare the difference in the excel files?

Data Compare_files;
infile cards expandtabs;
input DM9_Consumer_ID Calculated_Balance_Non_Covid Calculated_Provision_Covid Calculated_Provision_CBIL Calculated_Provision_Final collectvalcovid Impairment No_of_COVID_CBIL_RLS_Products;
datalines ;
6896 0 0 0 0 0 0 0
6895 6248.17 0 0 6248.17 0 6248.17 0
6899 0 0 0 0 0 0 0
6894 0 0 0 0 0 0 0
6893 0 0 0 0 0 0 0
14382 -10741.58 14094.8911 14094.8911 14094.8911 26645.8081 14094.8911 1
14598 -11981.12 17030.9439 17030.9439 17030.9439 32196.2944 17030.9439 1
14577 -47199.67 21795.8424 21795.8424 21795.8424 41204.1381 21795.8424 1
;
run;
Proc sort data=existing;
by DM9_Consumer_ID;
run;
Proc sort data=revised;
by DM9_Consumer_ID;
run;
Data differences;
merge existing (in=a) revised (in=b);
by _all_;
if a and b and (existing ne revised);
run;
Proc print data=differences;
run;
/* Comparing both the files */
Proc compare base= existing
compare=revised;
run;
6 REPLIES 6
andreas_lds
Jade | Level 19

Reading excel files is guesswork, because in excel the concept of typed columns does not exist in a reliable way.

If something does not work as expecting sharing the log is always a good idea, so that others know what actually happened.

If both datasets have all variables in common, a merge without renaming the variable in one of the datasets won't allow comparison of values.

The if statement

if a and b and (existing ne revised);

means: keep all observations being in both dataset where the variable existing and revised are not equal. Most likely not what your expected to happen.

Sandeep77
Lapis Lazuli | Level 10
Log:
1 ;*';*";*/;quit;run;
2 OPTIONS PAGENO=MIN;
3 %LET _CLIENTTASKLABEL='Program';
4 %LET _CLIENTPROCESSFLOWNAME='Process Flow';
5 %LET _CLIENTPROJECTPATH='C:\Users\5604829\Desktop\Documents\Comparing files.egp';
6 %LET _CLIENTPROJECTPATHHOST='MMD5CG24212BN';
7 %LET _CLIENTPROJECTNAME='Comparing files.egp';
8 %LET _SASPROGRAMFILE='';
9 %LET _SASPROGRAMFILEHOST='';
10
11 ODS _ALL_ CLOSE;
12 OPTIONS DEV=SVG;
13 GOPTIONS XPIXELS=0 YPIXELS=0;
14 %macro HTML5AccessibleGraphSupported;
15 %if %_SAS_VERCOMP_FV(9,4,4, 0,0,0) >= 0 %then ACCESSIBLE_GRAPH;
16 %mend;
17 FILENAME EGHTML TEMP;
18 ODS HTML5(ID=EGHTML) FILE=EGHTML
19 OPTIONS(BITMAP_MODE='INLINE')
20 %HTML5AccessibleGraphSupported
21 ENCODING='utf-8'
22 STYLE=HTMLBlue
23 NOGTITLE
24 NOGFOOTNOTE
25 GPATH=&sasworklocation
26 ;
NOTE: Writing HTML5(EGHTML) Body file: EGHTML
27
28 Proc sort data=existing;
29 by DM9_Consumer_ID;
30 run;

NOTE: There were 104291 observations read from the data set WORK.EXISTING.
NOTE: The data set WORK.EXISTING has 104291 observations and 81 variables.
NOTE: Compressing data set WORK.EXISTING decreased size by 46.38 percent.
Compressed is 571 pages; un-compressed would require 1065 pages.
NOTE: PROCEDURE SORT used (Total process time):
real time 1.47 seconds
user cpu time 0.68 seconds
system cpu time 0.26 seconds
memory 73047.96k
OS Memory 101816.00k
Timestamp 06/27/2024 12:52:44 PM
Step Count 11 Switch Count 5
Page Faults 1
Page Reclaims 26109
Page Swaps 0
Voluntary Context Switches 22
Involuntary Context Switches 186
Block Input Operations 0
Block Output Operations 0


31
32 Proc sort data=revised;
33 by DM9_Consumer_ID;
34 run;

2 The SAS System 10:52 Thursday, June 27, 2024

NOTE: There were 104291 observations read from the data set WORK.REVISED.
NOTE: The data set WORK.REVISED has 104291 observations and 81 variables.
NOTE: Compressing data set WORK.REVISED decreased size by 46.85 percent.
Compressed is 566 pages; un-compressed would require 1065 pages.
NOTE: PROCEDURE SORT used (Total process time):
real time 1.38 seconds
user cpu time 0.68 seconds
system cpu time 0.25 seconds
memory 73043.28k
OS Memory 101816.00k
Timestamp 06/27/2024 12:52:45 PM
Step Count 12 Switch Count 5
Page Faults 1
Page Reclaims 26690
Page Swaps 0
Voluntary Context Switches 25
Involuntary Context Switches 160
Block Input Operations 0
Block Output Operations 0


35
36 Data differences;
37 merge existing (in=a) revised (in=b);
38 by _all_;
39 if a and b and (existing ne revised);
40 run;

NOTE: Variable existing is uninitialized.
NOTE: Variable revised is uninitialized.
NOTE: There were 104291 observations read from the data set WORK.EXISTING.
NOTE: There were 104291 observations read from the data set WORK.REVISED.
NOTE: The data set WORK.DIFFERENCES has 0 observations and 83 variables.
NOTE: DATA statement used (Total process time):
real time 1.35 seconds
user cpu time 0.80 seconds
system cpu time 0.07 seconds
memory 2049.46k
OS Memory 31136.00k
Timestamp 06/27/2024 12:52:47 PM
Step Count 13 Switch Count 5
Page Faults 0
Page Reclaims 218
Page Swaps 0
Voluntary Context Switches 34
Involuntary Context Switches 106
Block Input Operations 0
Block Output Operations 0


41
42 Proc print data=differences;
43 run;

NOTE: No observations in data set WORK.DIFFERENCES.
NOTE: PROCEDURE PRINT used (Total process time):
real time 0.03 seconds
user cpu time 0.00 seconds
3 The SAS System 10:52 Thursday, June 27, 2024

system cpu time 0.00 seconds
memory 905.28k
OS Memory 31136.00k
Timestamp 06/27/2024 12:52:47 PM
Step Count 14 Switch Count 0
Page Faults 0
Page Reclaims 56
Page Swaps 0
Voluntary Context Switches 13
Involuntary Context Switches 2
Block Input Operations 0
Block Output Operations 0


44
45 %LET _CLIENTTASKLABEL=;
46 %LET _CLIENTPROCESSFLOWNAME=;
47 %LET _CLIENTPROJECTPATH=;
48 %LET _CLIENTPROJECTPATHHOST=;
49 %LET _CLIENTPROJECTNAME=;
50 %LET _SASPROGRAMFILE=;
51 %LET _SASPROGRAMFILEHOST=;
52
53 ;*';*";*/;quit;run;
54 ODS _ALL_ CLOSE;
55
56
57 QUIT; RUN;
Kurt_Bremser
Super User
NOTE: Variable existing is uninitialized.
NOTE: Variable revised is uninitialized.

This tells you that these variables do not exist in the datasets and can therefore not be used to indicate anything.

Comparisons must be explicitly coded in a DATA step; arrays can help in this, but you must rename the common variables from one dataset to avoid the collision.

All this is why you're better off with PROC COMPARE.

Sandeep77
Lapis Lazuli | Level 10
Both datasets have all variables in common.
Kurt_Bremser
Super User

Maxim 2: Read the Log.

What does it say about this:

if a and b and (existing ne revised);

?

 

In PROC COMPARE, add

by dm9_consumer_id;

otherwise the procedure will compare observations sequentially as they come in, disregarding the key.

Sajid01
Meteorite | Level 14

Hello @Sandeep77 
You have mentioned "I did a proc compare step but it shows 50 records and says the max limit has reached."
This is the default and can be overridden with different otpions.
Have a look at the documentation here.
SAS Help Center: Syntax: PROC COMPARE PROC COMPARE Statement
In particular have a look at this example SAS Help Center: Comparing Values of Observations Using an Output Data Set (OUT=).

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 6 replies
  • 372 views
  • 2 likes
  • 4 in conversation