Hi Experts,
I am trying to compare two excel files to find the different values. I have imported the files (File name Existing and Revised). I did a proc compare step but it shows 50 records and says the max limit has reached. I want the output to highlight only those rows in a table which has differences. The DM9_consumer_ID is same in both the files. Here is a sample dataset of one of the file. The headers are also same. I also tried to do a merge to find a difference but it did not produced any output, shows only the header. Could you please check and help the best way to compare the difference in the excel files?
Proc sort data=existing;
by DM9_Consumer_ID;
run;
Proc sort data=revised;
by DM9_Consumer_ID;
run;
Data differences;
merge existing (in=a) revised (in=b);
by _all_;
if a and b and (existing ne revised);
run;
Proc print data=differences;
run;
/* Comparing both the files */
Proc compare base= existing
compare=revised;
run;
Reading excel files is guesswork, because in excel the concept of typed columns does not exist in a reliable way.
If something does not work as expecting sharing the log is always a good idea, so that others know what actually happened.
If both datasets have all variables in common, a merge without renaming the variable in one of the datasets won't allow comparison of values.
The if statement
if a and b and (existing ne revised);
means: keep all observations being in both dataset where the variable existing and revised are not equal. Most likely not what your expected to happen.
NOTE: Variable existing is uninitialized. NOTE: Variable revised is uninitialized.
This tells you that these variables do not exist in the datasets and can therefore not be used to indicate anything.
Comparisons must be explicitly coded in a DATA step; arrays can help in this, but you must rename the common variables from one dataset to avoid the collision.
All this is why you're better off with PROC COMPARE.
Maxim 2: Read the Log.
What does it say about this:
if a and b and (existing ne revised);
?
In PROC COMPARE, add
by dm9_consumer_id;
otherwise the procedure will compare observations sequentially as they come in, disregarding the key.
Hello @Sandeep77
You have mentioned "I did a proc compare step but it shows 50 records and says the max limit has reached."
This is the default and can be overridden with different otpions.
Have a look at the documentation here.
SAS Help Center: Syntax: PROC COMPARE PROC COMPARE Statement
In particular have a look at this example SAS Help Center: Comparing Values of Observations Using an Output Data Set (OUT=).
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.