BookmarkSubscribeRSS Feed
arnab61
Calcite | Level 5

I am using a simple proc sql with a data set which contains millions of records.

The code is like

PROC SQL NOPRINT;

CREATE TABLE TABLE1

SELECT * FROM DM.TABLE

WHERE COMPANY="xxx"

;

Now I ran the code again with

PROC SQL NOPRINT;

CREATE TABLE TABLE2

SELECT * FROM DM.TABLE

WHERE COMPANY="xxx"

;

Now I am doing a proc compare without any by variable.

PROC COMPARE BASE=TABLE1 COMPARE=TABLE2;

Now proc compare compares the two data sets observation by observation or with a ID variable. ID variable is absent. What I think proc compare shold give exact matching. But though the rowcount is same it is giving some difference. When applied by variable on the primary key of the source data set,it is giving the exact match. Why it is happening can anybody help?

6 REPLIES 6
manojinpec
Obsidian | Level 7

This is because how the proc compare works. If you given ID variable it will compare on the base of that ID. else each row will be compared with each in second table and difference is given.For more information you can read about proc compare in detail.

arnab61
Calcite | Level 5


Even there is no ID variable then proc compare will compare each observation from the tables. So ideally as I am fetching the same value with the same filter condition, proc compare should return ideal match.

Doc_Duke
Rhodochrosite | Level 12

Another possibility is that the results of PROC SQL are in different order with different executions.  SQL does not guarantee the output order to be the same each time; add the ORDER BY clause to get them in some sort of ID order.  If DM.TABLE is on a database (rather than being a native SAS dataset), it may be doing the processing and you have no idea of the resulting data order.

Doc Muhlbaier

Duke

Astounding
PROC Star

Even if DM.TABLE were a SAS data set, the structure might have changed between runs.  Here is a simple change:

data DM.TABLE;

set DM.TABLE;

run;

This step could cause SQL to retrieve records in a different order.  How?  Well, it is possible that the original data set had an index where COMPANY was the primary indexed variable.  Running through the DATA step destroys the reliability of the index.  So the first SQL uses the index and the second one doesn't.  We need to know more about where the data comes from, and what changed in between the first and second SQL runs.

Astounding
PROC Star

This would happen if the source data (DM.TABLE) changed in between the two runs.  It could be as simple as having run a PROC SORT in between.

sassharp
Calcite | Level 5

Proc compare compares each column from table 1 to table 2 in a row. For example if there is a rundate in table1 and rundate in table2 need not be same even you started to run the code same time to get table1 and table2.

For comparing two tables I am sure this is going to be usefulcode.

/* first outer join table1 and table2*/

Proc SQL;
Create table table1andtable2outerjoin as
Select a.id as id_1,
a.name as name_1,

b.id as id_2,
b.name as name_2

From table1 as a
  Full Outer Join
  table2 as b
  on a.id = b.id and
     a.name=b.name;
  Quit;

/*to see rows only in table2 not in table1 you can add more columns for comparison*/

  Data tablenew;
  Source = 'two';
  Length Source $5;
  Set table1andtable2outerjoin ;
  if missing(id_1) and missing(name_1)

then output;

  Format Source $5.;
  Informat Source $5.;
       run;

/*to see rows only in table1not in table2 you can add more columns for comparison*/


Data tableold;
Source = 'one';
  Length Source $5;
  Set table1andtable2outerjoin;
  if missing(id_2) and missing(name_2)  then output;
 
  Format Source $5.;
  Informat Source $5.;
  run;

/*to see rows side by side*/

  Data tablenewold;
  Set tablenew tableold;
  run;

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

SAS Enterprise Guide vs. SAS Studio

What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 1187 views
  • 6 likes
  • 5 in conversation