Desktop productivity for business analysts and programmers

Different Time Different Data

Reply
Occasional Contributor
Posts: 15

Different Time Different Data

I am using a simple proc sql with a data set which contains millions of records.

The code is like

PROC SQL NOPRINT;

CREATE TABLE TABLE1

SELECT * FROM DM.TABLE

WHERE COMPANY="xxx"

;

Now I ran the code again with

PROC SQL NOPRINT;

CREATE TABLE TABLE2

SELECT * FROM DM.TABLE

WHERE COMPANY="xxx"

;

Now I am doing a proc compare without any by variable.

PROC COMPARE BASE=TABLE1 COMPARE=TABLE2;

Now proc compare compares the two data sets observation by observation or with a ID variable. ID variable is absent. What I think proc compare shold give exact matching. But though the rowcount is same it is giving some difference. When applied by variable on the primary key of the source data set,it is giving the exact match. Why it is happening can anybody help?

Frequent Contributor
Posts: 139

Different Time Different Data

This is because how the proc compare works. If you given ID variable it will compare on the base of that ID. else each row will be compared with each in second table and difference is given.For more information you can read about proc compare in detail.

Occasional Contributor
Posts: 15

Different Time Different Data


Even there is no ID variable then proc compare will compare each observation from the tables. So ideally as I am fetching the same value with the same filter condition, proc compare should return ideal match.

Trusted Advisor
Posts: 2,114

Different Time Different Data

Another possibility is that the results of PROC SQL are in different order with different executions.  SQL does not guarantee the output order to be the same each time; add the ORDER BY clause to get them in some sort of ID order.  If DM.TABLE is on a database (rather than being a native SAS dataset), it may be doing the processing and you have no idea of the resulting data order.

Doc Muhlbaier

Duke

Super User
Posts: 5,368

Different Time Different Data

Even if DM.TABLE were a SAS data set, the structure might have changed between runs.  Here is a simple change:

data DM.TABLE;

set DM.TABLE;

run;

This step could cause SQL to retrieve records in a different order.  How?  Well, it is possible that the original data set had an index where COMPANY was the primary indexed variable.  Running through the DATA step destroys the reliability of the index.  So the first SQL uses the index and the second one doesn't.  We need to know more about where the data comes from, and what changed in between the first and second SQL runs.

Super User
Posts: 5,368

Different Time Different Data

This would happen if the source data (DM.TABLE) changed in between the two runs.  It could be as simple as having run a PROC SORT in between.

Frequent Contributor
Posts: 110

Re: Different Time Different Data

Proc compare compares each column from table 1 to table 2 in a row. For example if there is a rundate in table1 and rundate in table2 need not be same even you started to run the code same time to get table1 and table2.

For comparing two tables I am sure this is going to be usefulcode.

/* first outer join table1 and table2*/

Proc SQL;
Create table table1andtable2outerjoin as
Select a.id as id_1,
a.name as name_1,

b.id as id_2,
b.name as name_2

From table1 as a
  Full Outer Join
  table2 as b
  on a.id = b.id and
     a.name=b.name;
  Quit;

/*to see rows only in table2 not in table1 you can add more columns for comparison*/

  Data tablenew;
  Source = 'two';
  Length Source $5;
  Set table1andtable2outerjoin ;
  if missing(id_1) and missing(name_1)

then output;

  Format Source $5.;
  Informat Source $5.;
       run;

/*to see rows only in table1not in table2 you can add more columns for comparison*/


Data tableold;
Source = 'one';
  Length Source $5;
  Set table1andtable2outerjoin;
  if missing(id_2) and missing(name_2)  then output;
 
  Format Source $5.;
  Informat Source $5.;
  run;

/*to see rows side by side*/

  Data tablenewold;
  Set tablenew tableold;
  run;

Ask a Question
Discussion stats
  • 6 replies
  • 554 views
  • 6 likes
  • 5 in conversation