Re: Different Time Different Data

arnab61 · Posted 03-07-2012 01:31 AM

I am using a simple proc sql with a data set which contains millions of records.

The code is like

PROC SQL NOPRINT;

CREATE TABLE TABLE1

SELECT * FROM DM.TABLE

WHERE COMPANY="xxx"

;

Now I ran the code again with

PROC SQL NOPRINT;

CREATE TABLE TABLE2

SELECT * FROM DM.TABLE

WHERE COMPANY="xxx"

;

Now I am doing a proc compare without any by variable.

PROC COMPARE BASE=TABLE1 COMPARE=TABLE2;

Now proc compare compares the two data sets observation by observation or with a ID variable. ID variable is absent. What I think proc compare shold give exact matching. But though the rowcount is same it is giving some difference. When applied by variable on the primary key of the source data set,it is giving the exact match. Why it is happening can anybody help?

manojinpec · Posted 03-07-2012 03:56 AM

This is because how the proc compare works. If you given ID variable it will compare on the base of that ID. else each row will be compared with each in second table and difference is given.For more information you can read about proc compare in detail.

arnab61 · Posted 03-07-2012 06:10 AM

Even there is no ID variable then proc compare will compare each observation from the tables. So ideally as I am fetching the same value with the same filter condition, proc compare should return ideal match.

Doc_Duke · Posted 03-07-2012 11:30 AM

Another possibility is that the results of PROC SQL are in different order with different executions. SQL does not guarantee the output order to be the same each time; add the ORDER BY clause to get them in some sort of ID order. If DM.TABLE is on a database (rather than being a native SAS dataset), it may be doing the processing and you have no idea of the resulting data order.

Doc Muhlbaier

Duke

Astounding · Posted 03-07-2012 12:37 PM

Even if DM.TABLE were a SAS data set, the structure might have changed between runs. Here is a simple change:

data DM.TABLE;

set DM.TABLE;

run;

This step could cause SQL to retrieve records in a different order. How? Well, it is possible that the original data set had an index where COMPANY was the primary indexed variable. Running through the DATA step destroys the reliability of the index. So the first SQL uses the index and the second one doesn't. We need to know more about where the data comes from, and what changed in between the first and second SQL runs.

Astounding · Posted 03-07-2012 07:43 AM

This would happen if the source data (DM.TABLE) changed in between the two runs. It could be as simple as having run a PROC SORT in between.

sassharp · Posted 03-07-2012 05:36 PM

Proc compare compares each column from table 1 to table 2 in a row. For example if there is a rundate in table1 and rundate in table2 need not be same even you started to run the code same time to get table1 and table2.

For comparing two tables I am sure this is going to be usefulcode.

/* first outer join table1 and table2*/

Proc SQL;
Create table table1andtable2outerjoin as
Select a.id as id_1,
a.name as name_1,

b.id as id_2,
b.name as name_2

From table1 as a
Full Outer Join
table2 as b
on a.id = b.id and
a.name=b.name;
Quit;

/*to see rows only in table2 not in table1 you can add more columns for comparison*/

Data tablenew;
Source = 'two';
Length Source $5;
Set table1andtable2outerjoin ;
if missing(id_1) and missing(name_1)

then output;

Format Source $5.;
Informat Source $5.;
run;

/*to see rows only in table1not in table2 you can add more columns for comparison*/

Data tableold;
Source = 'one';
Length Source $5;
Set table1andtable2outerjoin;
if missing(id_2) and missing(name_2) then output;

Format Source $5.;
Informat Source $5.;
run;

/*to see rows side by side*/

Data tablenewold;
Set tablenew tableold;
run;

Different Time Different Data

Different Time Different Data

Different Time Different Data

Different Time Different Data

Different Time Different Data

Different Time Different Data

Re: Different Time Different Data

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away