I have code similar to the following. I am trying to vertically join/concatenate about 370 tables.
From what i have read, the sql Outer Union Corr is equivalent to the data steps set statement.
However when i run the code, to join the same tables using the 2 different methods, the results differ.
The proc sql outer union corr results in roughly 100 more obs than the data step. (out of roughly 117,000).
is there an aspect I am overlooking with etiher approach?
*DATA STEP;
data Temp1;
set lib1.Table_02
lib1.Table_03
..
lib1.Table_10;
run;
data Temp2;
set Lib2.Table_02
Lib2.Table_03
..
lib2.Table_10;
run;
...
data temp40;
set lib15.Table_02
lib15.Table_03
..
lib15.Table_10;
data TempALL;
set Temp1
Temp2
Temp3
..
Temp40;
run;
*PROC SQL;
proc sql noprint;
create table Temp1 as
select * from
Lib1.Table_02 OUTER UNION CORR select * from
Lib1.Table_03 OUTER UNION CORR select * from
..
Lib1.Table_100;
quit;
proc sql noprint;
create table Temp2 as
select * from
Lib2.Table_02 OUTER UNION CORR select * from
Lib2.Table_03 OUTER UNION CORR select * from
..
Lib2.Table_100;
quit;
proc sql noprint;
create table Temp3 as
select * from
Lib3.Table_02 OUTER UNION CORR select * from
Lib3.Table_03 OUTER UNION CORR select * from
..
Lib3.Table_100;
quit;
proc sql noprint;
create table Temp4 as
select * from
Lib4.Table_02 OUTER UNION CORR select * from
Lib4.Table_03 OUTER UNION CORR select * from
..
Lib4.Table_100;
quit;
proc sql noprint;
create table TempAll as
select * from
Temp1 OUTER UNION CORR select * from
Temp2 OUTER UNION CORR select * from
Temp3 OUTER UNION CORR select * from
Temp4;
quit;
I tested with 100 datasets using a DATA step concatenation and SQL OUTER UNION CORR. The resulting datasets were identical. Can you give more detail on what you're doing? Here's my test code:
options dlcreatedir ;
libname lib1 "%qsysfunc(pathname(work))\lib1";
options nodlcreatedir nomprint nomlogic nosymbolgen;
%macro MakeData(lib,num);
proc datasets library=&lib kill nolist nodetails;
run; quit;
%do i=1 %to #
%let n=%unquote(%putn(&i,z3.));
data lib1.Table_&n;
retain Table "Table_&n";
do ID_&n =1 to 5;
output;
end;
run;
%end;
%mend;
%macro ConcatenateDS(lib,num);
data work.tempAll_ds;
set
%do i=1 %to #
%let n=%unquote(%putn(&i,z3.));
lib1.Table_&n
%end;
;
run;
%end;
%mend;
%macro ConcatenateSQL(lib,num);
proc sql;
create table work.tempALL_sql as
%do i=1 %to %eval(&num-1);
%let n=%unquote(%putn(&i,z3.));
select *
from lib1.Table_&n
OUTER UNION CORR
%end;
%let n=%unquote(%putn(&i,z3.));
select *
from lib1.Table_&n
;
quit;
%mend;
%let no=100;
%makedata(lib1,&no)
%ConcatenateDS(lib1,&no)
%ConcatenateSQL(lib1,&no)
proc compare base=work.tempALL_ds
compare=work.tempALL_sql;
run;
And the results:
The SAS System |
The COMPARE Procedure Comparison of WORK.TEMPALL_DS with WORK.TEMPALL_SQL (Method=EXACT) Data Set Summary Dataset Created Modified NVar NObs WORK.TEMPALL_DS 11MAR22:15:50:50 11MAR22:15:50:50 101 500 WORK.TEMPALL_SQL 11MAR22:15:50:50 11MAR22:15:50:50 101 500 Variables Summary Number of Variables in Common: 101. |
Observation Summary Observation Base Compare First Obs 1 1 Last Obs 500 500 Number of Observations in Common: 500. Total Number of Observations Read from WORK.TEMPALL_DS: 500. Total Number of Observations Read from WORK.TEMPALL_SQL: 500. Number of Observations with Some Compared Variables Unequal: 0. Number of Observations with All Compared Variables Equal: 500. NOTE: No unequal values were found. All values compared are exactly equal. |
Log please, including the exact code that produce log messages.
This note has nothing to do with the actual question you pose. But here are two comments that may substantially improve efficiency:
Putting these two suggestions together, you would have:
data VTemp1 / view=vtemp1;
set lib1.Table_02
lib1.Table_03
..
lib1.Table_10
open=defer;
run;
data VTemp2 / view=vtemp2;
set Lib2.Table_02
Lib2.Table_03
..
lib2.Table_10
open=defer;
run;
...
data Vtemp40 / view=vtemp40;
set lib15.Table_02
lib15.Table_03
..
lib15.Table_10
open=defer;
data TempALL;
set Temp1
Temp2
Temp3
..
Temp40
open=defer;
run;
Of course, if all 370 datasets have identical variables, you don't need to create intermediate data sets (TEMP1, TEMP2, ...). Instead you could
data TempALL;
set lib1.Table_02
lib1.Table_03
..
lib1.Table_10
Lib2.Table_02
Lib2.Table_03
..
lib2.Table_10
..
lib15.Table_02
lib15.Table_03
..
lib15.Table_10
open=defer;
run;
My apologies to everyone. It turned out to be a case of user error.
One of the tables was being added twice, but it somehow had gotten moved off to the right by about 20 Tabs, so it was far out of view, but still held the correct syntax. So in the dozens of times i looked through the list of tables, i never saw it until I did a Ctrl+F search for it.
At which point my wife comes running into my office to see what all the cursing was about.
Again. apologies.
There are some differences between the SQL unions and the data step SET statements when the structures are different.
The SQL code will set the length of character variables to the maximum length it has on any of the input datasets.
The SET statement will set the length of character variables to the length it has on the first dataset where it appears.
There is a limit on the number of datasets you can combine in a single SQL statement. I don't think there is a similar limit on the number of datasets you can reference in a SET statement.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.