- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi,
I have two tables A and B, both with the same columns:
Z1 character7 Z2 character2 Z3 numeric8 Z4 Numeric8 Z5 character200 Z6 numeric8
In A there are 1 121 776 rows
In B there are 114 028 rows.
I do this in a PROC SQL:
CREATE TABLE &output_table. AS SELECT * FROM A UNION SELECT * FROM B;
My output table has... 333 456 rows...
Can someone explain me how it is possible???
Thanks by advance
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
By default an SQL UNION only selects distinct values. If you want duplicates (i.e all rows from both tables) you need a UNION ALL.
Example from doc:
proc sql;
title 'A UNION ALL B';
select * from sql.a
union all
select * from sql.b;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
By default an SQL UNION only selects distinct values. If you want duplicates (i.e all rows from both tables) you need a UNION ALL.
Example from doc:
proc sql;
title 'A UNION ALL B';
select * from sql.a
union all
select * from sql.b;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
You have duplicates in both data sets. The example below produces one observation even though A has 3 and B has 2:
data a b;
length z1 $7 z2 $2 z3 z4 8 z5 $200 z6 8;
z1='1234567';
z2='12';
z3=1;
z4=2;
z5=repeat('abcde',40);
z6=3;
output a;output a; output a;
output b;output b;
run;
proc sql;
CREATE TABLE _table AS
SELECT * FROM A
UNION SELECT * FROM B;
quit;
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set
Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets
--------------------------
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
UNION doesn't insert duplicate rows.
UNION ALL does insert duplicate rows.
UNION
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Or
data &output_table;
set a b;
run;
Which may run faster than the SQL.