Obsidian | Level 7

## PROC SQL UNION - unique vs with duplicates

Hi,

I have two tables A and B, both with the same columns:

```Z1 character7
Z2 character2
Z3 numeric8
Z4 Numeric8
Z5 character200
Z6 numeric8```

In A there are 1 121 776 rows

In B there are 114 028 rows.

I do this in a PROC SQL:

```		CREATE TABLE &output_table. AS
SELECT * FROM A
UNION SELECT * FROM B;```

My output table has... 333 456 rows...

Can someone explain me how it is possible???

1 ACCEPTED SOLUTION

Accepted Solutions
Ammonite | Level 13

## Re: PROC SQL UNION

By default an SQL UNION only selects distinct values. If you want duplicates (i.e all rows from both tables) you need a UNION ALL.

Example from doc:

``````proc sql;
title 'A UNION ALL B';
select * from sql.a
union all
select * from sql.b;``````
4 REPLIES 4
Ammonite | Level 13

## Re: PROC SQL UNION

By default an SQL UNION only selects distinct values. If you want duplicates (i.e all rows from both tables) you need a UNION ALL.

Example from doc:

``````proc sql;
title 'A UNION ALL B';
select * from sql.a
union all
select * from sql.b;``````
PROC Star

## Re: PROC SQL UNION

You have duplicates in both data sets.  The example below produces one observation even though A has 3 and B has 2:

``````data a b;
length z1 \$7 z2 \$2 z3 z4 8 z5 \$200 z6 8;
z1='1234567';
z2='12';
z3=1;
z4=2;
z5=repeat('abcde',40);
z6=3;

output a;output a; output a;
output b;output b;
run;

proc sql;
CREATE TABLE _table AS
SELECT * FROM A
UNION SELECT * FROM B;
quit;``````
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Super User

## Re: PROC SQL UNION

UNION doesn't insert duplicate rows.

UNION ALL does insert duplicate rows.

### UNION

The UNION operator produces a table that contains all the unique rows that result from both table expressions
Super User

## Re: PROC SQL UNION

Or

data &output_table;

set a b;

run;

Which may run faster than the SQL.

Discussion stats
• 4 replies
• 104717 views
• 5 likes
• 5 in conversation