BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.

I performed a UNION on two data sets using Proc SQL. Instead of removing duplicates, it performed a UNION ALL and kept duplicates. The Proc SQL documentation states that UNION should only return distinct values. Why isn't it working as documented?

 

 

proc sql;
create table full_customers as 
select * from spring_customers
UNION
select * from fall_customers;
QUIT;

spring_customers and fall_customers are just tables I got from previous data steps.

data spring_customers;
set spring;
run;

data fall_customers;
set fall;
run;

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

Are there variables in the data sets that would have different values, such as may be a date of service? That would make the records different. You may need to subset variables from each table depending on what you want.

 

Consider:

data work.example1;
   id = 3;
   date='01Jan2019'd;
run;

data work.example2;
   id = 3;
   date='01Aug2019'd;
run;


proc sql;
   create table work.full as
   select * from work.example1
   union
   select * from work.example2
   ;
quit;

proc sql;
   create table work.id as
   select id from work.example1
   union
   select id from work.example2
   ;
quit;

The second proc sql only returns one record because the only variables associated with the Union are the ID values, which are duplicated. The first returns both records because the value of the DATE variable varies between the two records and using the Select * says to consider all the variables.

 

You may also have to consider the CORR option if the order of variables varies between the two:

data work.example3;
   date='01Aug2019'd;
   id = 3;
run;

proc sql;
   create table work.full2 as
   select * from work.example2
   union
   select * from work.example3
   ;
quit;
proc sql;
   create table work.full3 as
   select * from work.example2
   union corr
   select * from work.example3
   ;
quit;

View solution in original post

1 REPLY 1
ballardw
Super User

Are there variables in the data sets that would have different values, such as may be a date of service? That would make the records different. You may need to subset variables from each table depending on what you want.

 

Consider:

data work.example1;
   id = 3;
   date='01Jan2019'd;
run;

data work.example2;
   id = 3;
   date='01Aug2019'd;
run;


proc sql;
   create table work.full as
   select * from work.example1
   union
   select * from work.example2
   ;
quit;

proc sql;
   create table work.id as
   select id from work.example1
   union
   select id from work.example2
   ;
quit;

The second proc sql only returns one record because the only variables associated with the Union are the ID values, which are duplicated. The first returns both records because the value of the DATE variable varies between the two records and using the Select * says to consider all the variables.

 

You may also have to consider the CORR option if the order of variables varies between the two:

data work.example3;
   date='01Aug2019'd;
   id = 3;
run;

proc sql;
   create table work.full2 as
   select * from work.example2
   union
   select * from work.example3
   ;
quit;
proc sql;
   create table work.full3 as
   select * from work.example2
   union corr
   select * from work.example3
   ;
quit;

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 1 reply
  • 8337 views
  • 0 likes
  • 2 in conversation