BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
arielcslin
Calcite | Level 5

Hi

 

I am looking to merge datasets together however I only want to keep the common variables between the 2 datasets. Is there a way to do this without manually writing out all the variables to be dropped? For example (below) I am trying to merge Dataset 1 and Dataset 2 to make Dataset 3. I am aware that Dataset 2's data will overwrite Dataset 1 and I am ok with this because Dataset 1 is acting like a template.

 

Dataset 1

A  B  C

0   0   0

 

Dataset 2

 B  A 

 4   1

 5   2

 6   3

 

Dataset 3

 A  B   

 1   4

 2   5

 3   6

 

1 ACCEPTED SOLUTION

Accepted Solutions
novinosrin
Tourmaline | Level 20

safer with 

 

union corr all

 



 data one;
 input A  B  C;
 cards;
0   0   0
;

data two;
input B  A ;
cards;
4   1
5   2
6   3
;

proc sql;
create table want as
select * from one(obs=0)
union corr all
select * from two;
quit;

View solution in original post

7 REPLIES 7
r_behata
Barite | Level 11

Are you looking to reorder the Variables in the final dataset ? If that is the only case then there are better options than merging.  Merging the two data sets does not make any sense in your example.

 

data have;
input B A;
cards;
4 1
5 2
6 3
;
run;

data want;
retain a b;
set have;
run;
arielcslin
Calcite | Level 5

Yes the point is to order Dataset 2 however Dataset 2 may not have all the variables listed in the template(?) So I would like to put Dataset 2 in order but drop all variables that are not populated with any data

r_behata
Barite | Level 11
Can you give a better example that reflect the scenario more accurately.
novinosrin
Tourmaline | Level 20

 data one;
 input A  B  C;
 cards;
0   0   0
;

data two;
input B  A ;
cards;
4   1
5   2
6   3
;

proc sql;
create table want as
select * from one(obs=0)
union corr
select * from two;
quit;
Tom
Super User Tom
Super User

The UNION CORR ALL method is probably the best.  (without the ALL keyword UNION will remove duplicate rows).

If you try to do it using a data step then you can get metadata attributes like FORMAT settings overriding the attributes set in the template.

data template;
 input A B C;
cards;
;

data sample;
  input B  A ;
  format a 4.  ;
cards;
4   1
5   2
6   3
;

proc transpose data=sample (obs=0) out=names ; var _all_; run;
proc sql noprint ;
  select _name_ into :names separated by ' ' from names ;
quit;

data almost;
  set template(keep=&names) sample;
run;

proc sql noprint;
create table want as
  select * from template
  union corr all
  select * from sample
;
describe table almost ;
describe table want ;
quit;
144  describe table almost;
NOTE: SQL table WORK.ALMOST was created like:

create table WORK.ALMOST( bufsize=65536 )
  (
   A num format=4.,
   B num
  );

145  describe table want ;
NOTE: SQL table WORK.WANT was created like:

create table WORK.WANT( bufsize=65536 )
  (
   A num,
   B num
  );

 

novinosrin
Tourmaline | Level 20

safer with 

 

union corr all

 



 data one;
 input A  B  C;
 cards;
0   0   0
;

data two;
input B  A ;
cards;
4   1
5   2
6   3
;

proc sql;
create table want as
select * from one(obs=0)
union corr all
select * from two;
quit;
mkeintz
PROC Star

If you know that the vars in one of the datasets constitute a proper subset of vars in the other, then you just need to drop all vars not in that "narrow" dataset.  In this case, you want to drop all vars not in datset2.  But you want to merge such that dataset2 VALUES prevail:

 

data Dataset1;
  input A  B  C;
datalines;
0   0   0
run;

data Dataset2;
  input  B  A ;
datalines;
 4   1
 5   2
 6   3
run;

data dataset3;
  if 0 then set dataset2;      /*Start the PDV with the dataset2 vars*/
  retain _sentinel_between .;  /*Append this var to the PDV */
  merge dataset1 dataset2;     /*Append any vars exclusively in dataset1 to PDV*/
  retain _sentinel_after .;    /*Append a final var to the PDV*/
  drop _sentinel_between -- _sentinel_after;
run;

 

Note the PDV (program data vector) is the set of variables that the sas compiler establishes by parsing your sas code, prior to processing any data.  At each statement above the PDV is expanded when any new variables are encountered.  So by putting in _sentinel_between and _sentinel_after variables in strategic locations in the PDV, it becomes easy to specify a list of variables to drop.

 

The double dash in drop _sentinel_between -- _sentinel_after; tells sas to include all vars between the two sentinels, including the end points.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 5682 views
  • 5 likes
  • 5 in conversation