BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Smitha9
Fluorite | Level 6

Hi,

I have two datasets:

ID place                            ID  place

1   abc                               1      xxx

2   def                                2      yyy

3   ghi                                3      zzz

4   jkl                                 4      www

 

I want to append to same ID. the output should be:

ID place

1    abc

1    xxx

2     def

2     yyy

3     ghi

3     zzz

4     jkl

4     www

 

can you help me with the code? thanks you.

 

1 ACCEPTED SOLUTION

Accepted Solutions
PeterClemmensen
Tourmaline | Level 20

This is called interleaving. You can read more here: Examples: Interleave Data

 

data one;
input ID place $;
datalines;
1 abc 
2 def 
3 ghi 
4 jkl 
;

data two;
input ID place $;
datalines;
1 xxx 
2 yyy 
3 zzz 
4 www 
;

data want;
   set one two;
   by ID;
run;

  

 

Result:

 

Obs ID place 
1   1  abc 
2   1  xxx 
3   2  def 
4   2  yyy 
5   3  ghi 
6   3  zzz 
7   4  jkl 
8   4  www 

View solution in original post

9 REPLIES 9
novinosrin
Tourmaline | Level 20
data want;
 set dataset1 dataset2;
 by id;
run;

full code-

data one;
input ID place $;
cards;
1   abc                             
2   def                               
3   ghi                             
4   jkl                                 
;

data two;
  input ID  place $;
  cards;
1      xxx
2      yyy
3      zzz
4      www
;

data want;
 set one two;
 by id;
run;

 

PeterClemmensen
Tourmaline | Level 20

This is called interleaving. You can read more here: Examples: Interleave Data

 

data one;
input ID place $;
datalines;
1 abc 
2 def 
3 ghi 
4 jkl 
;

data two;
input ID place $;
datalines;
1 xxx 
2 yyy 
3 zzz 
4 www 
;

data want;
   set one two;
   by ID;
run;

  

 

Result:

 

Obs ID place 
1   1  abc 
2   1  xxx 
3   2  def 
4   2  yyy 
5   3  ghi 
6   3  zzz 
7   4  jkl 
8   4  www 
novinosrin
Tourmaline | Level 20

Pleasing to see multiple minds resorted to the same "Interleaving". @Quentin  the genius as called by Guru Paul D  once made a beautiful comparison to compare "Interleave" with "Union all" to help understand the "never guaranteed" sort order of SQL though functionally so close to interleave with an ORDER BY, albeit interleave begs the dataset to be presorted in the first place for BY group processing. Therefore, interleaving is actually sorted append. 

 

I would like other elders like @FreelanceReinh  @Tom and @mkeintz  etc. to offer some insights to pick and choose interleave vs Union all in an event of pushing the query into the Database. Would Union all lessen I/O? Thank you in advance!


data one;
input ID place $;
cards;
1   abc                             
2   def                               
3   ghi                             
4   jkl                                 
;

data two;
  input ID  place $;
  cards;
1      xxx
2      yyy
3      zzz
4      www
;

proc sql;
 create table want as
 select * from one
 union all
 select * from two
 order by id;
quit;
FreelanceReinh
Jade | Level 19

Hi @novinosrin,

 

In a situation where both techniques are applicable, my primary consideration would always be the result. Unlike the interleaving DATA step, UNION ALL aligns columns by position, unless you specify the CORRESPONDING option. This option, in turn, changes the way leftover columns are handled, i.e., they are dropped, whereas the DATA step would keep them. But the most subtle difference (as you mentioned) is that the sort order within the ID BY groups is unpredictable (yet unusable as a "random order") in the UNION ALL result.

 

In terms of performance, I would rate the DATA step between the PROC SQL steps with and without the order by id clause because the ORDER BY clause would benefit less from the sort order of the source datasets.

 

So I would generally prefer the DATA step over PROC SQL for the OP's task.

novinosrin
Tourmaline | Level 20

Thank you Sir @FreelanceReinh  Makes sense. What prompted me to think through UNION ALL is from the perspective of pushing the query in to the DB. Of course, the sort order would indeed be an issue in SQL. A lot of folks here at my work place are comfortable with SQL and that happens to be their primary skill and tool of choice, and for whatever reason they seem to have mastered SAS Access to third party DB as a default option and are well versed with how mechanics of SQL engine. Whether to start the day with orange juice or coffee, the SQL guys tend to do coffee than orange though admittedly orange is healthier option. 🙂

mkeintz
PROC Star

I am constantly disappointed by the atavistic run to PROC SQL for problems that would often be far more easy to code, and far more efficient if programmed in a data step.

 

You know, one can take advantage of proc sql syntax to create a template data set (obs=0), and then use it to guide a data step, which might be faster, especially for ordered data.

 

For instance if you only want the variables common to one and two:

 

proc sql noprint;
  create table common_vars_template as select * from one (obs=0) union corr select * from two (obs=0);
quit;

data want;
  if 0 then set common_vars_template;
  retain _x1 .;
  set one two;
  by id;
  retain _x2 .;
  drop _x1 -- _x2;
run;

This could be done strictly in data steps, but with a lot more coding:

data two_only (drop=_:);
  set one;
  retain _before_two .;
  set two;
  retain _after_two .;
  keep _before_two -- _after_two;
  stop;
run;

data one_only (drop=_:);
  set two;
  retain _before_one .;
  set one;
  retain _after_one .;
  keep _before_one -- _after_one;
  stop;
run;

data want;
  retain _x1;
  if 0 then set one_only two_only;
  retain _x2;
  drop _x1 -- _x2;
  set one two;
  by id;
run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
novinosrin
Tourmaline | Level 20

Nice trick Mark. Let me share the template idea to my SQL colleagues. Also, my colleagues do not intend to cause a disappointment or anything like that. It's just their comfort zone and fear of something new and unknown. Basically, to do and work with what they know, and chill. Please do not feel disappointed. 🙂

ballardw
Super User

@Smitha9 wrote:

Hi,

I have two datasets:

ID place                            ID  place

1   abc                               1      xxx

2   def                                2      yyy

3   ghi                                3      zzz

4   jkl                                 4      www

 

I want to append to same ID. the output should be:

ID place

1    abc

1    xxx

2     def

2     yyy

3     ghi

3     zzz

4     jkl

4     www

 

can you help me with the code? thanks you.

 


There are a few suggested solutions that rely on the example values given. Is it the actual case that your second data set values actually have the assumed sort order? Or do you need to force the first record of the second data set to be the second regardless of actual alphabetic values of the ID variables?

If the values do not sort and you need that behavior you may need to add some information to the data sets to force a specific order such as a data set source variable to include in the BY statement.

 

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 9 replies
  • 1709 views
  • 0 likes
  • 7 in conversation