Solved: Re: Appending the data with same observation

Smitha9 · Posted 01-19-2021 08:12 AM

Hi,

I have two datasets:

ID place ID place

1 abc 1 xxx

2 def 2 yyy

3 ghi 3 zzz

4 jkl 4 www

I want to append to same ID. the output should be:

ID place

1 abc

1 xxx

2 def

2 yyy

3 ghi

3 zzz

4 jkl

4 www

can you help me with the code? thanks you.

PeterClemmensen · Posted 01-19-2021 08:17 AM

This is called interleaving. You can read more here: Examples: Interleave Data

data one;
input ID place $;
datalines;
1 abc 
2 def 
3 ghi 
4 jkl 
;

data two;
input ID place $;
datalines;
1 xxx 
2 yyy 
3 zzz 
4 www 
;

data want;
   set one two;
   by ID;
run;

Result:

Obs ID place 
1   1  abc 
2   1  xxx 
3   2  def 
4   2  yyy 
5   3  ghi 
6   3  zzz 
7   4  jkl 
8   4  www

The DATA to DATA Step Macro
Blog: SASnrd

View solution in original post

novinosrin · Posted 01-19-2021 08:14 AM

data want;
 set dataset1 dataset2;
 by id;
run;

full code-

data one;
input ID place $;
cards;
1   abc                             
2   def                               
3   ghi                             
4   jkl                                 
;

data two;
  input ID  place $;
  cards;
1      xxx
2      yyy
3      zzz
4      www
;

data want;
 set one two;
 by id;
run;

Kurt_Bremser · Posted 01-19-2021 08:15 AM

It's called "interleaving":

data want;
set
  ds1
  ds2
;
by id;
run;

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

PeterClemmensen · Posted 01-19-2021 08:17 AM

This is called interleaving. You can read more here: Examples: Interleave Data

data one;
input ID place $;
datalines;
1 abc 
2 def 
3 ghi 
4 jkl 
;

data two;
input ID place $;
datalines;
1 xxx 
2 yyy 
3 zzz 
4 www 
;

data want;
   set one two;
   by ID;
run;

Result:

Obs ID place 
1   1  abc 
2   1  xxx 
3   2  def 
4   2  yyy 
5   3  ghi 
6   3  zzz 
7   4  jkl 
8   4  www

The DATA to DATA Step Macro
Blog: SASnrd

novinosrin · Posted 01-19-2021 08:33 AM

Pleasing to see multiple minds resorted to the same "Interleaving". @Quentin the genius as called by Guru Paul D once made a beautiful comparison to compare "Interleave" with "Union all" to help understand the "never guaranteed" sort order of SQL though functionally so close to interleave with an ORDER BY, albeit interleave begs the dataset to be presorted in the first place for BY group processing. Therefore, interleaving is actually sorted append.

I would like other elders like @FreelanceReinh @Tom and @mkeintz etc. to offer some insights to pick and choose interleave vs Union all in an event of pushing the query into the Database. Would Union all lessen I/O? Thank you in advance!


data one;
input ID place $;
cards;
1   abc                             
2   def                               
3   ghi                             
4   jkl                                 
;

data two;
  input ID  place $;
  cards;
1      xxx
2      yyy
3      zzz
4      www
;

proc sql;
 create table want as
 select * from one
 union all
 select * from two
 order by id;
quit;

FreelanceReinh · Posted 01-19-2021 11:15 AM

Hi @novinosrin,

In a situation where both techniques are applicable, my primary consideration would always be the result. Unlike the interleaving DATA step, UNION ALL aligns columns by position, unless you specify the CORRESPONDING option. This option, in turn, changes the way leftover columns are handled, i.e., they are dropped, whereas the DATA step would keep them. But the most subtle difference (as you mentioned) is that the sort order within the ID BY groups is unpredictable (yet unusable as a "random order") in the UNION ALL result.

In terms of performance, I would rate the DATA step between the PROC SQL steps with and without the order by id clause because the ORDER BY clause would benefit less from the sort order of the source datasets.

So I would generally prefer the DATA step over PROC SQL for the OP's task.

novinosrin · Posted 01-19-2021 11:24 AM

Thank you Sir @FreelanceReinh Makes sense. What prompted me to think through UNION ALL is from the perspective of pushing the query in to the DB. Of course, the sort order would indeed be an issue in SQL. A lot of folks here at my work place are comfortable with SQL and that happens to be their primary skill and tool of choice, and for whatever reason they seem to have mastered SAS Access to third party DB as a default option and are well versed with how mechanics of SQL engine. Whether to start the day with orange juice or coffee, the SQL guys tend to do coffee than orange though admittedly orange is healthier option. 🙂

mkeintz · Posted 01-19-2021 12:56 PM

I am constantly disappointed by the atavistic run to PROC SQL for problems that would often be far more easy to code, and far more efficient if programmed in a data step.

You know, one can take advantage of proc sql syntax to create a template data set (obs=0), and then use it to guide a data step, which might be faster, especially for ordered data.

For instance if you only want the variables common to one and two:

proc sql noprint;
  create table common_vars_template as select * from one (obs=0) union corr select * from two (obs=0);
quit;

data want;
  if 0 then set common_vars_template;
  retain _x1 .;
  set one two;
  by id;
  retain _x2 .;
  drop _x1 -- _x2;
run;

This could be done strictly in data steps, but with a lot more coding:

data two_only (drop=_:);
  set one;
  retain _before_two .;
  set two;
  retain _after_two .;
  keep _before_two -- _after_two;
  stop;
run;

data one_only (drop=_:);
  set two;
  retain _before_one .;
  set one;
  retain _after_one .;
  keep _before_one -- _after_one;
  stop;
run;

data want;
  retain _x1;
  if 0 then set one_only two_only;
  retain _x2;
  drop _x1 -- _x2;
  set one two;
  by id;
run;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

novinosrin · Posted 01-19-2021 01:24 PM

Nice trick Mark. Let me share the template idea to my SQL colleagues. Also, my colleagues do not intend to cause a disappointment or anything like that. It's just their comfort zone and fear of something new and unknown. Basically, to do and work with what they know, and chill. Please do not feel disappointed. 🙂

ballardw · Posted 01-19-2021 10:23 AM

@Smitha9 wrote:

Hi,

I have two datasets:

ID place                            ID place

1   abc                               1      xxx

2   def                                2      yyy

3   ghi                                3      zzz

4   jkl                                 4      www

I want to append to same ID. the output should be:

ID place

1    abc

1    xxx

2     def

2     yyy

3     ghi

3     zzz

4     jkl

4     www

can you help me with the code? thanks you.

There are a few suggested solutions that rely on the example values given. Is it the actual case that your second data set values actually have the assumed sort order? Or do you need to force the first record of the second data set to be the second regardless of actual alphabetic values of the ID variables?

If the values do not sort and you need that behavior you may need to add some information to the data sets to force a specific order such as a data set source variable to include in the BY statement.

Registration is open