DATA Step, Macro, Functions and more

Merge Data

Reply
Contributor
Posts: 39

Merge Data

[ Edited ]

Hello all, 

 

I have a question about merging two datasets of different lengths. Datasets look like this: 

data dat1; 
input x; 
datalines; 
1
1
1
1
2
2
2
2
3
3
3
3
;
run; 

data dat2;
input y; 
datalines; 
a
b
c
d
;
run; 

And I want this: 

data want; 
input x y; 
datalines; 
1 a
1 b
1 c
1 d
2 a
2 b
2 c
2 d
3 a
3 b
3 c
3 d
;
run;

Since I don't have a unique identifier, I'm not sure how to merge this and allow the variable y to repeat down. 

 

I have tried: 

data attempt; 
merge dat1 dat2; 
run; 

but, this only repeats dat2 once, throuhg the "1"s of dat1. How do I allow dat2 to repeat all the way down the values of dat1? 

 

Thank you!

Super User
Posts: 5,914

Re: Merge Data

Posted in reply to pamplemouse22
I'm puzzled. What is the real life scenario here?
Merging without common key is usually hazardous.
Data never sleeps
Contributor
Posts: 30

Re: Merge Data

Posted in reply to pamplemouse22

One SQL way. this does cartesian join and duplicates are removed.

 

proc sql;
create table attempt as
select distinct x,y from dat1,dat2;
quit;
Trusted Advisor
Posts: 1,387

Re: Merge Data

Posted in reply to pamplemouse22

The proc sql is much more compact programming, but might take longer the a data step for large files.  After all, it first does a cartesian crossing only to delete the majority of cross-records (3/4 in your case).  Of course, that could be solved by having only unique X values in DAT1.

 

But here, as requested is a data step solution, assuming that DAT1 is sorted by X

 

 

data want (drop=_:);
  set dat1 ;
  by x;

  if _n_=1 then do;
    if 0 then set dat2 (obs=0) nobs=ndat2;
    declare hash d2 (dataset:'need2', ordered:'Y');
      d2.definekey('_key');
      d2.definedata(all:'Y');
      d2.definedone();
  end;

  _key+1;
  if first.x then _key=1;
  _rc=d2.find();
run;

 

This will work as expected if DAT1 has exactly NDAT2 observation per X value.  (NDAT2 is numeber of obs in DAT2).  It won't cause any "erroneous" values if a given X value has fewer than NDAT2 records.  But if an X group has more than NDAT2, then note that the last value of Y in NDAT2 will prevail in all the "extra" dat1 records - i.e. you'll get a duplicate X/Y combination.  If you want to protect against that, then precede the "_rc=d2.find()" statement with a "call missing(y)", protecting against duplicate non-missing Y's.

Super User
Super User
Posts: 8,263

Re: Merge Data

Posted in reply to pamplemouse22

So what is your key for making the merge? Why not add one?

data dat1_fixed;
  set dat1 ;
  by x ;
  rep+1;
  if first.x then rep=1;
run;
data dat2_fixed;
  set dat2 ;
  rep+1;
run;
proc sql ;
  create table want as
    select a.*,b.y
    from dat1_fixed a
    inner join dat2_fixed b
   on a.rep = b.rep
   order by 1,2
 ;
quit;
Obs    x    rep    y

  1    1     1     a
  2    1     2     b
  3    1     3     c
  4    1     4     d
  5    2     1     a
  6    2     2     b
  7    2     3     c
  8    2     4     d
  9    3     1     a
 10    3     2     b
 11    3     3     c
 12    3     4     d
Super User
Posts: 2,045

Re: Merge Data

@pamplemouse22   Hiter the hash can help here I think. Assuming your dat1 is sorted as shown in your example-

 

data dat1;

input x;

datalines;

1

1

1

1

2

2

2

2

3

3

3

3

;

run;

 

data dat2;

input y $;

datalines;

a

b

c

d

;

run;

 

 

 

data want;

   if _N_ = 1 then do;

     if 0 then set dat2;

     declare hash h(dataset:"dat2", ordered: 'yes');

     declare hiter iter('h');

      h.defineKey('y');

      h.defineData('y');

      h.defineDone();

    end;

set dat1;

by x;

 if first.x then  iter.first();

 else iter.next();

run;

 

Regards,

Naveen Srinivasan

PROC Star
Posts: 269

Re: Merge Data

Posted in reply to pamplemouse22
data want;
  do _N_=1 by 1 until(last.x);
    set dat1;
    by x;
    if _N_<=n2 then
      set dat2 point=_N_ nobs=n2;
    else y=.;
    output;
    end;
run;

This solution puts Y values on as many X-s as there are, and if it runs out of Y values, it sets Y missing.

Ask a Question
Discussion stats
  • 6 replies
  • 176 views
  • 0 likes
  • 7 in conversation