BookmarkSubscribeRSS Feed
pamplemouse22
Calcite | Level 5

Hello all, 

 

I have a question about merging two datasets of different lengths. Datasets look like this: 

data dat1; 
input x; 
datalines; 
1
1
1
1
2
2
2
2
3
3
3
3
;
run; 

data dat2;
input y; 
datalines; 
a
b
c
d
;
run; 

And I want this: 

data want; 
input x y; 
datalines; 
1 a
1 b
1 c
1 d
2 a
2 b
2 c
2 d
3 a
3 b
3 c
3 d
;
run;

Since I don't have a unique identifier, I'm not sure how to merge this and allow the variable y to repeat down. 

 

I have tried: 

data attempt; 
merge dat1 dat2; 
run; 

but, this only repeats dat2 once, throuhg the "1"s of dat1. How do I allow dat2 to repeat all the way down the values of dat1? 

 

Thank you!

6 REPLIES 6
LinusH
Tourmaline | Level 20
I'm puzzled. What is the real life scenario here?
Merging without common key is usually hazardous.
Data never sleeps
PBsas
Obsidian | Level 7

One SQL way. this does cartesian join and duplicates are removed.

 

proc sql;
create table attempt as
select distinct x,y from dat1,dat2;
quit;
mkeintz
PROC Star

The proc sql is much more compact programming, but might take longer the a data step for large files.  After all, it first does a cartesian crossing only to delete the majority of cross-records (3/4 in your case).  Of course, that could be solved by having only unique X values in DAT1.

 

But here, as requested is a data step solution, assuming that DAT1 is sorted by X

 

 

data want (drop=_:);
  set dat1 ;
  by x;

  if _n_=1 then do;
    if 0 then set dat2 (obs=0) nobs=ndat2;
    declare hash d2 (dataset:'need2', ordered:'Y');
      d2.definekey('_key');
      d2.definedata(all:'Y');
      d2.definedone();
  end;

  _key+1;
  if first.x then _key=1;
  _rc=d2.find();
run;

 

This will work as expected if DAT1 has exactly NDAT2 observation per X value.  (NDAT2 is numeber of obs in DAT2).  It won't cause any "erroneous" values if a given X value has fewer than NDAT2 records.  But if an X group has more than NDAT2, then note that the last value of Y in NDAT2 will prevail in all the "extra" dat1 records - i.e. you'll get a duplicate X/Y combination.  If you want to protect against that, then precede the "_rc=d2.find()" statement with a "call missing(y)", protecting against duplicate non-missing Y's.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Tom
Super User Tom
Super User

So what is your key for making the merge? Why not add one?

data dat1_fixed;
  set dat1 ;
  by x ;
  rep+1;
  if first.x then rep=1;
run;
data dat2_fixed;
  set dat2 ;
  rep+1;
run;
proc sql ;
  create table want as
    select a.*,b.y
    from dat1_fixed a
    inner join dat2_fixed b
   on a.rep = b.rep
   order by 1,2
 ;
quit;
Obs    x    rep    y

  1    1     1     a
  2    1     2     b
  3    1     3     c
  4    1     4     d
  5    2     1     a
  6    2     2     b
  7    2     3     c
  8    2     4     d
  9    3     1     a
 10    3     2     b
 11    3     3     c
 12    3     4     d
novinosrin
Tourmaline | Level 20

@pamplemouse22   Hiter the hash can help here I think. Assuming your dat1 is sorted as shown in your example-

 

data dat1;

input x;

datalines;

1

1

1

1

2

2

2

2

3

3

3

3

;

run;

 

data dat2;

input y $;

datalines;

a

b

c

d

;

run;

 

 

 

data want;

   if _N_ = 1 then do;

     if 0 then set dat2;

     declare hash h(dataset:"dat2", ordered: 'yes');

     declare hiter iter('h');

      h.defineKey('y');

      h.defineData('y');

      h.defineDone();

    end;

set dat1;

by x;

 if first.x then  iter.first();

 else iter.next();

run;

 

Regards,

Naveen Srinivasan

s_lassen
Meteorite | Level 14
data want;
  do _N_=1 by 1 until(last.x);
    set dat1;
    by x;
    if _N_<=n2 then
      set dat2 point=_N_ nobs=n2;
    else y=.;
    output;
    end;
run;

This solution puts Y values on as many X-s as there are, and if it runs out of Y values, it sets Y missing.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 923 views
  • 0 likes
  • 7 in conversation