DATA Step, Macro, Functions and more

tough one: merge 2 datasets, need to reduce 5 variables to one

Reply
Contributor
Posts: 58

tough one: merge 2 datasets, need to reduce 5 variables to one

Hello,
I'm trying to do a many-to-many match merge, but i have a problem: rather than having 1 "by" variable, i have 5 "by" variables. I don't think i can do this in SAS, but it would also work to have the 5 variables compound to one variable showing each of the permutations of the 5, then doing the merge by the new single variable.

Can anyone show me how to do this?

Thanks! Message was edited by: CharlesR
Valued Guide
Posts: 2,175

Re: tough one: merge 2 datasets, need to reduce 5 variables to one

you can do many to many merging and many to many joining in SAS with multiple by-variables or join-variables

Of course (although you don't need it), you can also collapse 5 columns into one quite simply, like:
wide = catx( '/', col1,col3,colA,colC,colB);
first making sure you have enough room for all data in column WIDE.
When you look up the doc on function catx(), you will find it converts numerics to strings, removes all leading/trailing spaces, and separates the data columns with the first parameter value ( useful when you find you need to re-separate the data).
Contributor
Posts: 58

Re: tough one: merge 2 datasets, need to reduce 5 variables to one

That's exactly what i did with concatenation. You can also create the variable like this:

[pre] Scenarios = zone||BallInPlayType||BattedBallVelocity||DblPlayPosn||BatSide||PlayRes_forPerc;[/pre]

Thanks a bunch for the help! Message was edited by: CharlesR
Respected Advisor
Posts: 3,892

Re: tough one: merge 2 datasets, need to reduce 5 variables to one

Hi Charles

You can of course concatenate keys and then use this concatenated string - but if you ever want a SQL query to be able to use indexes then you don't concatenate.

HTH
Patrick
Valued Guide
Posts: 2,175

Re: tough one: merge 2 datasets, need to reduce 5 variables to one

> That's exactly what i did with concatenation. You
> can also create the variable like this:
>
> [pre] Scenarios =
> zone||BallInPlayType||BattedBallVelocity||DblPlayPosn|
> |BatSide||PlayRes_forPerc;[/pre]
>
> Thanks a bunch for the help!
>
> Message was edited by: CharlesR

Charles
does your form of concatenation (||) remove trailing space on strings and leading blanks on automatically converted numerics?
If not you may run out of room in a default new-column width.
On the up-side of this risk, at least it will be really easy to parse, as each component will always appear in the same position on the long "SCENARIOS" string.
As always, consider the options Patrick draws to your attention - losing the benefits of indexes is sometimes a very high price to pay.

peterC
Contributor
Posts: 58

Re: tough one: merge 2 datasets, need to reduce 5 variables to one

Indeces aren't quite as important as the final set post-proc freq has only 63,000 observations.

And yes, there are spaces which are left over, but the variables are fairly small, so this isn't a huge prob, but you're right, i should concatenate diff.

What's more important is the ability to use the concatenated variable in a merge with another data set. It's significantly faster to run a merge with a a single variable as the merge by variable, rather than using the group of variables.
Valued Guide
Posts: 2,175

Re: tough one: merge 2 datasets, need to reduce 5 variables to one

> set. It's significantly faster to run a merge with a
> a single variable as the merge by variable, rather
> than using the group of variables.

Charles

have you any examples of the effect that reducing your 5 columns to one might have on the merge?

peterC
Respected Advisor
Posts: 3,892

Re: tough one: merge 2 datasets, need to reduce 5 variables to one

Hi Charles

A bit a more specific question - eventually with sample data and expected result - would help us to understand your question better and give you a more adequate answer.

There is quite a bit of docu available in regards of combining SAS datasets and Proc SQL. Below a few links and some example code:

Combining SAS datasets:
http://support.sas.com/documentation/cdl/en/lrcon/62955/HTML/default/viewer.htm#a001125856.htm
http://support.sas.com/documentation/cdl/en/basess/58133/HTML/default/a001293108.htm#

Proc SQL:
http://support.sas.com/documentation/cdl/en/proc/61895/HTML/default/viewer.htm#a000086336.htm
http://support.sas.com/documentation/cdl/en/proc/61895/HTML/default/viewer.htm#a002473691.htm (SQL joining tables)


data have1;
do key1=1,3,5;
do key2=2,4;
var='Have 1';
output;
output;
end;
end;
run;

data have2;
do key1=1,5;
do key2=2;
var='Have 2';
output;
output;
end;
end;
run;

title1 'Inner Join 1: Many to Many';
proc sql;
select
l.key1 as l_key1
, l.key2 as l_key2
, l.var as l_var
, r.key1 as r_key1
, r.key2 as r_key2
, r.var as r_var
from have1 l,have2 r
where l.key1=r.key1 and l.key2=r.key2
;
quit;
run;

title1 'Inner Join 2: Many to Many';
proc sql;
select
l.key1 as l_key1
, l.key2 as l_key2
, l.var as l_var
, r.key1 as r_key1
, r.key2 as r_key2
, r.var as r_var
from have1 l inner join have2 r
on l.key1=r.key1 and l.key2=r.key2
;
quit;
run;

title1 'Left Join: Many to Many';
proc sql;
select
l.key1 as l_key1
, l.key2 as l_key2
, l.var as l_var
, r.key1 as r_key1
, r.key2 as r_key2
, r.var as r_var
from have1 l left join have2 r
on l.key1=r.key1 and l.key2=r.key2
;
quit;
run;

title1 'Right Join: Many to Many';
proc sql;
select
l.key1 as l_key1
, l.key2 as l_key2
, l.var as l_var
, r.key1 as r_key1
, r.key2 as r_key2
, r.var as r_var
from have1 l right join have2 r
on l.key1=r.key1 and l.key2=r.key2
;
quit;
run;

title1 'Outer Join: Many to Many';
proc sql;
select
l.key1 as l_key1
, l.key2 as l_key2
, l.var as l_var
, r.key1 as r_key1
, r.key2 as r_key2
, r.var as r_var
from have1 l full outer join have2 r
on l.key1=r.key1 and l.key2=r.key2
;
quit;
run;

title1 'Union Join: Many to Many';
proc sql;
select *
from have1
union
select *
from have2
;
quit;
run;

title1 'Outer Union Join: Many to Many';
proc sql;
select *
from have1
outer union
select *
from have2
;
quit;
run;

title1 'Outer Union Corr Join: Many to Many';
proc sql;
select *
from have1
outer union Corr
select *
from have2
;
quit;
run;


HTH
Patrick Message was edited by: Patrick
Ask a Question
Discussion stats
  • 7 replies
  • 238 views
  • 0 likes
  • 3 in conversation