Hello experts,
So I have a dataset X that I want to subset based on the key values stored in dataset Y:
proc sql;
create table test as
select x.*
from x, y
where x.key1 = y.key1 and
x.key2 = y.key2;
quit;
However, when the proc sql completes running, I noticed that TEST dataset had almost twice as many rows as X. So I am wondering why this is happening and what alternatives are there?
To give you an example of what I am trying to achieve: say X has 'apple' and 'banana' under key1, but Y only has 'banana' under key1, how do I make sure the subset of X only has banana for key1?
Thank you!
You must make sure that there are no duplicates of key1-key2 pairs in table y. Can be done with:
proc sql;
create table test as
select x.*
from x, (select distinct key1, key2 from y) as yy
where x.key1 = yy.key1 and x.key2 = yy.key2;
quit;
You must make sure that there are no duplicates of key1-key2 pairs in table y. Can be done with:
proc sql;
create table test as
select x.*
from x, (select distinct key1, key2 from y) as yy
where x.key1 = yy.key1 and x.key2 = yy.key2;
quit;
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Still thinking about your presentation idea? The submission deadline has been extended to Friday, Nov. 14, at 11:59 p.m. ET.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.