Posted 06-25-2020 07:24 PM
I want to select all rows from data set one whose id also appears in data set two.
data one;
input id n $;
datalines;
101 a
101 b
101 c
101 d
102 a
102 c
103 f
104 f
105 f
105 u
;
run;
data two;
input id;
datalines;
101
104
105
;
run;
want:
101 a
101 b
101 c
104 f
105 f
105 u
Thanks!
4 REPLIES
What have you tried already?
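Either use a SQL inner join or a data step merge with the IN= data set option.
merge A (in=ina) B;  /* ina is 1 when the current row has a contribution from A */
by variable;         /* both data sets must be sorted by this variable */
if ina;              /* keep only rows whose BY value occurs in A */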
...
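Applied to the data sets in the question, the merge approach might look like the sketch below. It assumes that n is read as a character variable (input id n $;) and that sorting one and two in place is acceptable; the names in_one, in_two and want are only illustrative.

```sas
/* A MERGE with a BY statement requires both inputs to be sorted by the BY variable. */
proc sort data=one; by id; run;
proc sort data=two; by id; run;

data want;
    /* IN= creates temporary flags showing which data set contributed to the row */
    merge one (in=in_one) two (in=in_two);
    by id;
    /* keep rows of ONE whose id also occurs in TWO */
    if in_one and in_two;
run;
```

This keeps every row of one for ids 101, 104 and 105, so the row 101 d is included. It relies on two holding each id only once.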
And is the
101 d
row missing from your Want data set a typo? If it is not a typo, then you will need to describe a rule for identifying that the record should be excluded.
data one;
input id n $;
datalines;
101 a
101 b
101 c
101 d
102 a
102 c
103 f
104 f
105 f
105 u
;
run;
data two;
input id;
datalines;
101
104
105
;
run;
proc sql;
create table want as
select *
from one
where id in (select id from two);
quit;
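The same selection can also be written as an inner join, which is the other SQL form mentioned earlier in the thread. This is only a sketch: the alias t and the table name want_join are made up for illustration, and the inline view with DISTINCT is there so that repeated ids in two would not duplicate rows of one.

```sas
proc sql;
    create table want_join as
    select one.*
    from one
         inner join
         /* de-duplicate the lookup ids so each row of ONE matches at most once */
         (select distinct id from two) as t
         on one.id = t.id;
quit;
```

Neither SQL form needs the inputs to be sorted, but the row order of the result is not guaranteed unless an ORDER BY clause is added.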
You have two good solutions suggested: the data step by @Patrick and the SQL by @novinosrin. Which one you should choose depends on your data:
- If table TWO contains only unique values of ID, the data step merge is the fastest. It does require both tables to be sorted by ID.
- If TWO has multiple rows with the same ID, the SQL solution gives the right result directly and may be simpler to implement, and the data does not need to be sorted. If ONE is very large and not sorted, the SQL solution may actually be faster than the data step, because the data step first needs time to sort the large table.
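If you prefer the data step even though TWO may contain repeated ids, one possible workaround is to de-duplicate TWO while sorting it. The sketch below assumes the output names one_sorted, two_unique and want, which are not from the thread, and leaves the original tables untouched.

```sas
/* NODUPKEY keeps only the first row for each BY value */
proc sort data=two out=two_unique nodupkey;
    by id;
run;

/* This sort is the extra cost to weigh when ONE is very large */
proc sort data=one out=one_sorted;
    by id;
run;

data want;
    merge one_sorted (in=in_one) two_unique (in=in_two);
    by id;
    if in_one and in_two;  /* rows of ONE whose id occurs in TWO */
run;
```

Whether this is faster than the SQL version depends on the size of ONE and on whether it is already sorted, so it is worth timing both on your own data.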