Esteemed Advisers:
I have two datasets, Observations and Stations. I want to search the variable USL in Observations for every instance of a station_id from the dataset of Stations. In this simple exemplar case, that would be observations 1,4, and 5. These three observations dataset Observations would then be written to a third dataset called Matches. How can I accomplish this? Thanks in advance for taking a look at this!
Dataset Observations
Obs USL
1 US0003,US0005,USL00V
2 USL007,USL00D
3 USL009,USL00A,USL00J,USL019
4 US0003,US0005,USL009,USL00V
5 US0003,US0004,US0005,US0007,US0008,US000A,US000D,US000K,USL00V
6 USL006,USL00B,USL00D
7 USL003,USL008,USL00G,USL00M
8 USL00A,USL00J,USL012,USV002,USV003
9 USL00B,USL00E,USL00J
10 US000J,USL00U,USL00V
Dataset Stations
Obs station_id
1 US0001
2 US0002
3 US0003
4 US0004
5 US0005
6 US0006
7 US0007
8 US0008
9 US0009
10 US000A
Try this
data Observations;
input obs USL :$100.;
datalines;
1 US0003,US0005,USL00V
2 USL007,USL00D
3 USL009,USL00A,USL00J,USL019
4 US0003,US0005,USL009,USL00V
5 US0003,US0004,US0005,US0007,US0008,US000A,US000D,US000K,USL00V
6 USL006,USL00B,USL00D
7 USL003,USL008,USL00G,USL00M
8 USL00A,USL00J,USL012,USV002,USV003
9 USL00B,USL00E,USL00J
10 US000J,USL00U,USL00V
;
data Stations;
input station_id $;
datalines;
US0001
US0002
US0003
US0004
US0005
US0006
US0007
US0008
US0009
US000A
;
data Matches(drop = i station_id);
if _N_ = 1 then do;
dcl hash h(dataset : 'Stations');
h.definekey('station_id');
h.definedone();
end;
set Observations;
do i = 1 to countw(USL);
station_id = scan(USL, i, ',');
if h.check() = 0 then do;
output;
leave;
end;
end;
run;
Result:
obs USL 1 US0003,US0005,USL00V 4 US0003,US0005,USL009,USL00V 5 US0003,US0004,US0005,US0007,US0008,US000A,US000D,US000K,USL00V
Try this
data Observations;
input obs USL :$100.;
datalines;
1 US0003,US0005,USL00V
2 USL007,USL00D
3 USL009,USL00A,USL00J,USL019
4 US0003,US0005,USL009,USL00V
5 US0003,US0004,US0005,US0007,US0008,US000A,US000D,US000K,USL00V
6 USL006,USL00B,USL00D
7 USL003,USL008,USL00G,USL00M
8 USL00A,USL00J,USL012,USV002,USV003
9 USL00B,USL00E,USL00J
10 US000J,USL00U,USL00V
;
data Stations;
input station_id $;
datalines;
US0001
US0002
US0003
US0004
US0005
US0006
US0007
US0008
US0009
US000A
;
data Matches(drop = i station_id);
if _N_ = 1 then do;
dcl hash h(dataset : 'Stations');
h.definekey('station_id');
h.definedone();
end;
set Observations;
do i = 1 to countw(USL);
station_id = scan(USL, i, ',');
if h.check() = 0 then do;
output;
leave;
end;
end;
run;
Result:
obs USL 1 US0003,US0005,USL00V 4 US0003,US0005,USL009,USL00V 5 US0003,US0004,US0005,US0007,US0008,US000A,US000D,US000K,USL00V
Thanks for the prompt and helpful reply (I think...). Your solution worked on my larger datasets. But now I have to go and learn hash object code to figure out what just happened here. But I'm truly grateful and marking this solution as accepted.
Regards,
Gene
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Still thinking about your presentation idea? The submission deadline has been extended to Friday, Nov. 14, at 11:59 p.m. ET.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.