Esteemed Advisers:
I have two datasets, Observations and Stations. I want to search the variable USL in Observations for every instance of a station_id from the dataset of Stations. In this simple exemplar case, that would be observations 1,4, and 5. These three observations dataset Observations would then be written to a third dataset called Matches. How can I accomplish this? Thanks in advance for taking a look at this!
Dataset Observations
Obs USL
1 US0003,US0005,USL00V
2 USL007,USL00D
3 USL009,USL00A,USL00J,USL019
4 US0003,US0005,USL009,USL00V
5 US0003,US0004,US0005,US0007,US0008,US000A,US000D,US000K,USL00V
6 USL006,USL00B,USL00D
7 USL003,USL008,USL00G,USL00M
8 USL00A,USL00J,USL012,USV002,USV003
9 USL00B,USL00E,USL00J
10 US000J,USL00U,USL00V
Dataset Stations
Obs station_id
1 US0001
2 US0002
3 US0003
4 US0004
5 US0005
6 US0006
7 US0007
8 US0008
9 US0009
10 US000A
Try this
data Observations;
input obs USL :$100.;
datalines;
1 US0003,US0005,USL00V
2 USL007,USL00D
3 USL009,USL00A,USL00J,USL019
4 US0003,US0005,USL009,USL00V
5 US0003,US0004,US0005,US0007,US0008,US000A,US000D,US000K,USL00V
6 USL006,USL00B,USL00D
7 USL003,USL008,USL00G,USL00M
8 USL00A,USL00J,USL012,USV002,USV003
9 USL00B,USL00E,USL00J
10 US000J,USL00U,USL00V
;
data Stations;
input station_id $;
datalines;
US0001
US0002
US0003
US0004
US0005
US0006
US0007
US0008
US0009
US000A
;
data Matches(drop = i station_id);
if _N_ = 1 then do;
dcl hash h(dataset : 'Stations');
h.definekey('station_id');
h.definedone();
end;
set Observations;
do i = 1 to countw(USL);
station_id = scan(USL, i, ',');
if h.check() = 0 then do;
output;
leave;
end;
end;
run;
Result:
obs USL 1 US0003,US0005,USL00V 4 US0003,US0005,USL009,USL00V 5 US0003,US0004,US0005,US0007,US0008,US000A,US000D,US000K,USL00V
Try this
data Observations;
input obs USL :$100.;
datalines;
1 US0003,US0005,USL00V
2 USL007,USL00D
3 USL009,USL00A,USL00J,USL019
4 US0003,US0005,USL009,USL00V
5 US0003,US0004,US0005,US0007,US0008,US000A,US000D,US000K,USL00V
6 USL006,USL00B,USL00D
7 USL003,USL008,USL00G,USL00M
8 USL00A,USL00J,USL012,USV002,USV003
9 USL00B,USL00E,USL00J
10 US000J,USL00U,USL00V
;
data Stations;
input station_id $;
datalines;
US0001
US0002
US0003
US0004
US0005
US0006
US0007
US0008
US0009
US000A
;
data Matches(drop = i station_id);
if _N_ = 1 then do;
dcl hash h(dataset : 'Stations');
h.definekey('station_id');
h.definedone();
end;
set Observations;
do i = 1 to countw(USL);
station_id = scan(USL, i, ',');
if h.check() = 0 then do;
output;
leave;
end;
end;
run;
Result:
obs USL 1 US0003,US0005,USL00V 4 US0003,US0005,USL009,USL00V 5 US0003,US0004,US0005,US0007,US0008,US000A,US000D,US000K,USL00V
Thanks for the prompt and helpful reply (I think...). Your solution worked on my larger datasets. But now I have to go and learn hash object code to figure out what just happened here. But I'm truly grateful and marking this solution as accepted.
Regards,
Gene
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.