BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
genemroz
Quartz | Level 8

Esteemed Advisers:

I have two datasets, Observations and Stations. I want to search the variable USL in Observations for every instance of a station_id from the dataset of Stations. In this simple exemplar case, that would be observations 1,4, and 5. These three observations  dataset Observations would then be written to a third dataset called Matches. How can I accomplish this?  Thanks in advance for taking a look at this!

 

Dataset Observations
Obs USL
1 US0003,US0005,USL00V
2 USL007,USL00D
3 USL009,USL00A,USL00J,USL019
4 US0003,US0005,USL009,USL00V
5 US0003,US0004,US0005,US0007,US0008,US000A,US000D,US000K,USL00V
6 USL006,USL00B,USL00D
7 USL003,USL008,USL00G,USL00M
8 USL00A,USL00J,USL012,USV002,USV003
9 USL00B,USL00E,USL00J
10 US000J,USL00U,USL00V

 

Dataset Stations
Obs station_id
1 US0001
2 US0002
3 US0003
4 US0004
5 US0005
6 US0006
7 US0007
8 US0008
9 US0009
10 US000A

1 ACCEPTED SOLUTION

Accepted Solutions
PeterClemmensen
Tourmaline | Level 20

Try this

 

data Observations;
input obs USL :$100.;
datalines;
1  US0003,US0005,USL00V                                           
2  USL007,USL00D                                                  
3  USL009,USL00A,USL00J,USL019                                    
4  US0003,US0005,USL009,USL00V                                    
5  US0003,US0004,US0005,US0007,US0008,US000A,US000D,US000K,USL00V 
6  USL006,USL00B,USL00D                                           
7  USL003,USL008,USL00G,USL00M                                    
8  USL00A,USL00J,USL012,USV002,USV003                             
9  USL00B,USL00E,USL00J                                           
10 US000J,USL00U,USL00V                                           
;

data Stations;
input station_id $;
datalines;
US0001
US0002
US0003
US0004
US0005
US0006
US0007
US0008
US0009
US000A
;

data Matches(drop = i station_id);
   if _N_ = 1 then do;
      dcl hash h(dataset : 'Stations');
      h.definekey('station_id');
      h.definedone();
   end;

   set Observations;

   do i = 1 to countw(USL);
      station_id = scan(USL, i, ',');
      if h.check() = 0 then do;
         output;
         leave;
      end;
   end;
run;

 

Result:

 

obs USL
1   US0003,US0005,USL00V
4   US0003,US0005,USL009,USL00V
5   US0003,US0004,US0005,US0007,US0008,US000A,US000D,US000K,USL00V

View solution in original post

2 REPLIES 2
PeterClemmensen
Tourmaline | Level 20

Try this

 

data Observations;
input obs USL :$100.;
datalines;
1  US0003,US0005,USL00V                                           
2  USL007,USL00D                                                  
3  USL009,USL00A,USL00J,USL019                                    
4  US0003,US0005,USL009,USL00V                                    
5  US0003,US0004,US0005,US0007,US0008,US000A,US000D,US000K,USL00V 
6  USL006,USL00B,USL00D                                           
7  USL003,USL008,USL00G,USL00M                                    
8  USL00A,USL00J,USL012,USV002,USV003                             
9  USL00B,USL00E,USL00J                                           
10 US000J,USL00U,USL00V                                           
;

data Stations;
input station_id $;
datalines;
US0001
US0002
US0003
US0004
US0005
US0006
US0007
US0008
US0009
US000A
;

data Matches(drop = i station_id);
   if _N_ = 1 then do;
      dcl hash h(dataset : 'Stations');
      h.definekey('station_id');
      h.definedone();
   end;

   set Observations;

   do i = 1 to countw(USL);
      station_id = scan(USL, i, ',');
      if h.check() = 0 then do;
         output;
         leave;
      end;
   end;
run;

 

Result:

 

obs USL
1   US0003,US0005,USL00V
4   US0003,US0005,USL009,USL00V
5   US0003,US0004,US0005,US0007,US0008,US000A,US000D,US000K,USL00V
genemroz
Quartz | Level 8

Thanks for the prompt and helpful reply (I think...). Your solution worked on my larger datasets. But now I have to go and learn hash object code to figure out what just happened here.  But I'm truly grateful and marking this solution as accepted.

 

Regards,

 

Gene

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 2 replies
  • 894 views
  • 0 likes
  • 2 in conversation