BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
genemroz
Quartz | Level 8

Esteemed Advisers:

I have two datasets, Observations and Stations. I want to search the variable USL in Observations for every instance of a station_id from the dataset of Stations. In this simple exemplar case, that would be observations 1,4, and 5. These three observations  dataset Observations would then be written to a third dataset called Matches. How can I accomplish this?  Thanks in advance for taking a look at this!

 

Dataset Observations
Obs USL
1 US0003,US0005,USL00V
2 USL007,USL00D
3 USL009,USL00A,USL00J,USL019
4 US0003,US0005,USL009,USL00V
5 US0003,US0004,US0005,US0007,US0008,US000A,US000D,US000K,USL00V
6 USL006,USL00B,USL00D
7 USL003,USL008,USL00G,USL00M
8 USL00A,USL00J,USL012,USV002,USV003
9 USL00B,USL00E,USL00J
10 US000J,USL00U,USL00V

 

Dataset Stations
Obs station_id
1 US0001
2 US0002
3 US0003
4 US0004
5 US0005
6 US0006
7 US0007
8 US0008
9 US0009
10 US000A

1 ACCEPTED SOLUTION

Accepted Solutions
PeterClemmensen
Tourmaline | Level 20

Try this

 

data Observations;
input obs USL :$100.;
datalines;
1  US0003,US0005,USL00V                                           
2  USL007,USL00D                                                  
3  USL009,USL00A,USL00J,USL019                                    
4  US0003,US0005,USL009,USL00V                                    
5  US0003,US0004,US0005,US0007,US0008,US000A,US000D,US000K,USL00V 
6  USL006,USL00B,USL00D                                           
7  USL003,USL008,USL00G,USL00M                                    
8  USL00A,USL00J,USL012,USV002,USV003                             
9  USL00B,USL00E,USL00J                                           
10 US000J,USL00U,USL00V                                           
;

data Stations;
input station_id $;
datalines;
US0001
US0002
US0003
US0004
US0005
US0006
US0007
US0008
US0009
US000A
;

data Matches(drop = i station_id);
   if _N_ = 1 then do;
      dcl hash h(dataset : 'Stations');
      h.definekey('station_id');
      h.definedone();
   end;

   set Observations;

   do i = 1 to countw(USL);
      station_id = scan(USL, i, ',');
      if h.check() = 0 then do;
         output;
         leave;
      end;
   end;
run;

 

Result:

 

obs USL
1   US0003,US0005,USL00V
4   US0003,US0005,USL009,USL00V
5   US0003,US0004,US0005,US0007,US0008,US000A,US000D,US000K,USL00V

View solution in original post

2 REPLIES 2
PeterClemmensen
Tourmaline | Level 20

Try this

 

data Observations;
input obs USL :$100.;
datalines;
1  US0003,US0005,USL00V                                           
2  USL007,USL00D                                                  
3  USL009,USL00A,USL00J,USL019                                    
4  US0003,US0005,USL009,USL00V                                    
5  US0003,US0004,US0005,US0007,US0008,US000A,US000D,US000K,USL00V 
6  USL006,USL00B,USL00D                                           
7  USL003,USL008,USL00G,USL00M                                    
8  USL00A,USL00J,USL012,USV002,USV003                             
9  USL00B,USL00E,USL00J                                           
10 US000J,USL00U,USL00V                                           
;

data Stations;
input station_id $;
datalines;
US0001
US0002
US0003
US0004
US0005
US0006
US0007
US0008
US0009
US000A
;

data Matches(drop = i station_id);
   if _N_ = 1 then do;
      dcl hash h(dataset : 'Stations');
      h.definekey('station_id');
      h.definedone();
   end;

   set Observations;

   do i = 1 to countw(USL);
      station_id = scan(USL, i, ',');
      if h.check() = 0 then do;
         output;
         leave;
      end;
   end;
run;

 

Result:

 

obs USL
1   US0003,US0005,USL00V
4   US0003,US0005,USL009,USL00V
5   US0003,US0004,US0005,US0007,US0008,US000A,US000D,US000K,USL00V
genemroz
Quartz | Level 8

Thanks for the prompt and helpful reply (I think...). Your solution worked on my larger datasets. But now I have to go and learn hash object code to figure out what just happened here.  But I'm truly grateful and marking this solution as accepted.

 

Regards,

 

Gene

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 2 replies
  • 500 views
  • 0 likes
  • 2 in conversation