Solved: Finding matching observations?

genemroz · Posted 10-11-2022 09:09 AM

Esteemed Advisers:

I have two datasets, Observations and Stations. I want to search the variable USL in Observations for every instance of a station_id from the dataset Stations. In this simple example, that would be observations 1, 4, and 5. These three observations from Observations would then be written to a third dataset called Matches. How can I accomplish this? Thanks in advance for taking a look at this!

Dataset Observations
Obs USL
1 US0003,US0005,USL00V
2 USL007,USL00D
3 USL009,USL00A,USL00J,USL019
4 US0003,US0005,USL009,USL00V
5 US0003,US0004,US0005,US0007,US0008,US000A,US000D,US000K,USL00V
6 USL006,USL00B,USL00D
7 USL003,USL008,USL00G,USL00M
8 USL00A,USL00J,USL012,USV002,USV003
9 USL00B,USL00E,USL00J
10 US000J,USL00U,USL00V

Dataset Stations
Obs station_id
1 US0001
2 US0002
3 US0003
4 US0004
5 US0005
6 US0006
7 US0007
8 US0008
9 US0009
10 US000A

PeterClemmensen · Posted 10-11-2022 09:19 AM

Try this

data Observations;
input obs USL :$100.;
datalines;
1  US0003,US0005,USL00V                                           
2  USL007,USL00D                                                  
3  USL009,USL00A,USL00J,USL019                                    
4  US0003,US0005,USL009,USL00V                                    
5  US0003,US0004,US0005,US0007,US0008,US000A,US000D,US000K,USL00V 
6  USL006,USL00B,USL00D                                           
7  USL003,USL008,USL00G,USL00M                                    
8  USL00A,USL00J,USL012,USV002,USV003                             
9  USL00B,USL00E,USL00J                                           
10 US000J,USL00U,USL00V                                           
;

data Stations;
input station_id $;
datalines;
US0001
US0002
US0003
US0004
US0005
US0006
US0007
US0008
US0009
US000A
;

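data Matches(drop = i station_id);
   /* Load the station IDs into a hash table once, on the first iteration */
   if _N_ = 1 then do;
      dcl hash h(dataset : 'Stations');
      h.definekey('station_id');
      h.definedone();
   end;

   set Observations;

   /* Check each comma-separated ID in USL against the hash table */
   do i = 1 to countw(USL, ',');
      station_id = scan(USL, i, ',');
      /* check() returns 0 when the key is found: keep the row, stop scanning */
      if h.check() = 0 then do;
         output;
         leave;
      end;
   end;
run;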

Result:

obs USL
1   US0003,US0005,USL00V
4   US0003,US0005,USL009,USL00V
5   US0003,US0004,US0005,US0007,US0008,US000A,US000D,US000K,USL00V
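
For comparison, the same match can probably be written without a hash object. The following is only a sketch, not part of the accepted solution: it assumes the Observations and Stations datasets created above, and the output name Matches_sql is made up for illustration. It joins every observation to every station and keeps the rows where FINDW finds the station ID as a whole comma-delimited word in USL.

```sas
proc sql;
   /* Matches_sql is a hypothetical name chosen for this sketch */
   create table Matches_sql as
   select distinct o.obs, o.USL
   from Observations as o, Stations as s
   /* comma and blank are both delimiters so trailing padding in USL is ignored */
   where findw(o.USL, strip(s.station_id), ', ') > 0
   order by o.obs;
quit;
```

DISTINCT is there because an observation that contains several station IDs would otherwise appear once per matching station. On large datasets the hash lookup is likely to be faster, since this join compares every observation with every station.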


genemroz · Posted 10-11-2022 09:45 AM

Thanks for the prompt and helpful reply (I think...). Your solution worked on my larger datasets. Now I have to go and learn hash object code to figure out what just happened here, but I'm truly grateful and am marking this solution as accepted.

Regards,

Gene
