Re: Dynamically Subsetting an Existing Table

theponcer · Posted 10-13-2020 12:42 PM

I am trying to create a subset of a table that I have already created. I create the table as one row for each farmer, and 3 columns for each vegetable selected. However, the number of vegetables selected changes with each run, shown in the vegetables and vegetables_expanded tables below. I want to be able to dynamically subset either table (or a larger table) to only get the final want table. I want the final want table to be able to populate based on any vegetable. (i.e., keep farmer info and carrot fields, OR keep farmer info and rutabaga fields). However, I do NOT want to explicitly specify which farmer fields (Farmer, Address, State) to keep.

data vegetables;
   infile datalines delimiter=','; 
   length Farmer $15 Address $15 carrot_location $20 beet_location $20;
   input Farmer $ State $ Address $ carrot_number carrot_location $ carrot_price beet_number beet_location $ beet_price;
   datalines;                      
Old McDonald, MC, Farm 1, 12, Field 1, 3.21, 14, Field 3, .12
Dwight Schrute, PA,  Rural Rt 6, 546, Slaughterhouse Field, 1.06, 12956, Winery Field, .27  
Luke Skywalker, TA, Chottt el Jerid 1138, 57, Salt Flat 2, .54, 23, Salt Flat 3, .23
;

data vegetables_expanded;
   infile datalines delimiter=','; 
   length Farmer $15 Address $15 carrot_location $20 beet_location $20 rutabega_location $11;
   input Farmer $ State $ Address $ carrot_number carrot_location $ carrot_price beet_number beet_location $ beet_price rutabega_number rutabega_location $ rutabega_price;
   datalines;                      
Old McDonald, MC, Farm 1, 12, Field 1, 3.21, 14, Field 3, .12, 245, Field 2, 4.31
Dwight Schrute, PA,  Rural Rt 6, 546, Slaughterhouse Field, 1.06, 12956, Winery Field, .27, 0, N/A, 0  
Luke Skywalker, TA, Chottt el Jerid 1138, 57, Salt Flat 2, .54, 23, Salt Flat 3, .23, 43, Salt Flat 2, .12
;

data want1;
   infile datalines delimiter=','; 
   length Farmer $15 Address $15 carrot_location $20;
   input Farmer $ State $ Address $ carrot_number carrot_location $ carrot_price;
   datalines;                      
Old McDonald, MC, Farm 1, 12, Field 1, 3.21
Dwight Schrute, PA,  Rural Rt 6, 546, Slaughterhouse Field, 1.06  
Luke Skywalker, TA, Chottt el Jerid 1138, 57, Salt Flat 2, .54
;

I have been trying to find a solution using the sashelp vcolumn table. However, my solution requires me to know what I do NOT want to include. I'm having a hard time coming up with a dynamic solution because the number of filters I need changes depending on the number of vegetables I include in the original table. Does anyone have any suggestions on how to code this dynamically? I have a piece of code that utilizes the like functionality to exclude columns that I do not want. This works statically, but I can't quite get it to work dynamically. Here is my attempt:

*Disclaimer - I stole the dynamic code from a previous solution from @Kurt_Bremser, which can be found here

*Works - static;
data columns;
set sashelp.vcolumn;
where memname = UPCASE("vegetables_expanded") and name not like upcase("beet%") and name not like upcase("rutabega%")
and name not like lowcase("beet%") and name not like lowcase("rutabega%") and name not like lowcase("%beet") and name not like lowcase("%rutabega") 
and name not like upcase("%beet") and name not like upcase("%rutabega");
run;

*Doesn't work - dynamic;
data _null_;
call execute('data');
do until (eof1);
  set veggies (where=(veggie = "carrot")) end=eof1;
  call execute(" columns_" !! veggie);
end;
call execute('; set sashelp.vcolumn;');
do until (eof2);
  set veggies (where=(veggie ne "carrot")) end=eof2;
   call execute('where memname = UPCASE("vegetables_expanded") and name not like upcase("' !! strip(veggie) !!'%") and name not like lowcase("' !! strip(veggie) !! '%")
    and name not like upcase("%' !! strip(veggie) !! '") and name not like lowcase("%' !! strip(veggie) !! '");');
end;
call execute('run;');
stop;
run;

ballardw · Posted 10-13-2020 12:54 PM

You certainly did NOT describe what the actual rules are.

I see that you have

input Farmer $ State $ Address $ carrot_number carrot_location $ carrot_price beet_number beet_location $ beet_price;

Which indicates a data structure that is much harder to deal with "dynamically".

You would be better off with something:

input Farmer $ State $ Address $ Vegetable _number _location $ _price ;

With one record per vegetable.

Then you could likely filter (select dynamically) the value(s) of Vegetable that you need.

Currently you are actually storing data in the variable names: carrot beet etc. Which always makes things harder.

theponcer · Posted 10-13-2020 01:03 PM

Yes, that is essentially what I am trying to get to. However, right now I have a structure that contains information in the column names. I'm trying to figure out how to correct the problem.

jimbarbour · Posted 10-13-2020 01:24 PM

The first thing I might try, if the tables aren't enormous, is to copy them and at a minimum add a column that contains the vegetable type (carrot etc.). I think that's the critical piece of data, yes? It would be nice if you could genericize the column names. Are the column names always in the same order? If you could get everything into descriptive but generic column names and add a column that identifies the vegetable, I would think that would make your life a lot easier. (Did you do something to piss the DBA off? I wouldn't wish those column names on anyone).

Jim

ballardw · Posted 10-13-2020 02:05 PM

@theponcer wrote:

Yes, that is essentially what I am trying to get to. However, right now I have a structure that contains information in the column names. I'm trying to figure out how to correct the problem.

Since the data steps you have were created by you then the "problem" is in that code.

If this is coming from reading external files, then show examples of the external files and not data steps.

You will find that naming like variables in SAS it is often much easier to process when you place the changing part as a suffix instead of prefix.

Note the changes to reading the data:

data vegetables_expanded;
   infile datalines delimiter=','; 
   length Farmer $15 Address $15 location_carrot $20 location_beet $20 location_rutabega $20;
   input Farmer $ State $ Address $ number_carrot location_carrot $ price_carrot 
                                   number_beet location_beet $ price_beet 
                                   number_rutabega location_rutabega $ price_rutabega;
   datalines;                      
Old McDonald, MC, Farm 1, 12, Field 1, 3.21, 14, Field 3, .12, 245, Field 2, 4.31
Dwight Schrute, PA,  Rural Rt 6, 546, Slaughterhouse Field, 1.06, 12956, Winery Field, .27, 0, N/A, 0  
Luke Skywalker, TA, Chottt el Jerid 1138, 57, Salt Flat 2, .54, 23, Salt Flat 3, .23, 43, Salt Flat 2, .12
;

data nicer;
   set vegetables_expanded;
   array l location_: ;
   array n number_:  ;
   array p price_: ;
   length veggie $10;
   do i=1 to dim(l);
      veggie=upcase(scan(vname(l[i]),2,'_'));
      location = l[i];
      number   = n[i];
      price    = p[i];
      output;
   end;
   keep farmer address state veggie location number price;
run;

Nicer naming allows use of variable lists by topic for processing with arrays.

The VNAME function returns the name of a variable, so if the name is constructed nicely you can extract values.

Another approach could be done if ALL the values were read as character and parse both parts of the name of the variables to get veggie and location/ price/ number/ other value. An additional bit would be needed to create the price and number versions as numeric.

Dynamically Subsetting an Existing Table