DATA Step, Macro, Functions and more

The reverse-vname function - how do I get the value of a variable via its name?

Accepted Solution Solved
Reply
Super Contributor
Posts: 251
Accepted Solution

The reverse-vname function - how do I get the value of a variable via its name?

I'm trying, relatively successfully, to troll through a very large dataset (with 900-1,000 variables) finding variables which are never populated: every observation is missing.

 

At an observation level this is easy:

array char_var _char_;

do over char_var;

     if missing(char_var) then...

     end;

 

But I want to do it as efficiently as possible over the whole dataset. So I've populated a hash with the variable names via vname(char_var), and that works well. Putting the variable's name into the hash is fine. And removing it when I find a non-missing value is fine too. But I'd like to extract the variable's name from the hash and check whether that variable's value is missing - similar to &&varname in macro code. Iterating over all the elements in the array is time-consuming, when I could just do it over the remnants of the hash.

 

Is there some obvious solution that I'm missing?

 

Laurie


Accepted Solutions
Solution
‎05-17-2016 02:44 AM
PROC Star
Posts: 1,558

Re: The reverse-vname function - how do I get the value of a variable via its name?

[ Edited ]

I would look for the maximums.  If the maximum is missing then all values are missing.

Maybe something like

data _null_;
  set SASHELP.HEART(obs=1);
  length VARNAME $32;
  call execute('proc sql; create table MAX as select ');
  do I=1 to 1e9 ;
     call vnext(VARNAME);
     if VARNAME eq 'VARNAME' then leave;
     if I > 1 then call execute(',');
     call execute(cats('max(',VARNAME,') as '||VARNAME));
   end;
   call execute(' from SASHELP.HEART;quit;');
run;

 

Once the heavyweight-lifting of scanning the table is done in one efficient pass, you only have this one-row table to parse.

View solution in original post


All Replies
Super User
Posts: 9,671

Re: The reverse-vname function - how do I get the value of a variable via its name?

Check function  VVALUEX() 

Super Contributor
Posts: 251

Re: The reverse-vname function - how do I get the value of a variable via its name?

Brilliant! I was getting close with vvalue, but the extra letter on the function made all the difference.

 

Thank you.

 

Laurie

Solution
‎05-17-2016 02:44 AM
PROC Star
Posts: 1,558

Re: The reverse-vname function - how do I get the value of a variable via its name?

[ Edited ]

I would look for the maximums.  If the maximum is missing then all values are missing.

Maybe something like

data _null_;
  set SASHELP.HEART(obs=1);
  length VARNAME $32;
  call execute('proc sql; create table MAX as select ');
  do I=1 to 1e9 ;
     call vnext(VARNAME);
     if VARNAME eq 'VARNAME' then leave;
     if I > 1 then call execute(',');
     call execute(cats('max(',VARNAME,') as '||VARNAME));
   end;
   call execute(' from SASHELP.HEART;quit;');
run;

 

Once the heavyweight-lifting of scanning the table is done in one efficient pass, you only have this one-row table to parse.

Super User
Posts: 17,750

Re: The reverse-vname function - how do I get the value of a variable via its name?

NLEVELS option in proc freq? 

Super Contributor
Posts: 251

Re: The reverse-vname function - how do I get the value of a variable via its name?

966 variables, observation length of over 7k, 20.3 million rows - that's never going to fly. But thanks anyway.

 

It's best to treat it one row at a time. I think reducing the number of variables the require checking (down to around 400-odd, I think) each time is the most efficient way. In a couple of hash tables using up a maximum of 30k each, it keeps IO down to a minimum.

 

Multi-threaded DS2 would be even better probably, if I had the time to become proficient in it.

 

data &_output;
set &_input(&_INPUT1_options) end=eof;
array char_var _char_;
array num_var _numeric_;
if _n_ = 1 then do;
   length varname varname_remove $ 32;
   dcl hash variables(ordered: 'n');
   variables.defineKey('varname');
   variables.defineDone();
   dcl hiter variables_iter('variables');
   dcl hash remove_variables(ordered: 'n');
   remove_variables.defineKey('varname_remove');
   remove_variables.defineDone();
   dcl hiter remove_iter('remove_variables');
   do over char_var;
      varname = vname(char_var);
      variables.add();
      end;
   do over num_var;
      varname = vname(num_var);
      variables.add();
      end;
   end;
rc = variables_iter.first();
rc = remove_variables.clear();
do while(rc = 0);
   if strip(vvaluex(varname)) notin(' ', '.') then do;
      varname_remove = varname;
      remove_variables.add();
   end;
   rc = variables_iter.next();
   end;
if remove_variables.num_items then do;
   rc = remove_iter.first();
   do while(rc = 0);
      varname = varname_remove;
      variables.check();
      variables.remove();
      rc = remove_iter.next();
      end;
   end;
variable_count = variables.num_items;
if variable_count ne lag(variable_count) then
   output;
if eof;
variable_count = variables.num_items;
put variable_count 'variables left in hash.';
rc = variables_iter.first();
do while(not rc);
   put varname @;
   rc = variables_iter.next();
   end;
run;

PROC Star
Posts: 1,558

Re: The reverse-vname function - how do I get the value of a variable via its name?

Never gonna fly? Why?

Done and dusted in 5 min on my 4-core windows box.

 

data HAVE(compress=no);
 array A [999];
 do i=1 to 20e6;
   output; 
 end;
run;

data _null_;
  set HAVE(obs=1);
  length VARNAME $32;
  call execute('proc sql; create table MAX as select ');
  do I=1 to 1e9 until(VARNAME='VARNAME');
     call vnext(VARNAME);
     if VARNAME eq 'VARNAME' then leave;
     if I > 1 then call execute(',');
     call execute(cats('max(',VARNAME,') as '||VARNAME));
   end;
   call execute(' from HAVE;quit;');
run;

NOTE: PROCEDURE SQL used (Total process time):

real time 5:02.24

user cpu time 3:13.84

system cpu time 1:46.11

 

memory 411383.68k

OS Memory 445396.00k

Timestamp 17/05/2016 04:51:30 PM

Step Count 69 Switch Count 4642

 

 

 

Super Contributor
Posts: 251

Re: The reverse-vname function - how do I get the value of a variable via its name?

No, I meant using NLEVELS in proc freq.

PROC Star
Posts: 1,558

Re: The reverse-vname function - how do I get the value of a variable via its name?

Fair enough.
Single-pass bulk processing is normally the preferred option for speed. How does the proc sql compare to the data step with your data?

Super Contributor
Posts: 251

Re: The reverse-vname function - how do I get the value of a variable via its name?

I was quite pleased with my solution, but it was too heavy on resources. I stopped it after 30 minutes, and it still had a long way to go. Yours took 15 minutes. Nicely done.

 

I put this at the end of it to produce the analysis:


data &_output(compress=no);
set &_output;
array char_var _char_;
array num_var _numeric_;
length varname $ 32;
do over char_var;
   if missing(char_var) then do;
      varname = vname(char_var);
   output;
   end;
   end;
do over num_var;
   if missing(num_var) then do;
      varname = vname(num_var);
   output;
   end;
   end;
run;

PROC Star
Posts: 1,558

Re: The reverse-vname function - how do I get the value of a variable via its name?

[ Edited ]

Glad it worked.


Your code, as sophisticated and clever as it is, spends its time fetching and looking up and testing and processing each individual value. Not too good when you have 20 billion of them....

Plus the SQL is easier to maintain. Complex logic is required sometimes, but not here. Smiley Happy

 

More performance tips in

Super Contributor
Posts: 251

Re: The reverse-vname function - how do I get the value of a variable via its name?

To be fair, as it gets to the end my code gets faster and faster, checking only 50-odd variables.

But yes, you're right. Yours using the max function in sql without a 'group by' precludes sorting of all the variables, as is thus very efficient. I like it a lot.

L
PROC Star
Posts: 1,558

Re: The reverse-vname function - how do I get the value of a variable via its name?

[ Edited ]

Another way to do the post processing:

 

data &_output(keep=VARNAME compress=no);
  set &_output;
  length VARNAME TYPE $32;
   do i=1 to 1e6 until(VARNAME='VARNAME');
     call vnext(VARNAME,TYPE);       
     if vvaluex(VARNAME)=' ' | (TYPE='N' & left(vvaluex(VARNAME))=:'.') then output;
   end;
run;      

 

PROC Star
Posts: 1,558

Re: The reverse-vname function - how do I get the value of a variable via its name?

Also note that function vvaluex() uses formatted values. So to test for actual missing values, we may need to strip the formats if you have any.

 

proc datasets lib=WORK nolist; modify HAVE; format _ALL_; run;

Super Contributor
Posts: 251

Re: The reverse-vname function - how do I get the value of a variable via its name?

Quite so. However this is a fact table in a data warehouse, and thus all values (theoretically at least) are actual unformatted values.

 

From the "more than one way of skinning a cat" department: another way of generating the SQL:

proc sql noprint;
select cat('max(', strip(name), ') as ', strip(name))
  into :varstring separated by ', '
  from dictionary.columns
 where libname = "%upcase(%scan(&_input, 1, .))"
   and memname = "%upcase(%scan(&_input, 2, .))";
create table &_output as
   select &varstring
     from &_input(&_input_options);
quit;

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 17 replies
  • 497 views
  • 5 likes
  • 4 in conversation