DATA Step, Macro, Functions and more

Reduce hard coding by using the VCOLUMN table in SASHELP to get a list of your variables into a macro variable array.

Reply
New Contributor
Posts: 4

Reduce hard coding by using the VCOLUMN table in SASHELP to get a list of your variables into a macro variable array.


Very often it is useful to have a list of all the variables you have in a certain table. You don’t always want to type them out yourself (hard coding) but instead want an automatically generated list based on what is present in the table.


Think about a regression. You can either type all regressor variables yourself or you load a list into a macro variable and then use that macro variable in the regression equation.

Here is an example of how to make this list by DATA step (though PROC SQL is often easier and preferred over the DATA STEP when it comes to loading values into macro variables).

DATA;

          SET SASHELP.VCOLUMN

(WHERE=(LIBNAME = "libraryoftable" AND MEMNAME = "tablename"));

          CALL SYMPUT(COMPRESS('allvariables'||_N_),name);

          CALL SYMPUT('nbr_of_variables'),_N_);

RUN;

SASHELP.VCOLUMN is a metadata table. It contains data about data. In this case it contains the column-names for every table you have. In order to get the variables you want you will need to select the correct table and this is done by putting a restriction on both  the library and table name (often just table name is enough but if you are in a habit of reusing the same table names over different libraries, you are best to select on both). In the example this is done with the WHERE clause. The MEMNAME (member name) is a variable in the VCOLUMN metadata table which contains all the table names.

Here we upload the contends of the name variable to macro variables by using CALL SYMPUT. A combination of _N_ and a chosen macrovariable name (here 'allvariables') will make the datastep construct our macro variable array.

We also keep the number of variables in a macro variable called 'nbr_of_variables' so we can know at what number the array ends.

In order to check whether your macro variables are created correctly you can use %PUT. This will make SAS put the contents of the variables in the log.

%PUT &allvariables1 &allvariables2 &allvariables3;

%PUT &nbr_of_variables;

Calling the first variable of our array is done by resolving &allvariables1. Calling the last one however is a bit more tricky since it requires us to know how many of these macro variables were actually created. Luckily we kept that information in the macro variable &nbr_of_variables.

In order to get the last macro variable of our array we simply combine the two macro names as so (in this article we do not explain how macro variables resolve, you can find more on this in our library or by following the MACRO courses).

%PUT &&allvariables&nbr_of_variables;

Now you are ready to resolve these macro variables anywhere by building a simple macro loop;

%MACRO printmacroarray;

          %DO i = 1%TO &nbr_of_variables;

                   %PUT &&allvariables&i;

          %END;

%MEND printmacroarray;

%printmacroarray;

Contributor
Posts: 36

Reduce hard coding by using the VCOLUMN table in SASHELP to get a list of your variables into a macro variable array.

Posted in reply to BIGuidance

I believe sashelp.vcolumn is a runtime view so using proc contents is a much lighter approach, especially when you have a large database with hundreds or even thousands of tables.

15 DATA _NULL_;

16

17 SET SASHELP.VCOLUMN

18

19 (WHERE=(LIBNAME = "DDS" AND MEMNAME = "FINANCIAL_ACCOUNT"));

20

21 CALL SYMPUT(COMPRESS('allvariables'||_N_),name);

22

23 CALL SYMPUT('nbr_of_variables',_N_);

24

25 RUN;

NOTE: Numeric values have been converted to character values at the places given by: (Line)Smiley SadColumn).

21:48 23:42

NOTE: The query as specified involves ordering by an item that doesn't appear in its SELECT clause.

NOTE: There were 129 observations read from the data set SASHELP.VCOLUMN.

WHERE (LIBNAME='DDS') and (MEMNAME='FINANCIAL_ACCOUNT');

NOTE: DATA statement used (Total process time):

real time 45.30 seconds

cpu time 5.48 seconds

26

27 PROC CONTENTS DATA=DDS.FINANCIAL_ACCOUNT OUT=CONTENTS (KEEP=NAME);

28 RUN;

NOTE: The data set WORK.CONTENTS has 129 observations and 1 variables.

NOTE: Compressing data set WORK.CONTENTS increased size by 100.00 percent.

Compressed is 2 pages; un-compressed would require 1 pages.

NOTE: PROCEDURE CONTENTS used (Total process time):

real time 0.30 seconds

cpu time 0.14 seconds

29

30 DATA _NULL_;

31 SET CONTENTS;

3 The SAS System 09:52 Tuesday, January 31, 2012

32 CALL SYMPUT(COMPRESS('allvariables'||_N_),name);

33 CALL SYMPUT('nbr_of_variables',_N_);

34

35 RUN;

NOTE: Numeric values have been converted to character values at the places given by: (Line)Smiley SadColumn).

32:39 33:34

NOTE: There were 129 observations read from the data set WORK.CONTENTS.

NOTE: DATA statement used (Total process time):

real time 0.00 seconds

cpu time 0.02 seconds

New Contributor
Posts: 4

Reduce hard coding by using the VCOLUMN table in SASHELP to get a list of your variables into a macro variable array.

Nice alternative indeed Smiley Happy

Valued Guide
Posts: 2,177

Reduce hard coding by using the VCOLUMN table in SASHELP to get a list of your variables into a macro variable array.

PAzevedo

what follows is a demo, showing SASHELP.VCOLUMN being read quickly (even on this 4-year-old laptop)

What helps speed up the pass, is the process = PROC SQL.

SASHELP.VCOLUMN is an SQL VIEW which SQL understands, a whole lot more than a DATA STEP does.

With this understanding, SQL can pass through the WHERE clause to the underlying DICTIONARY.COLUMNS which is "an object that DATA STEP does not understand".

Whatever it is that dynamically fills DICTIONARY.COLUMNS it is able to use the WHERE filter passed by PROC SQL to reduce the metadata being examined to just the column information required. So it has the possibility of being even faster than PROC CONTENTS (which also loads table metadata information).

It is not SASHELP.VCOLUMN which is "slow" but rather, the DATA STEP which has to load all metadata, of all columns, in all tables, in all libraries to pass the "table" to the WHERE condition.

Here is the SASlog demonstrating the speed that PROC SQL reads SASHELP.VCOLUMN for the table MAPS.ASIA

The comparison was run in EG4.2

DATA STEP

2.03secs for data step doing everything

or PROC SQL and DATA STEP

.02 secs in PROC SQL (extract the metadata) followed by

.00 secs in the DATA STEP (reading metadata extracted by PROC SQL)

19 %let lib= MAPS;

20 %let mem= ASIA;

21

22 proc sql;

23 create table cols_&mem as select * from sashelp.vcolumn

24 where LIBNAME = "&LIB" AND MEMNAME = "&MEM"

25 ;

NOTE: Table WORK.COLS_ASIA created, with 8 rows and 18 columns.

26 quit;

NOTE: PROCEDURE SQL used (Total process time):

real time 0.02 seconds

cpu time 0.03 seconds

27

28 proc contents data= &lib..&mem noprint out= conts_&mem;

29 run;

NOTE: The data set WORK.CONTS_ASIA has 8 observations and 40 variables.

NOTE: PROCEDURE CONTENTS used (Total process time):

real time 0.03 seconds

cpu time 0.03 seconds

30

31 DATA _null_ ;

32 SET SASHELP.VCOLUMN

33 (WHERE=(LIBNAME = "&LIB" AND MEMNAME = "&MEM"));

34 CALL SYMPUTX(cats('allvariables',_N_),name);

35 CALL SYMPUTX( 'nbr_of_variables',_N_);

36 RUN;

NOTE: There were 8 observations read from the data set SASHELP.VCOLUMN.

WHERE (LIBNAME='MAPS') and (MEMNAME='ASIA');

NOTE: DATA statement used (Total process time):

real time 2.03 seconds

cpu time 0.67 seconds

37

38 data _null_ ;

39 SET cols_&mem;

40 CALL SYMPUTX(cats('allvariable2',_N_),name);

41 CALL SYMPUTX( 'nbr_of_variable2',_N_);

42 RUN;

NOTE: There were 8 observations read from the data set WORK.COLS_ASIA.

NOTE: DATA statement used (Total process time):

real time 0.00 seconds

cpu time 0.01 seconds

43

44 data _null_ ;

45 SET conts_&mem;

46 CALL SYMPUTX(cats('allvariable3',_N_),name);

47 CALL SYMPUTX( 'nbr_of_variable3',_N_);

48 RUN;

NOTE: There were 8 observations read from the data set WORK.CONTS_ASIA.

NOTE: DATA statement used (Total process time):

real time 0.00 seconds

cpu time 0.00 seconds

The log also provides the "real" runtimes for using PROC CONTENTS and DATA STEP

.03 secs in PROC SQL (extract the metadata) followed by

.00 secs in the DATA STEP (reading metadata extracted by PROC CONTENTS)

The more libraries that are defined, or more tables that are present within those libraries, the more a DATA STEP has to read from SASHELP.VCOLUMN.

My setup has little more than base SAS and SAS/GRAPH. Your's might have more to read.

peterC

Contributor
Posts: 36

Reduce hard coding by using the VCOLUMN table in SASHELP to get a list of your variables into a macro variable array.

Always learning.

Thks Peter.C.

PA.

Trusted Advisor
Posts: 1,301

Re: Reduce hard coding by using the VCOLUMN table in SASHELP to get a list of your variables into a macro variable array.

Here is a timed example using a library connected to an oracle database containing hundreds of tables:

15         libname ort oracle user=user pass=pass path=path;

NOTE: Libref ORT was successfully assigned as follows:

      Engine:        ORACLE

      Physical Name: path

16        

17         proc sql noprint;

18          select name into :allvariables1 separated by ' '

19            from sashelp.vcolumn

20           where libname = 'ORT'

21                 and memname= 'TABLE';

22          %let nbr_of_variables1=&sqlobs;

23         quit;

NOTE: PROCEDURE SQL used (Total process time):

      real time           2.92 seconds

      cpu time            0.01 seconds

     

24        

25         proc sql noprint;

26          select name into :allvariables2 separated by ' '

27            from dictionary.columns

28           where libname = 'ORT'

29                 and memname= 'TABLE';

30          %let nbr_of_variables2=&sqlobs;

31         quit;

NOTE: PROCEDURE SQL used (Total process time):

      real time           2.80 seconds

      cpu time            0.00 seconds

     

32        

33         proc contents data=ort.table noprint out=contents; run;

NOTE: The data set WORK.CONTENTS has 24 observations and 40 variables.

NOTE: PROCEDURE CONTENTS used (Total process time):

      real time           0.00 seconds

      cpu time            0.00 seconds

     

34        

35         data _null_;

36          length slurp $32000;

37          do until(done);

38           set contents end=done;

39           slurp=catx(' ',of slurp name);

40           i+1;

41          end;

42          call symputx('allvariables3',slurp);

43          call symputx('nbr_of_variables3',i);

44         run;

NOTE: There were 24 observations read from the data set WORK.CONTENTS.

NOTE: DATA statement used (Total process time):

      real time           0.00 seconds

      cpu time            0.00 seconds

An additional example showing that sashelp.vcolumn can be accessed quickly…  dictionary.columns is typically faster since it is avoiding the extra view facility, however the difference is often negligible.

17         proc sql noprint;

18          select name into :allvariables1 separated by ' '

19            from sashelp.vcolumn

20           where libname = 'MAPS'

21                 and memname= 'ASIA';

22          %let nbr_of_variables1=&sqlobs;

23         quit;

NOTE: PROCEDURE SQL used (Total process time):

      real time           0.04 seconds

      cpu time            0.01 seconds

     

24        

25         proc sql noprint;

26          select name into :allvariables2 separated by ' '

27            from dictionary.columns

28           where libname = 'MAPS'

29                 and memname= 'ASIA';

30          %let nbr_of_variables2=&sqlobs;

31         quit;

NOTE: PROCEDURE SQL used (Total process time):

      real time           0.00 seconds

      cpu time            0.00 seconds

     

32        

33         proc contents data=maps.asia noprint out=contents; run;

NOTE: The data set WORK.CONTENTS has 8 observations and 40 variables.

NOTE: PROCEDURE CONTENTS used (Total process time):

      real time           0.00 seconds

      cpu time            0.00 seconds

     

34        

35         data _null_;

36          length slurp $32000;

37          do until(done);

38           set contents end=done;

39           slurp=catx(' ',of slurp name);

40           i+1;

41          end;

42          call symputx('allvariables3',slurp);

43          call symputx('nbr_of_variables3',i);

44         run;

NOTE: There were 8 observations read from the data set WORK.CONTENTS.

NOTE: DATA statement used (Total process time):

      real time           0.00 seconds

      cpu time            0.01 seconds

Super Contributor
Posts: 1,636

Reduce hard coding by using the VCOLUMN table in SASHELP to get a list of your variables into a macro variable array.

Posted in reply to BIGuidance

Hi,

I think the libname and table name in your code have to be capital letters.

DATA;

          SET SASHELP.VCOLUMN

(WHERE=(LIBNAME = "LIBRARYOFTABLE" AND MEMNAME = "TABLENAME"));

          CALL SYMPUTx(cats('allvariables',_N_),name);

          CALL SYMPUTx('nbr_of_variables',_N_);

RUN;

Respected Advisor
Posts: 3,156

Reduce hard coding by using the VCOLUMN table in SASHELP to get a list of your variables into a macro variable array.

Posted in reply to BIGuidance

I apologize if this is a bit off-topic. There are numerous ways from table variable names to macro variable values, such as dictionary.column (SQL), sashelp.vcolumn (datastep), proc content, even array-vname(). However, are there any  approaches (preferablely non-macro)  to do the reverse conversion? eg. converting macro variable values back to table variable names?

Thanks,

Haikuo

PROC Star
Posts: 7,468

Reduce hard coding by using the VCOLUMN table in SASHELP to get a list of your variables into a macro variable array.

Hai.kuo,

Not sure what you are asking.  If you have a list of space-separated variables names, and they are all either numeric or have already had informats assigned, you can always use the macro variable for text substitution.  e.g.,

data want;

   input &yourMacroVariable.;

   etc.

run;

Respected Advisor
Posts: 3,156

Reduce hard coding by using the VCOLUMN table in SASHELP to get a list of your variables into a macro variable array.

Thanks, Art.

Your answer makes me realize that I have asked a wrong question. Rather, it is more of macro question. To simply my scenario, here is an example:

%let m=test1 test2;

%let n=_2012;

data have;

%scan(&m,1)=1

%scan(&m,2)=2;

%scan(&m,1)||&n=999;

run;

I have trouble to make a concatenated variable out of macro variable values.

Thanks  in advance,

Haikuo

Super User
Super User
Posts: 7,039

Reduce hard coding by using the VCOLUMN table in SASHELP to get a list of your variables into a macro variable array.

You do not need the concatenation operator.

Just code %scan(&m,1)&n .

127  %let m=test1 test2;

128  %let n=_2012;

129

130

131  data have;

132  %scan(&m,1)=1;

133  %scan(&m,2)=2;

134  %scan(&m,1)&n=999;

135   put (_all_) (=);

136  run;

test1=1 test2=2 test1_2012=999

Respected Advisor
Posts: 3,156

Reduce hard coding by using the VCOLUMN table in SASHELP to get a list of your variables into a macro variable array.

aha! Many Many Thanks, Tom!!! I knew something wrong with my way of thinking upon macro.

Super User
Super User
Posts: 7,039

Re: Reduce hard coding by using the VCOLUMN table in SASHELP to get a list of your variables into a macro variable array.

Code generation.  I perfer to use data step with PUT statement to generate code as it is much easier to debug.

For example you could generate ATTRIB statements.

filename code temp;

data _null_;

  file code ;

  set contents ;

  put 'attrib ' name 'length=' @;

  if type=2 then put '$'@;

  put length @;

  if label ne ' ' then put "label='" label +(-1) "'" @;

  put ';';

run;

data new ;

  %inc code;

run;

Respected Advisor
Posts: 3,799

Re: Reduce hard coding by using the VCOLUMN table in SASHELP to get a list of your variables into a macro variable array.

I'm glad you said that Tom as I feel the same, especially if the bulk of the data is already in a SAS data set.  And you can make it look nice with indentation and the rest and even more importantly SAS can correctly associate errors with line numbers unlike with MPRINT.  %INC was all we had back in the day, way back.  Macro language + %INC "cat’s meow"

Many times as in this example LABEL and LENGTH can be printed with LIST put making the program a bit more concise.

17427  filename code temp;

17428  data _null_;

17429     *file code;

17430     set contents;

17431     put +3 'Attrib ' name @;

17432     select(type);

17433        when(1) put length= @;

17434        when(2) put length=dollar8. @;

17435        end;

17436     if not missing(label) then put label=:quote258. @;

17437     put ';';

17438     run;

   Attrib AgeAtDeath LENGTH=8 LABEL="Age at Death" ;

   Attrib AgeAtStart LENGTH=8 LABEL="Age at Start" ;

   Attrib AgeCHDdiag LENGTH=8 LABEL="Age CHD Diagnosed" ;

   Attrib BP_Status LENGTH=$7 LABEL="Blood Pressure Status" ;

   Attrib Chol_Status LENGTH=$10 LABEL="Cholesterol Status" ;

   Attrib Cholesterol LENGTH=8 ;

   Attrib DeathCause LENGTH=$26 LABEL="Cause of Death" ;

   Attrib Diastolic LENGTH=8 ;

   Attrib Height LENGTH=8 ;

   Attrib MRW LENGTH=8 LABEL="Metropolitan Relative Weight" ;

   Attrib Sex LENGTH=$6 ;

   Attrib Smoking LENGTH=8 ;

   Attrib Smoking_Status LENGTH=$17 LABEL="Smoking Status" ;

   Attrib Status LENGTH=$5 ;

   Attrib Systolic LENGTH=8 ;

   Attrib Weight LENGTH=8 ;

   Attrib Weight_Status LENGTH=$11 LABEL="Weight Status" ;

NOTE: There were 17 observations read from the data set WORK.CONTENTS.

Ask a Question
Discussion stats
  • 13 replies
  • 1244 views
  • 1 like
  • 9 in conversation