DATA Step, Macro, Functions and more

Renaming variables in SAS table

Reply
Contributor
Posts: 25

Renaming variables in SAS table

Hello all,

I have a list of variables in a SAS dataset. They all start with "Rule" but with different numbers following (e.g., rule90012, rule200044, rule0000200034). They are also not ranked in order.

Now I want to rename them one by one, to be rule1, rule2, rule3, rule4,.... Is there any way to do that?

The reason I want to do this is because I need to run each of the rules as a dependent variable through logistic regression, for like 600+ times. So I have a do loop program to automate that process

The do-loop codes:

%macro mylogit (num);

%do i=1 %to #

     title "dependent variable is rule&i";

     proc logistic data=test des;

     model rule&i=var1 var2 var3 var4/selection=forward;

     run;

%end;

%mend;

If anyone could think of any other way to do the iterative logistic regression for 600+ different dependent variables, without renaming them beforehand, it would also be great.

THANK YOU.

Super User
Posts: 17,941

Re: Renaming variables in SAS table

Flip your data using proc transpose or a data step and then use by processing instead. This way you can also capture any output stats into a table using ODS if required.

The concept is explained a bit more here:

Super User
Posts: 17,941

Re: Renaming variables in SAS table

You can also grab all the name from sashelp.vcolumn and use call execute to drive the macro.

*Get list of names into dataset;

proc sql;

create table var_names as

select name

from sashelp.vcolumn

where libname='WORK' and memname='TEST' and name like 'RULE%';

quit;

*Modified macro code to take variable name;

%macro mylogit (dependent);

     title "dependent variable is rule&i";

     proc logistic data=test des;

     model &dependent=var1 var2 var3 var4/selection=forward;

     run;

%mend;

*Use call execute to run code;

data _null_;

set var_names;

str = catt('%mylogit(', name, ')');

call execute(str);

run;

Contributor
Posts: 25

Re: Renaming variables in SAS table

Thank you Reeza and Tommywhosc! This is really helpful.

Since I'm only interested in the relationship between one IV and DV with accounting the effects from all other variables. Is there any way we can select the model results where that IV is selected?

Thanks again

Super User
Posts: 17,941

Re: Renaming variables in SAS table

If you use the transpose method you can output the parameter estimates to a single table and then filter your models based on the results.

Contributor
Posts: 20

Re: Renaming variables in SAS table

I agree with Reeza, because...

1) re-naming the variables will needlessly complicate things, and

2) the solution was pretty much exactly what *I* was going to post, but she beat me to it. Smiley Wink

One thought... 600+ regressions will create a LOT of output. If you want to limit the number of models analyzed, you can run  Reeza's  program multiple times, selecting a sub-set of DVs each time:

     ... and name like 'RULE0%';     (analysis on only those variable names starting with 'RULE0'). Then re-run with:

     ... and name like 'RULE1%';     (etc.)

               .

     ... and name like 'RULE9%';

(A similar approach: create 10 different var_names data sets (var_name0, var_name1, etc) and SET each one-at-a-time).

OR if you want more control, you can put a counter variable in the full data set, and sub-set on that:

data var_names;

set var_names;

counter+1;

run;

%let start=1;

%let stop=50;  /* analysis/models  on first 50 rows (first 50 RULEs) of var_names data set */

data _null_;

set var_names(where=(&start <= counter <= &stop));

... (remainder of program)....

Change the start, stop values and re-run the data _null_ step  until all 600+ models have been analyzed.

Good luck on your research!

Contributor
Posts: 20

Re: Renaming variables in SAS table

Well, that makes things interesting (I was wondering how you would evaluate >600 models).

OK, Reeza is correct - if you can transpose the data then you'll end up with one output data set (I think you want the OUTEST= option), that gives the estimates for those IV's selected.

If you'd rather keep your data in its original form, you could run the models as programmed earlier, getting an output data set for each model, and concatenating them one-at-a-time. Until you end up with one large data set with estimates for all 600+ models.

One approach: put PROC APPEND in the macro, while using the OUTEST= option:

%macro mylogit (dependent);

     title "dependent variable is &dependent";

     proc logistic data=test desc outest=ABC;

     model &dependent=var1 var2 var3 var4/selection=forward;

     run;

PROC APPEND BASE=work.ALL data=work.ABC  FORCE; run;

%mend;

I'm a little worried about doing it this way because PROC APPEND will drop variables if they don't appear in the BASE table, but in the DATA table. And it might be that the OUTEST data set will change from one model to another, depending on how many (if any) IVs are significant.

Give it a try, using a few runs (ie &start=1, &stop=10 and see what happens).

OR... let's just make individual OUTEST data sets, and concatenate them outside the loop.

%macro mylogit (dependent);

title "dependent variable is &dependent";

     proc logistic data=test desc outest=&dependent; *One output data set for each model, each named RULEnnnnn     ;

     model &dependent=var1 var2 var3 var4/selection=forward;

     run;

%mend;

So after the above loop runs, you'll have   &stop-&start+1   OUTEST data sets, named 'RULEnnnnn'. You don't want to do this:

data all;

set RULE000123 RULE1122 RULE.....;

run;

Instead make a macro list of the OUTEST data sets you just created:

proc sql;

select distinct name into :VARS separated by ' '

from work.var_names(where=(&start <= counter <= &stop));

quit;

%put &vars=;

/* the above code is using the COUNTER+1;  logic mentioned earlier */

data all;

set  &vars; *or set ALL  &vars ;   if already have created data set ALL ;

run;

If you decide to try it this way, I'd advise starting slowly, with say 10 models. Look over the output, check it against all the OUTEST datasets, and then data ALL. If it looks OK, try again for another 20 or so models.

NOTE: this is assuming it's the OUTEST= option you want. If not, find the one you need; the logic should hold for whatever OUT= you use.

NOTE: you'll end up with 600+ OUTEST tables, so you should probably stop every so often and delete a bunch of them.

Ask a Question
Discussion stats
  • 6 replies
  • 319 views
  • 0 likes
  • 3 in conversation