DATA Step, Macro, Functions and more

Using Proc SQL to rename variables in a large data set containing many variables

Reply
Contributor
Posts: 59

Using Proc SQL to rename variables in a large data set containing many variables

PROC SQL;

     CREATE TABLE LIBXYZ.DEVELOPMENT_SAMPLE_03

     AS

     SELECT

     T1.CONTRACTNO ,

     T1.FLAG                                                                    AS BIN_TARGET , /* BINARY TARGET */

      T1.DELPHI_ADDRESS_LINK_NDLNK01_ADDR      AS VAR_0001 ,

     T1.DELPHI_APACCS_CCDATASUPPLIED_APA        AS VAR_0002 ,

     T1.DELPHI_CAIS_E1B03_WORST_STATUS_L        AS VAR_0003 ,

     T1.DELPHI_CAIS_E1B05_WORST_STATUS_L        AS VAR_0004 ,

     T1.DELPHI_CAIS_E1B07_WORST_STATUS_L        AS VAR_0005 ,

 

 

     T1.SPB113                                                                 AS VAR_0267 

 

FROM             LIBXYZ.DEVELOPMENT_SAMPLE_02   T1

 

 ORDER BY    T1.CONTRACTNO;

 

QUIT;

 

The renaming technique in the above SQL code is very manual and time-consuming as well as being "hard-coded" and therefore difficult to make consistent changes

 

I have a data set containing several hundred variables with very long names ( lots with names greater than 32 characters and in some cases the first 32 characters are identical across distinct variables )

 

I would like to rename all these variables : Var_Long_Name_01  to Var_0001 as indicated in the SQL Code above.

 

Is there some simple way in which I can do this without having to type in the new name for each of the several hundred variables?

 

The solution can involve base SAS, SQL or even Excel !!

 

Thanks for your help

Super User
Posts: 19,869

Re: Using Proc SQL to rename variables in a large data set containing many variables

Posted in reply to JonDickens1607

How are you deciding order on the variables?

 

You can fudge this, since it's likely a one time solution, use either proc contents OR dictionary.columns table to get all column names. Paste into Excel, in a single column. Use the autofill to generate the names in the column beside it. Then use concatenate function in Excel to generate the string:  

 

CONCATENATE(A2, " AS ", B2, ",")

 

Then, copy and paste that into your code editor. 

 

Because you have several hundred variables you can also use a code solution but it's a touch more tedious since you'll run out of space for a single macro variable so you'll have to create multiple macro variables.  If there's ANY chance of needing to repeat this, I'd probably code a solution.  I've used the above method in a pinch, for quick ad hocs.

 

Super User
Posts: 19,869

Re: Using Proc SQL to rename variables in a large data set containing many variables

You also most likely will have to use SQL  Pass Thru at some point, because if the table has more than 32 chars you won't be able to access it in SAS.  Here's an idea of how that process will work, if it was all in SAS...not sure how to deal with vars longer than 32 chars. 

 

data class;
    set sashelp.class;
run;

/*Create macro variables*/
data _null_;
    set sashelp.vcolumn (keep=name varnum libname memname 
        where=(%upcase(libname)="WORK" and %upcase(memname)="CLASS")) end=eof;
    string=catx (" as ", name, catt("VAR_", put(varnum, z5.)));

    if not eof then
        call symputx(catt("VAR_", put(varnum, 8. -l)), trim(string)||",", 'g');
    else
        do;
            call symputx(catt("VAR_", put(varnum, 8. -l)), string, 'g');
            call symputx('nvars', _n_, 'g');
        end;
run;

/*Macro to display macro variables*/
%macro rename();
    %do i=1 %to &nvars;
        &&&var_&i
%end;
%mend;

*Rename process;

proc sql ;
    create table want as select %rename() from class;
quit;
Trusted Advisor
Posts: 1,584

Re: Using Proc SQL to rename variables in a large data set containing many variables

You may combine @Reeza's code with next code:

 

proc sql;

    select  1 as var01,

               2 as var02,

               3 as var03

    from sashelp.class;

quit;

 

and if possible, if you use excel as mid file that contains variable sequence and variable names

then I suggest to put the original name as a label to the sas variable name.

Contributor
Posts: 59

Re: Using Proc SQL to rename variables in a large data set containing many variables

Thanks

Contributor
Posts: 59

Re: Using Proc SQL to rename variables in a large data set containing many variables

Thanks for your kind assistance

Contributor
Posts: 59

Re: Using Proc SQL to rename variables in a large data set containing many variables

Hi

 

Thanks for your input

 

The context or  potential application scenarios are as follows:

 

1. Building an Credit Risk Management Application Scorecard Model that is based on both Internal and External Predictor Variables.

2. The External Variables are a mix of Numeric and Character Variables from a Credit Risk Agency ( Names > 32 characters )

3. The data includes variables with:

        all missing values,  

        constant values across all rows,

        large numbers of missing values etc

4. High Dimensionality of the data space is a significant challenge:

    Some variables are binary while others are continuous with extended ranges

 

The main issue during Exploratory Data Analysis and Dimension Reduction involves separating Numeric Variables from Character Variables and then applying the appropriate techniques to each subset.

 

Because the model often involves binary logistic regression, we can use Weight Of Evidence and Information Value, to transform all the numeric variables onto a common scale and to then rank them in terms of predictive power...

 

The second application will involve monthly scorreacrd monitoring reports using a relatively fixed set of scorecard variables.

 

As a once-off solution I think that the Excel Solution is probably the most efficient ( it is the one that we currently use ) but for ongoing work it would be good to automate the process within SAS as much as possible.

 

We are working with the SQL team to construct views of the data tables that we can use in SAS to overcome the 32 character name issue 

Ask a Question
Discussion stats
  • 6 replies
  • 1636 views
  • 1 like
  • 3 in conversation