01-14-2017 06:53 PM
CREATE TABLE LIBXYZ.DEVELOPMENT_SAMPLE_03
T1.FLAG AS BIN_TARGET , /* BINARY TARGET */
T1.DELPHI_ADDRESS_LINK_NDLNK01_ADDR AS VAR_0001 ,
T1.DELPHI_APACCS_CCDATASUPPLIED_APA AS VAR_0002 ,
T1.DELPHI_CAIS_E1B03_WORST_STATUS_L AS VAR_0003 ,
T1.DELPHI_CAIS_E1B05_WORST_STATUS_L AS VAR_0004 ,
T1.DELPHI_CAIS_E1B07_WORST_STATUS_L AS VAR_0005 ,
T1.SPB113 AS VAR_0267
FROM LIBXYZ.DEVELOPMENT_SAMPLE_02 T1
ORDER BY T1.CONTRACTNO;
The renaming technique in the above SQL code is very manual and time-consuming as well as being "hard-coded" and therefore difficult to make consistent changes
I have a data set containing several hundred variables with very long names ( lots with names greater than 32 characters and in some cases the first 32 characters are identical across distinct variables )
I would like to rename all these variables : Var_Long_Name_01 to Var_0001 as indicated in the SQL Code above.
Is there some simple way in which I can do this without having to type in the new name for each of the several hundred variables?
The solution can involve base SAS, SQL or even Excel !!
Thanks for your help
01-14-2017 07:43 PM
How are you deciding order on the variables?
You can fudge this, since it's likely a one time solution, use either proc contents OR dictionary.columns table to get all column names. Paste into Excel, in a single column. Use the autofill to generate the names in the column beside it. Then use concatenate function in Excel to generate the string:
CONCATENATE(A2, " AS ", B2, ",")
Then, copy and paste that into your code editor.
Because you have several hundred variables you can also use a code solution but it's a touch more tedious since you'll run out of space for a single macro variable so you'll have to create multiple macro variables. If there's ANY chance of needing to repeat this, I'd probably code a solution. I've used the above method in a pinch, for quick ad hocs.
01-14-2017 08:05 PM
You also most likely will have to use SQL Pass Thru at some point, because if the table has more than 32 chars you won't be able to access it in SAS. Here's an idea of how that process will work, if it was all in SAS...not sure how to deal with vars longer than 32 chars.
data class; set sashelp.class; run; /*Create macro variables*/ data _null_; set sashelp.vcolumn (keep=name varnum libname memname where=(%upcase(libname)="WORK" and %upcase(memname)="CLASS")) end=eof; string=catx (" as ", name, catt("VAR_", put(varnum, z5.))); if not eof then call symputx(catt("VAR_", put(varnum, 8. -l)), trim(string)||",", 'g'); else do; call symputx(catt("VAR_", put(varnum, 8. -l)), string, 'g'); call symputx('nvars', _n_, 'g'); end; run; /*Macro to display macro variables*/ %macro rename(); %do i=1 %to &nvars; &&&var_&i %end; %mend; *Rename process; proc sql ; create table want as select %rename() from class; quit;
01-15-2017 01:05 AM
You may combine @Reeza's code with next code:
select 1 as var01,
2 as var02,
3 as var03
and if possible, if you use excel as mid file that contains variable sequence and variable names
then I suggest to put the original name as a label to the sas variable name.
01-15-2017 06:50 AM
Thanks for your input
The context or potential application scenarios are as follows:
1. Building an Credit Risk Management Application Scorecard Model that is based on both Internal and External Predictor Variables.
2. The External Variables are a mix of Numeric and Character Variables from a Credit Risk Agency ( Names > 32 characters )
3. The data includes variables with:
all missing values,
constant values across all rows,
large numbers of missing values etc
4. High Dimensionality of the data space is a significant challenge:
Some variables are binary while others are continuous with extended ranges
The main issue during Exploratory Data Analysis and Dimension Reduction involves separating Numeric Variables from Character Variables and then applying the appropriate techniques to each subset.
Because the model often involves binary logistic regression, we can use Weight Of Evidence and Information Value, to transform all the numeric variables onto a common scale and to then rank them in terms of predictive power...
The second application will involve monthly scorreacrd monitoring reports using a relatively fixed set of scorecard variables.
As a once-off solution I think that the Excel Solution is probably the most efficient ( it is the one that we currently use ) but for ongoing work it would be good to automate the process within SAS as much as possible.
We are working with the SQL team to construct views of the data tables that we can use in SAS to overcome the 32 character name issue