Desktop productivity for business analysts and programmers

dynamically rename your variables to correspond to the first row of the data

Reply
New Contributor
Posts: 2

dynamically rename your variables to correspond to the first row of the data

Hi,

 

 

This is in regards to 7612302545 where you ask the following:

 

Here is the SAS code I used to rename the variable fields in the tbl_Measure_Crosswalk dataset that did not import them.

libname rates '/sasdata/hcqnetp/users/ijones01/projects/hedis/benchmarks';

data rates.tbl_measure_crosswalk;
   rename F1=Measure_ID
F2=Measure_key
F3=Measure_Name
F4=MeasureNameAbbreviation
F5=IDSSElementID
F6=Product
F7=LB_Report;
   set rates.tbl_measure_crosswalk;
    if _N_=1 then delete;
run;

However, I would like to hard coding variable names. What if there are 100 variables and not 7? Since the variable names are in the data. Is there a way to read the values into a macro variables or arrays and then do the rename using loops.  

 

For some reason the code that was sent to me did not take the variable names from the second observations row and used it to create field names. Do you know what I did wrong?

libname rates '/sasdata/hcqnetp/users/ijones01/projects/hedis/benchmarks';

*Import csv file into a sas dataset*;
PROC IMPORT DATAFILE="/sasdata/hcqnetp/users/ijones01/projects/hedis/benchmarks/tbl_measure_crosswalk.csv"
OUT=rates.tbl_measure_crosswalk
DBMS=csv
REPLACE;
GETNAMES=No;
RUN;


/* create a macro variable of the variable names from the first row. This first set of code Concatenate the variable names in the second row of observations */
data rates.tbl_measure_crosswalk;
set rates.tbl_measure_crosswalk;
if _n_=1;
all_vars=catx(' ',of fSmiley Happy;
drop f:;
call symputx('varnames',all_vars);
run;
%put &varnames;
%let count=%sysfunc(countw(&varnames,%str( )));
%put &count;

/* loop through the variable list using %SCAN */
%macro rename;

data rates.tbl_measure_crosswalk;
rename %do i=1 %to &count;
F&i=%scan(&varnames,&i,%str( ))
%end;;
set rates.tbl_measure_crosswalk;
if _N_=1 then delete;
run;
%mend;
options mprint mlogic symbolgen;
%rename
/* confirm that names were changed */
proc contents data=rates.tbl_measure_crosswalk;
run;

Super User
Posts: 12,996

Re: dynamically rename your variables to correspond to the first row of the data

Posted in reply to Jesusismygrace

A better approach if you want variables to be of the same type and such so that you can combine the data is to Stop after the first proc import a capture the data step code generated, either copy in to the editor or perhaps Recall the generated code.

 

Make sure that the code is doing what you want as far as variable names, types, informats and formats.

The you can run the code with a different intput file on the infile statement and a different output data set. The variable names witll be the same, the lengths and types of variables will be compatible.

 

AS for why it didn't get the variable names:

PROC IMPORT DATAFILE="/sasdata/hcqnetp/users/ijones01/projects/hedis/benchmarks/tbl_measure_crosswalk.csv"
OUT=rates.tbl_measure_crosswalk
DBMS=csv
REPLACE;
GETNAMES=No;  <= guess what this line does
RUN;

 

You will save a lot of time by reading correctly an not relying on proc import. Your rename approach will with probability approaching unity eventually produce datasets with different types of data for some variables and lengths of character variables.

Since the default number of lines to guess the variable type and length is 20 lines you may already have truncated character variables.

The option GUESSINGROWS can be used to tell SAS to look at more lines of data to set characteristics. I generally use the number of records or similar expected in the data set, or use something like 32000.

New Contributor
Posts: 2

Re: dynamically rename your variables to correspond to the first row of the data

Great info, thank you

Super User
Super User
Posts: 7,844

Re: dynamically rename your variables to correspond to the first row of the data

Posted in reply to Jesusismygrace

Let me make a guess at to what your actual issue is an then propose a couple of solutions.

 

Problem: How to use PROC IMPORT to import a delimited text file that has one or more "title" lines before the data.

For example you might have a data file named SAMPLE that looks like this:

Listing of SASHELP.CLASS
Name,Sex,Age,Height,Weight
Alfred,M,14,69,112.5
Alice,F,13,56.5,84
Barbara,F,13,65.3,98
Carol,F,14,62.8,102.5
Henry,M,14,63.5,102.5

If you use the GETNAMES=YES option on PROC IMPORT it will try to use the first line in the file as the variable names.  Even if you set DATAROW=3.

proc import datafile=sample dbms=csv
  out=sample replace
;
  datarow=3;
  getnames=yes;
run;
        Listing_
       of_SASHELP_
Obs       CLASS       VAR2            VAR3            VAR4            VAR5

 1       Alfred        M                14              69           112.5
 2       Alice         F                13            56.5              84
 3       Barbara       F                13            65.3              98
 4       Carol         F                14            62.8           102.5
 5       Henry         M                14            63.5           102.5

So now your question becomes

How can automatically read the variable names from a delimited file when they are not on the first line?

 

 

Perhaps the easiest solution is to just get rid of the unwanted title lines and then treat new text file as a normal delimited file.

filename fixed temp;
data _null_;
  infile sample firstobs=2 ;
  file fixed ;
  input;
  put _infile_;
run;

So now you can just use PROC IMPORT on FIXED instead of SAMPLE.

proc import datafile=sample dbms=csv
  out=fixed replace
;
run;

It looks like you have attempted another solution. 

1) Read the data without names

2) Read the names

3) Use the names to generate RENAME statement.

Here is how you could do that.  Let's assume that the data starts on row 3 and that the names are on row 2.

So first read the data and your will get variables names VAR1, VAR2, .... 

proc import datafile=sample dbms=csv
 out=values replace
;
 datarow=3 ;
 getnames=no;
run;

Then read the names.  You could use the OBS= system option to prevent it from reading the whole file into the NAMES dataset. Or you could later just use the first observation.

options obs=2;
proc import datafile=sample dbms=csv
 out=names replace
;
 datarow=2;
 getnames=no;
run;
options obs=max;

Now you should have another dataset with variables named VAR1,VAR2, etc, but now the value of the variables in the names from the second row of the source file.

To convert this to a RENAME statement it will help to transpose it.

proc transpose data=names(obs=1) out=names2 ;
  var _all_;
run;

Now you will have a dataset with two column, _NAME_ and COL1.  So _NAME_ will have 'VAR1','VAR2', etc and COL1 will ahve the values from the header row.  Here is a method to use that to write a RENAME statement into a temporary file.

filename code temp;
data _null_;
  file code ;
  if _n_=1 then put 'rename';
  if eof then put ';';
  set names2 ;
  put _name_ '=' col1 ;
run;

If you header row has non-valid SAS names then you might need to manipulate the value of COL1 before writing it.  If you have VALIDVARNAME=ANY then you could perhaps use the NLITERAL() function to convert the values in name literals. 

 

Now you can apply the RENAME statement to you dataset.  You could just copy the data again.  (Or you could use PROC DATASETS to modify the dataset you already have.

data want ;
  set values ;
  %include code ;
run;
Ask a Question
Discussion stats
  • 3 replies
  • 262 views
  • 1 like
  • 3 in conversation