SAS Programming

DATA Step, Macro, Functions and more
BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
A_Swoosh
Quartz | Level 8

I currently have upwards of 10+ datasets, each of which have a list of variables like 2 seen below:

Variable Names
ID|BusName|PIN|Diag_cd1|Diag_cd2|Diag_cd3|Diag_cd4|Diag_cd5|Diag_cd6|Diag_cd7|Diag_cd8|Spec_cd1|Spec_cd2|Spec_cd3|Spec_cd4|Address1|Address2|City|State|Zip|Zip4
Variable Names
ID|BusName|PIN|Diag_cd|Diag_cd1|Diag_cd2|Diag_cd4|Diag_cd5|Diag_cd6|Diag_cd7|Diag_cd8|Spec_cd|Spec_cd2|Spec_cd3|Spec_cd4|Address1|Address2|City|State|Zip|Zip4

and so on...

 

I was told there is a more efficient way to rename my variables than using proc format. Instead, they told me to

1) use proc contents to get a list of variable names

2) sort datasets by all variables

2) create a counter variable

3) create a rename variable using the counter

4) Use conditional logic to rename the prefix  (Assigned [e.g. Assigned_ID]) on ID, BusName, PIN, address, city, state, zip, etc

    Ex) if index(Add) then new_name=Assigned_Address1; if upcase(NAME) then new_name=cat("Assigned",name)

5)Obtain 1 column with original variable names and another column for renamed variables

6) Use proc sql to create a renaming structure to generate renamed variables

 

proc contents data=rawdata.raw_&file. out=contents_&file. (keep=NAME) noprint;
run;
proc sort data=contents_&file.;       by _all_; 
run;

I'm a little lost on part 3 and on...specifically how to assign the counter using pieces of NAME to rename.

 

End result:

Variable Name|New Variable Name
ID|Assigned_ID
Diag_cd1|Assigned_DX1
Diag_cd2|Assigned_DX2
Diag_cd3|Assigned_DX3
...
Spec_cd1|Assigned_SPCD1
Spec_cd2|Assigned_SPCD2
Spec_cd3|Assigned_SPCD3
...
Address1|Assigned_Address1
Address2|Assigned_Address2
City|Assigned_City
State|Assigned_State
Zip|Assigned_Zip
Zip4|Assigned_Zip4
Variable Name|New Variable Name
ID|Assigned_ID
Diag_cd|Assigned_DX1
Diag_cd2|Assigned_DX2
Diag_cd3|Assigned_DX3
...
Spec_cd|Assigned_SPCD1
Spec_cd1|Assigned_SPCD2
Spec_cd2|Assigned_SPCD3
...
Address1|Assigned_Address1
Address2|Assigned_Address2
City|Assigned_City
State|Assigned_State
Zip|Assigned_Zip
Zip4|Assigned_Zip4

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

I suspect part of your problem with writing the code it not quite knowing how to describe the problem.

Step 1 is getting the variable names into data.  If you start with actual datasets then use PROC CONTENTS or access the SAS metadata (via DICTIONARY.COLUMNS or SASHELP.VCOLUMN is not using SQL code).  If your source is the first row of a text file then just read the first line of the file using trailing @@ on the input statement.  Whichever method you use create a dataset with one observation per variable.

 

Step 2 is locating the names that you want to rename.  To do that you need to classify each variable that ends with a numeric suffix to its corresponding basename.  One way to do that is using PRCHANGE() function.

 

basename = prxchange('s/\d*$//',-1,trim(name));

Step 3 is figuring out whether you want to change the basename.  It looks like there are three variables where you don't want to change the base name.  And for the others you want to add Prov prefix to the names.

if lowcase(basename) not in ('provID','npi','txnmy_cd') then basename='Prov'||basename;

Step 3 is add the numeric suffix to the variables in a consistent manner.  It is still not clear to me if you only want to generate the numeric suffix when there are more than one instance of the variable or not.  But since not adding the suffix is probably how you got into this mess to begin with then perhaps you should always add the suffix.   Also it is not clear if you ever receive gaps in the series (like B1,B2,B6,B7).  Again I would just eliminate the gaps and number then consistently.

if first.basename then suffixnum=1;
nename=cats(basename,suffixnum);
suffixnum+1;

Once you have all of the NAME,NEWNAME pairs then just use that information to rename the variables.  How to do that depends on what you are doing.  If you are trying to combine the datasets you could do it with RENAME=() dataset option.  If you are copying the datasets to new datasets you could do it with RENAME statement , or RENAME=() dataset option on either the input or the output dataset.  If you are trying to modify the names in an existing dataset then do it with the RENAME statement within a MODIFY group in PROC DATASETS.

 

So see if you can generate a dataset that has the old NAME and generated NEWNAME.  Once you have that the rest is easy and there are many examples on this forum of ways to do that actual renaming.

View solution in original post

12 REPLIES 12
ChrisNZ
Tourmaline | Level 20

If you are going to use SQL and recreate the tables, and if the columns are always in the right order, it much simpler to use a UNION.

proc sql;
  create table NEW as 
  select * from TABLE_WITH_GOOD_NAMES (obs=0)
  union
  select * from TABLE_WITH_BAD_NAMES;

 

ballardw
Super User

You can likely get all the information you need with a single step using either Proc SQL and querying Dictionary.columns or a data step using SASHELP.Volumn. This data view, the Vcolumn version has all of the variables and there properties in all of the data sets.

 

Something like:

data need;
   set sashelp.vcolumn;
   where libname='MAPS' and memname =: 'A';
run;

will the details of all variables of al data sets (memname) that start with "A" in the library 'MAPS'. Libname and memname are stored in upper case so select appropriately.

 

Then something along these lines.

<use what ever logic to create old name/new name pairs
from that Need data set>
<use that data set in a data _null_ step to create
 the following statments using Call Execute
>

Proc datasets library='MAPS';
   modify <dataset name goes here>;
   rename
      oldvar1=newvarname1
      oldvar2=newvarname2
      ....
   ;
   modify <next dataset name> ;
   rename
      oldvar1=newvarname1
      oldvar2=newvarname2
      ....
   ;
  <repeat until you run out data sets>
run;
quit;

Searching the forum for "Proc Datasets" rename you can find

https://communities.sas.com/t5/SAS-Procedures/Rename-all-variables/m-p/374935

https://communities.sas.com/t5/SAS-Enterprise-Guide/column-rename-macro/m-p/493320

https://communities.sas.com/t5/SAS-Programming/How-to-rename-all-the-column-names-of-a-dataset/m-p/4...

with bits and pieces similar to what you need.

 

The data _null_ using a BY MEMNAME lets you have "if first.memname then do" to write the stuff that starts with the data set name change and an "if last.memname then do" to end the rename block.

 

Shmuel
Garnet | Level 18

You have already accepted a solution for dynamically rename a list of variable names

in post:

https://communities.sas.com/t5/SAS-Programming/Dynamically-change-the-name-of-variables/m-p/663795#M... 

 

You probably can use same method, of creating an informat, for those new datasets. 

Patrick
Opal | Level 21

Building on the code shared here you basically could just add a special type of informat which does the renaming for you using Regular Expressions. 

data have1;
  array vars {*} 
    ID otherVar
    Diag_cd Diag_cd1 Diag_cd2 Diag_cd3 
    Spec_cd1 Spec_cd2 Spec_cd3 
    Address1 Address2 City State Zip Zip4;
  stop;
run;

data have2;
  array vars {*} ProvID NPI txnmy_cd;
  stop;
run;


%macro renameVars(tbl,infmt_stdz);
  %local lib;
  %let lib=%upcase(%scan(WORK.&tbl,-2,.));
  %let tbl=%upcase(%scan(&tbl,-1,.));

  %local rename_vars;
  %let rename_vars=;
  proc sql noprint;
    select cats(name,'=',input(name,$&infmt_stdz.32.)) 
      into :rename_vars separated by ' '
    from dictionary.columns
    where 
      libname="&lib" 
      and memname="&tbl"
      and not missing(input(name,$&infmt_stdz.32.))
    ;
  quit;
  
  %if %nrbquote(&rename_vars) ne %nrbquote() %then
    %do;
      proc datasets lib=&lib nolist;
        modify &tbl;
          rename &rename_vars;
        run;
      quit;
    %end;
%mend;

proc format;
  invalue $varname_stdz(default=32 upcase notsorted)
    'TXNMY_CD'  = 'Taxonomy'
    'TXNMY_CD1' = 'Taxonomy'
    'TXNMY_CD2' = 'Taxonomy2'
    other=' '
    ;  
  invalue $varname_prefix(default=32 notsorted)
    /* exclusions */
    's/ID|otherVar//i' (regexpe)      = _same_
    /* renames */
    's/^Diag_cd(.*)/Assigned_DX$1/i' (regexpe) = _same_
    's/^Spec_cd(.*)/Assigned_SPCD$1/i' (regexpe) = _same_
    's/^(.*)/Assigned_$1/i' (regexpe)    = _same_
    other=' '
    ;
run;

data _null_;
  infile datalines dsd dlm=' ' truncover;
  input tbls :$41.;
  rc=dosubl(cats('%renameVars(',tbls,',varname_stdz);'));
  rc=dosubl(cats('%renameVars(',tbls,',varname_prefix);'));
  datalines;
have1
work.have2
haveNot
;

proc contents data=have1 varnum;
run;

Patrick_0-1593476110308.png

 

A_Swoosh
Quartz | Level 8

While the solutions posted previously in my thread and suggested here by Patrick and Shmuel were great, I was advised from the person looking over my data that I should avoid manually creating a list of rename, manually creating an array listing out my variables, and/or creating a format.

 

Instead, I was advised to approach the renaming of my variables differently:

 

Create a counter for my variables where they start with either txnmy_cd/txnmy_cd1 and go on to n+1. Create a rename structure to provide a prefix to other variables (e.g. Address1 to ProvAddress1) and renaming other instances like Prov_type to Provtype. I think this can be accomplished with either a combination of proc contents to obtain a variable list and/or proc sql dictionary columns to obtain my list of variables. The issue I encounter is creating a counter then do loop after I run my proc contents or dictionary.columns approach.

 

 

Tom
Super User Tom
Super User

@A_Swoosh wrote:

While the solutions posted previously in my thread and suggested here by Patrick and Shmuel were great, I was advised from the person looking over my data that I should avoid manually creating a list of rename, manually creating an array listing out my variables, and/or creating a format.

 

Instead, I was advised to approach the renaming of my variables differently:

 

Create a counter for my variables where they start with either txnmy_cd/txnmy_cd1 and go on to n+1. Create a rename structure to provide a prefix to other variables (e.g. Address1 to ProvAddress1) and renaming other instances like Prov_type to Provtype. I think this can be accomplished with either a combination of proc contents to obtain a variable list and/or proc sql dictionary columns to obtain my list of variables. The issue I encounter is creating a counter then do loop after I run my proc contents or dictionary.columns approach.


Is there a pattern to the issues you are having with the names?  I am not sure I see a fixed pattern in your examples. 

 

For some you seem to be saying that some variables that do not end in any numeric suffix should be renamed to add 1 to the end of the name.  Do you want to do that for every variable that does not end in a numeric suffix?  Or only when there are some other existing variables that do end a numeric suffix?  If the latter are you positive that there are not "basenames" that include a numeric suffix already?  For example do you ever see patterns like: q11, q12, q13, q2, q31,q32 q33,q34 

For others you seem to be saying that you should remove underscores from the variable name.  Do you want to remove all underscores?

 

What is the source of the inconsistency in the variables names?  Is this a one time fix for a finite set of existing datasets? Or do you expect to get new datasets in the future where similar types of renamings will be required?

I suspect that any solution where the source data can change will require some type of review process to verify that your renaming rules actually work for the new datasets.

A_Swoosh
Quartz | Level 8

There is some pattern with the names.

  • Taxonomy codes and spec_cd both either start with txnmy_cd or txnmy_cd1 and spec_cd or spec_cd1 and continue for n+1. 
  • underscores should be removed for everything
  • Some variables are not changed at all (e.g. ID, PIN)
  • other variables (address) have a prefix 'Prov'

There will be future iterations of the process which may include new datasets but they will follow similar naming conventions. 

Patrick
Opal | Level 21

@A_Swoosh wrote:

There is some pattern with the names.

  • Taxonomy codes and spec_cd both either start with txnmy_cd or txnmy_cd1 and spec_cd or spec_cd1 and continue for n+1. 
  • underscores should be removed for everything
  • Some variables are not changed at all (e.g. ID, PIN)
  • other variables (address) have a prefix 'Prov'

There will be future iterations of the process which may include new datasets but they will follow similar naming conventions. 


You need to clearly define the patterns and the desired result. If you do that then we can use regular expressions to match such patterns. 

 

Taxonomy codes and spec_cd both either start with txnmy_cd or txnmy_cd1 and spec_cd or spec_cd1 and continue for n+1

So if it starts with txnmy_cd do you now want the first variable to become txnmy_cd1  then then the 2nd variable - txnmy_cd1 - to become txnmy_cd2? Please provide representative sample data (via a SAS data step) and then show us the desired result.

 

Create a rename structure to provide a prefix to other variables

You can either do this within code and if/then/else statements or you do it data driven so you don't have to change the code to add additional patterns. Using an Informat especially with the RegEx is a way to implement something data driven.

A_Swoosh
Quartz | Level 8

Have: 

 

Dataset 1:

ProvID NPI Txnmy_cd1 Txnmy_cd2 Spec_cd1 Spec_cd2 Address1 Address2 City State Zip Zip4 Prov_type Pcp_flg

Dataset 2:

ProvID NPI txnmy_cd txnmy_cd2 Spec_cd Spec_cd1 Address1 Address2 City State Zip Zip4 Prov_type Pcp_flg

and so on... (all datasets follow similar structure)

 

Want:

Dataset 1:

ProvID NPI Taxonomy Taxonomy2 ProvSpec1 ProvSpec2 ProvAddress1 ProvAddress2 ProvCity ProvState ProvZip ProvZip4 Provtype PCP_flg

Dataset 2:

ProvID NPI Taxonomy Taxonomy2 ProvSpec1 ProvSpec2 ProvAddress1 ProvAddress2 ProvCity ProvState ProvZip ProvZip4 Provtype pcp_flg

I'm attempting to make this data driven as much as possible but to avoid using informat, format, or manually renaming statements since future years may have more datasets and/or variables or the layout may change. From what I was told my the person overlooking the project is that I should consider a counter variable for the taxonomy and spec_cd, an index with a combined if/else statement for address variables, and then a rename structure (proc sql) to rename the variables.

Tom
Super User Tom
Super User

I suspect part of your problem with writing the code it not quite knowing how to describe the problem.

Step 1 is getting the variable names into data.  If you start with actual datasets then use PROC CONTENTS or access the SAS metadata (via DICTIONARY.COLUMNS or SASHELP.VCOLUMN is not using SQL code).  If your source is the first row of a text file then just read the first line of the file using trailing @@ on the input statement.  Whichever method you use create a dataset with one observation per variable.

 

Step 2 is locating the names that you want to rename.  To do that you need to classify each variable that ends with a numeric suffix to its corresponding basename.  One way to do that is using PRCHANGE() function.

 

basename = prxchange('s/\d*$//',-1,trim(name));

Step 3 is figuring out whether you want to change the basename.  It looks like there are three variables where you don't want to change the base name.  And for the others you want to add Prov prefix to the names.

if lowcase(basename) not in ('provID','npi','txnmy_cd') then basename='Prov'||basename;

Step 3 is add the numeric suffix to the variables in a consistent manner.  It is still not clear to me if you only want to generate the numeric suffix when there are more than one instance of the variable or not.  But since not adding the suffix is probably how you got into this mess to begin with then perhaps you should always add the suffix.   Also it is not clear if you ever receive gaps in the series (like B1,B2,B6,B7).  Again I would just eliminate the gaps and number then consistently.

if first.basename then suffixnum=1;
nename=cats(basename,suffixnum);
suffixnum+1;

Once you have all of the NAME,NEWNAME pairs then just use that information to rename the variables.  How to do that depends on what you are doing.  If you are trying to combine the datasets you could do it with RENAME=() dataset option.  If you are copying the datasets to new datasets you could do it with RENAME statement , or RENAME=() dataset option on either the input or the output dataset.  If you are trying to modify the names in an existing dataset then do it with the RENAME statement within a MODIFY group in PROC DATASETS.

 

So see if you can generate a dataset that has the old NAME and generated NEWNAME.  Once you have that the rest is easy and there are many examples on this forum of ways to do that actual renaming.

A_Swoosh
Quartz | Level 8

Hi Tom,

 

Thank you for your response. You are right; I am having trouble communicating the problem. 

 

Step 1 and 2 are no issues. I am able to generate a list of variable names, review the list, and identify the renaming that is required as seen below.

Dataset 1:

A_Swoosh_2-1594146550778.png

Dataset 2:

A_Swoosh_0-1594146456018.png

 

data new; 
        set contents_&file.;
         if upcase(name) in ('ADDRESS1' 'ADDRESS2' 'BUSNAME' 'CITY' 
          'COUNTY' 'FNAME' 'LNAME' 'MI' 'STATE' 'ZIP' 'ZIP4') 
         then basename='Prov'||name; else basename=name;
run;

I want to rename txnmy_cd and spec_cd using a numeric suffix since they have more than one instance of the variable beginning with:

  • Taxonomy, Taxonomy2, Taxonomy3, etc.
  • ProvSpec1, ProvSpec2, ProvSpec3, etc.

I also want to remove the underscore from prov_type also.

 

In addition, I am trying to remove those weird naming conventions in dataset 2 (e.g. CountyA_B, PCP_flagD) to follow the format of dataset 1.

  • I was considering the use of index function to do this. I'm not sure if there is a more efficient way.

Also, dataset 2 is slightly different so the code listed above won't work on that dataset. Using this function below, I don't know if this quite works for my address and zip variables.

basename = prxchange('s/\d*$//',-1,trim(name));

 I think once I am able to produce a column right next to the original name column, I can use proc sql to rename the column variables as the new column variables. 

Patrick
Opal | Level 21

You tell us you need a counter but the sample data you're posting indicates that you just want to maintain the already existing numbering on variables when renaming them.

Here your have

txnmy_cd txnmy_cd2

Here your want

Taxonomy Taxonomy2

 

If one would use a counter then why is there no Taxonomy1 in your want data? What are the rules? Please don't just repeat what you've already told us but try to reformulate and further specify what you have and what you want - and the detailed logic in words how to get from have to want.

sas-innovate-white.png

Special offer for SAS Communities members

Save $250 on SAS Innovate and get a free advance copy of the new SAS For Dummies book! Use the code "SASforDummies" to register. Don't miss out, May 6-9, in Orlando, Florida.

 

View the full agenda.

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 12 replies
  • 4395 views
  • 0 likes
  • 6 in conversation