BookmarkSubscribeRSS Feed
kat100606
Fluorite | Level 6

Hi SAS Community,

I can't figure out a way to dynamically rename columns of datasets because I ran into a couple of snags. I have 121 SAS tables that I need to combine into one, but each dataset may have columns out of order, incorrect naming conventions (LAST_NAME and Last Name), or data lines start on line 4.

 

I included photos and attached an example of this in a text file, if that would be more helpful. I'm using SAS Studio for this

HaveHave

WantWant

proc sql;
   create table have1
       (FIRST_NAME char(7),
        LAST_NAME char(7),
        Member_MBI char(7),
        Diagnosis char(7)
       );

insert into have1
    values('Kat', 'Brown', '1A1', 'Fair')
    values('Mike', 'Salder', 'ME3', 'Good')
    values('Roger', 'Blue', 'ME1', 'Bad');

title 'Have 1';

select *
   from have1;

proc printto; run;


proc sql;
   create table have2
       ('LAST NAME'n char(7),
       'First Name'n char(7),
       Diagnosis char(7)        
       );

insert into have2
    values('Grove', 'Cathy', 'Fair')
    values('Pot', 'Raine', 'Bad')
    values('Kelly', 'Cheyene', 'Good');

title 'Have 2';

select *
   from have2;

proc printto; run;



proc sql;
   create table have3
       (A char(27),
        B char(27),
        C char(27),
        D char(20)        
       );

insert into have3
    values('', '', '', 'Report Period: 09')
    values('', '', '', '')
    values('Member Last Name', 'Member First name', 'Diagnosis', 'Member MBI')
    values('Crab', 'Fish', 'Fair', '7C1')
    values('Blob', 'Cramer', 'Bad', 'PE3')
    values('Mark', 'Pencil', 'Good', 'GR9');

title 'Have 3';

select *
   from have3;

proc printto; run;

proc sql;
   create table want
       (LAST_NAME char(7),
       FIRST_NAME char(7),
       DIAGNOSIS char(7),
       MEMBER_MBI char(7)        
       );

insert into want
	values('Brown', 'Kat', 'Fair', '1A1')
    values('Mike', 'Salder', 'Good', 'ME3')
    values('Roger', 'Blue', 'Bad', 'ME1')
    values('Grove', 'Cathy', 'Fair', '')
    values('Pot', 'Raine', 'Bad', '')
    values('Kelly', 'Cheyene', 'Good', '')
    values('Crab', 'Fish', 'Fair', '7C1')
    values('Blob', 'Cramer', 'Bad', 'PE3')
    values('Mark', 'pencil', 'Good', 'GR9');

title 'Want';

select *
   from want;

 

7 REPLIES 7
Tom
Super User Tom
Super User

I am not sure I understand what you want exactly.

Are you asking for help figuring out what the RIGHT name to use is?  If so you need to supply the rules you want to use.

Or are you asking how to RENAME a variable once you have figured out what name to give it?

If so I would probably recommend using dataset options. Define the variables first. This will also set the order.

data want;
  length firstname lastname $30 mbi $5 diagnosis $100 ;
  set have1(rename=(first_name=firstname last_name=lastname member_mbi=mbi))
      have2(rename=('last name'n=lastname 'first name'n=firstname))
  ...
  ;
run;
kat100606
Fluorite | Level 6

I want to create rules to help figuring out what the right column name to use is. Because I have 121 SAS tables that I need to combine and each of them have around 30 columns each.

Shmuel
Garnet | Level 18

Are the tables to combine, all in one folder? are the other tables in same folder?

Can you select all the tables using a form (or prefix or suffix) ?

 

You can select table names and all their variable names by sql dictionary.columsn, ordered by column name. Export the list to excel. Check the names, decide what is the prefered name for each column and write as a new excel column, then import the updated excel table and create the rename program(s) for each table.

kat100606
Fluorite | Level 6

All the tables are in the same folder and I can select them using a prefix. This could work except most files change the order the columns are supposed to be in.

Shmuel
Garnet | Level 18

SAS ignores case while dealing with variable names, so "LAST_NAME" and "last_name" should be considered the same name.

 

SAS cannot decide what name should be for "Member Last Name" or a like. In tose tables, which seems to be imported from excel, you need to create its own step to skip titles and assign variable names, addapted to their order.

ballardw
Super User

@kat100606 wrote:

Hi SAS Community,

 but each dataset may have columns out of order,

 

 

Does not matter. SAS will allow variables to be in any order as long as the variables of the same name are of the same type, numeric or character. You may have an issue with length of character variables though.

 

data example1;
  input x y z;
datalines;
1 2 3
4 5 6
;

data example2;
   input z x y;
datalines;
11 22 33
44 55 66
;


data combined;
   set example1 example2;
run;

You can use data set options to rename a variable, or more than one, at the time the sets are combined:

data example3;
   input x_x Y z;
datalines;
111 222 333
444 555 666
;

data combined2;
   set combined
       example3 (rename=(x_x=x));
run;

Your example of "data starts on line 4" is just plain error on the part of whoever created the data in the first place. That sort of , let us say suboptimal data, pretty much only comes from allowing Proc Import or similar tool to make decisions humans should. Have 3 will have multiple problems, one of which is that it is extremely likely that you have no numeric variables even if the value should be. That is time to REREAD the source data.

Reeza
Super User

You are going to have to some manually cleaning here, no ways around that. But you can simplify it massively for yourself. 

If you have a list of master variable names that you want to use, create that list in SAS. 

Then create a list of all the tables and their variable names. 

Use COMPGED to compare the two lists and take the closest matches. Then manually clean this list. 

Once clean, reimport that data back into SAS and use a macro to automatically go through and do the renaming. 

You may want to add labels while you're at it to make the data consistent in my opinion. 

 

Here's the first part:

data master_names;
input want_var_name $32.;
cards;
last_name
first_name
Member_MBI
Diagnosis
;

*create list of all tables/variable names;
proc sql;
create table orig_names as
select memname, name
from dictionary.columns 
where memname like 'HAVE%';
quit;

*find best name;

proc sql;
create table closest_name as
select a.*, b.want_var_name, compged(compress(a.name), compress(b.want_var_name, "_"), 'in') as distance
from orig_names as a, master_names as b
order by 1, 2, distance
;
quit;

data names;
set closest_name;
by memname name;
if first.name;
run;

Then you export to excel, review and modify the names. As you can see, even something like first_name and name_first wont match. 

You could make your matching more complex but I suspect going through it manually won't be that hard. You can try other fuzzy matching algorithms as well. 

 

 

Then assuming that you have a table of new/old names and the table names you can use PROC DATASETS and CALL EXECUTE to do the renaming for you:

 

data master_names;
input want_var_name $32.;
cards;
last_name
first_name
Member_MBI
Diagnosis
;

*create list of all tables/variable names;
proc sql;
create table orig_names as
select memname, name
from dictionary.columns 
where memname like 'HAVE%';
quit;

*find best name;

proc sql;
create table closest_name as
select a.*, b.want_var_name, compged(compress(a.name), compress(b.want_var_name, "_"), 'in') as distance
from orig_names as a, master_names as b
order by 1, 2, distance
;
quit;

data names;
set closest_name;
by memname name;
if first.name;
run;


*rename these, will error because of duplicates;
data rename_list;
 set names;
 by memname;
 
 *start proc datasets;
 if first.memname then do; str_start_proc = catx(' ', 'proc datasets lib=WORK nodetails nolist; modify ', memname, '; rename ');
 call execute(str_start_proc);
end;

 
 *pass new and old name to proc datasets;
 if name ne want_var_name then rename_string = catt(nliteral(name), '=', want_var_name);
 call execute(rename_string);
 
 
 *if last record then quit;
 If last.memname then do;
    end_string= (';run;quit;');
    call execute(end_string);
 end;
run;

FYI - if you have SAS data quality/data integration there may be an easier way to do this. 

For files like Have3 you should fix them first somehow, either cleaning them via SAS or manually beforehand.

 

Spoiler

@kat100606 wrote:

Hi SAS Community,

I can't figure out a way to dynamically rename columns of datasets because I ran into a couple of snags. I have 121 SAS tables that I need to combine into one, but each dataset may have columns out of order, incorrect naming conventions (LAST_NAME and Last Name), or data lines start on line 4.

 

I included photos and attached an example of this in a text file, if that would be more helpful. I'm using SAS Studio for this

HaveHave

WantWant

proc sql;
   create table have1
       (FIRST_NAME char(7),
        LAST_NAME char(7),
        Member_MBI char(7),
        Diagnosis char(7)
       );

insert into have1
    values('Kat', 'Brown', '1A1', 'Fair')
    values('Mike', 'Salder', 'ME3', 'Good')
    values('Roger', 'Blue', 'ME1', 'Bad');

title 'Have 1';

select *
   from have1;

proc printto; run;


proc sql;
   create table have2
       ('LAST NAME'n char(7),
       'First Name'n char(7),
       Diagnosis char(7)        
       );

insert into have2
    values('Grove', 'Cathy', 'Fair')
    values('Pot', 'Raine', 'Bad')
    values('Kelly', 'Cheyene', 'Good');

title 'Have 2';

select *
   from have2;

proc printto; run;



proc sql;
   create table have3
       (A char(27),
        B char(27),
        C char(27),
        D char(20)        
       );

insert into have3
    values('', '', '', 'Report Period: 09')
    values('', '', '', '')
    values('Member Last Name', 'Member First name', 'Diagnosis', 'Member MBI')
    values('Crab', 'Fish', 'Fair', '7C1')
    values('Blob', 'Cramer', 'Bad', 'PE3')
    values('Mark', 'Pencil', 'Good', 'GR9');

title 'Have 3';

select *
   from have3;

proc printto; run;

proc sql;
   create table want
       (LAST_NAME char(7),
       FIRST_NAME char(7),
       DIAGNOSIS char(7),
       MEMBER_MBI char(7)        
       );

insert into want
	values('Brown', 'Kat', 'Fair', '1A1')
    values('Mike', 'Salder', 'Good', 'ME3')
    values('Roger', 'Blue', 'Bad', 'ME1')
    values('Grove', 'Cathy', 'Fair', '')
    values('Pot', 'Raine', 'Bad', '')
    values('Kelly', 'Cheyene', 'Good', '')
    values('Crab', 'Fish', 'Fair', '7C1')
    values('Blob', 'Cramer', 'Bad', 'PE3')
    values('Mark', 'pencil', 'Good', 'GR9');

title 'Want';

select *
   from want;

 


 

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 815 views
  • 2 likes
  • 5 in conversation