You are going to have to some manually cleaning here, no ways around that. But you can simplify it massively for yourself.
If you have a list of master variable names that you want to use, create that list in SAS.
Then create a list of all the tables and their variable names.
Use COMPGED to compare the two lists and take the closest matches. Then manually clean this list.
Once clean, reimport that data back into SAS and use a macro to automatically go through and do the renaming.
You may want to add labels while you're at it to make the data consistent in my opinion.
Here's the first part:
data master_names;
input want_var_name $32.;
cards;
last_name
first_name
Member_MBI
Diagnosis
;
*create list of all tables/variable names;
proc sql;
create table orig_names as
select memname, name
from dictionary.columns
where memname like 'HAVE%';
quit;
*find best name;
proc sql;
create table closest_name as
select a.*, b.want_var_name, compged(compress(a.name), compress(b.want_var_name, "_"), 'in') as distance
from orig_names as a, master_names as b
order by 1, 2, distance
;
quit;
data names;
set closest_name;
by memname name;
if first.name;
run;
Then you export to excel, review and modify the names. As you can see, even something like first_name and name_first wont match.
You could make your matching more complex but I suspect going through it manually won't be that hard. You can try other fuzzy matching algorithms as well.
Then assuming that you have a table of new/old names and the table names you can use PROC DATASETS and CALL EXECUTE to do the renaming for you:
data master_names;
input want_var_name $32.;
cards;
last_name
first_name
Member_MBI
Diagnosis
;
*create list of all tables/variable names;
proc sql;
create table orig_names as
select memname, name
from dictionary.columns
where memname like 'HAVE%';
quit;
*find best name;
proc sql;
create table closest_name as
select a.*, b.want_var_name, compged(compress(a.name), compress(b.want_var_name, "_"), 'in') as distance
from orig_names as a, master_names as b
order by 1, 2, distance
;
quit;
data names;
set closest_name;
by memname name;
if first.name;
run;
*rename these, will error because of duplicates;
data rename_list;
set names;
by memname;
*start proc datasets;
if first.memname then do; str_start_proc = catx(' ', 'proc datasets lib=WORK nodetails nolist; modify ', memname, '; rename ');
call execute(str_start_proc);
end;
*pass new and old name to proc datasets;
if name ne want_var_name then rename_string = catt(nliteral(name), '=', want_var_name);
call execute(rename_string);
*if last record then quit;
If last.memname then do;
end_string= (';run;quit;');
call execute(end_string);
end;
run;
FYI - if you have SAS data quality/data integration there may be an easier way to do this.
For files like Have3 you should fix them first somehow, either cleaning them via SAS or manually beforehand.
@kat100606 wrote:
Hi SAS Community,
I can't figure out a way to dynamically rename columns of datasets because I ran into a couple of snags. I have 121 SAS tables that I need to combine into one, but each dataset may have columns out of order, incorrect naming conventions (LAST_NAME and Last Name), or data lines start on line 4.
I included photos and attached an example of this in a text file, if that would be more helpful. I'm using SAS Studio for this
Have
Want
proc sql;
create table have1
(FIRST_NAME char(7),
LAST_NAME char(7),
Member_MBI char(7),
Diagnosis char(7)
);
insert into have1
values('Kat', 'Brown', '1A1', 'Fair')
values('Mike', 'Salder', 'ME3', 'Good')
values('Roger', 'Blue', 'ME1', 'Bad');
title 'Have 1';
select *
from have1;
proc printto; run;
proc sql;
create table have2
('LAST NAME'n char(7),
'First Name'n char(7),
Diagnosis char(7)
);
insert into have2
values('Grove', 'Cathy', 'Fair')
values('Pot', 'Raine', 'Bad')
values('Kelly', 'Cheyene', 'Good');
title 'Have 2';
select *
from have2;
proc printto; run;
proc sql;
create table have3
(A char(27),
B char(27),
C char(27),
D char(20)
);
insert into have3
values('', '', '', 'Report Period: 09')
values('', '', '', '')
values('Member Last Name', 'Member First name', 'Diagnosis', 'Member MBI')
values('Crab', 'Fish', 'Fair', '7C1')
values('Blob', 'Cramer', 'Bad', 'PE3')
values('Mark', 'Pencil', 'Good', 'GR9');
title 'Have 3';
select *
from have3;
proc printto; run;
proc sql;
create table want
(LAST_NAME char(7),
FIRST_NAME char(7),
DIAGNOSIS char(7),
MEMBER_MBI char(7)
);
insert into want
values('Brown', 'Kat', 'Fair', '1A1')
values('Mike', 'Salder', 'Good', 'ME3')
values('Roger', 'Blue', 'Bad', 'ME1')
values('Grove', 'Cathy', 'Fair', '')
values('Pot', 'Raine', 'Bad', '')
values('Kelly', 'Cheyene', 'Good', '')
values('Crab', 'Fish', 'Fair', '7C1')
values('Blob', 'Cramer', 'Bad', 'PE3')
values('Mark', 'pencil', 'Good', 'GR9');
title 'Want';
select *
from want;
... View more