Solved: Re: Rename columns based on values from other dataset

newboy1218 · Posted 02-22-2022 04:41 PM

Hi, I have 2 datasets where dataset A contains the column headers and dataset B contains the raw data. Is there an efficient way to change the current (old) column header in dataset B according to the value in dataset A?

HAVE-Dataset A:

F1	F2	F3	...	F1000
account_id	name	age	...	zip_code

HAVE-Dataset B:

F1	F2	F3	...	F1000
10000	John White	34	...	M8DO0U
10001	Amy Power	12	...	A7DG2E

WANT-Dataset B:

account_id	name	age	...	zip_code
10000	John White	34	...	M8DO0U
10001	Amy Power	12	...	A7DG2E

Reeza · Posted 02-22-2022 05:58 PM

My approach doesn't modify the data set at all, it just renames the data making more efficient but the best solution is always one you understand and can modify.

General idea is:

data driven code - as data set changes it changes
PROC DATASETS will rename variables in a data set by modifying the metadata rather than re-creating the data set. For large data sets this is much more efficient.
Create an array to reference all the variables
Use VNAME to get the variable current/old name from the array
Use the value of the array as the new name
Create a string that has the format of oldName = newName to pass to proc datasets
Use CALL EXECUTE to run the generated code

data headings;
   input f1:$20. f2:$20. f3:$20. f1000:$20.;
datalines;
account_id name age zip_code
;
run;

data actualdata;
   input f1:$20. f2 $10. f3:$20. f1000:$20.;
datalines;
10000 John White 34 M8DO0U
10001 Amy Power 12 A7DG2E
;
run; 


data rename;
set headings;

array f(*) f:;;

*start proc datasets process to update data set;
call execute('proc datasets lib=work; modify actualdata; rename ');

do i=1 to dim(f);
str = catt(vname(f(i)), " = ", f(i));
call execute(str);
end;

call execute(";quit;");

run;

proc contents data=actualdata;
run;

Log (selected) is generated from the data step dynamically it creates this text that is then executed via SAS:

 NOTE: CALL EXECUTE generated line.
 1         + proc datasets lib=work;
 1         +                         modify actualdata;
 1         +                                            rename
 2         + f1 =account_id    
 3         + f2 =name
 4         + f3 =age
 5         + f1000 =zip_code
 6         + ;
 NOTE: Renaming variable f1 to account_id.
 NOTE: Renaming variable f2 to name.
 NOTE: Renaming variable f3 to age.
 NOTE: Renaming variable f1000 to zip_code.
 6         +  quit;
 
 NOTE: MODIFY was successful for WORK.ACTUALDATA.DATA.
 NOTE: PROCEDURE DATASETS used (Total process time):
       real time           0.03 seconds
       user cpu time       0.04 seconds
       system cpu time     0.00 seconds
       memory              1421.31k
       OS Memory           29612.00k
       Timestamp           02/22/2022 10:52:29 PM
       Step Count                        37  Switch Count  0
       Page Faults                       0
       Page Reclaims                     130
       Page Swaps                        0
       Voluntary Context Switches        0
       Involuntary Context Switches      0
       Block Input Operations            0
       Block Output Operations           272

FYI - code in this post is modified slightly to be more dynamic and account for varying amounts of variables (F variables).

View solution in original post

Reeza · Posted 02-22-2022 05:17 PM

data rename;
set datasetA;

array f(1000) f1-f1000;

call execute('proc datasets lib=work; modify datasetB; rename ');

do i=1 to dim(f);
str = catt(vname(f(i)), " = ", f(i));
call execute(str);
end;

call execute(";quit;");

run;

Dynamically rename using PROC DATASETS and CALL EXECUTE.

A proc datasets typically looks like this:

proc datasets library=inputLibrary;
modify dataset2change;
rename originalVarName = NewVariableName 
......
.....

list of variables;
quit;

Using your first data set you can dynamically create that code by looping through the variables and generating the code and CALL EXECUTE executes the code for you.

Untested but general idea is there for you - if you need help debugging post the exact code and log please.

@newboy1218 wrote:

Hi, I have 2 datasets where dataset A contains the column headers and dataset B contains the raw data. Is there an efficient way to change the current (old) column header in dataset B according to the value in dataset A?

HAVE-Dataset A:

F1 F2 F3 ... F1000

account_id name age ... zip_code

HAVE-Dataset B:

F1 F2 F3 ... F1000

10000 John White 34 ... M8DO0U

10001 Amy Power 12 ... A7DG2E

WANT-Dataset B:

account_id name age ... zip_code

10000 John White 34 ... M8DO0U

10001 Amy Power 12 ... A7DG2E

HB · Posted 02-22-2022 05:43 PM

If you don't understand/get Reeza's answer (I don't) try this:

data headings;
   input f1:$20. f2:$20. f3:$20. f1000:$20.;
datalines;
account_id name age zip_code
;
run;


data actualdata;
   input f1:$20. f2 $10. f3:$20. f1000:$20.;
datalines;
10000 John White 34 M8DO0U
10001 Amy Power 12 A7DG2E
;
run; 

*smash the datasets together;
proc datasets;
   append base=headings data=actualdata;
run;

* cheat with a double transpose;
proc transpose data=headings out=wide ;
 var _all_;
run;

proc transpose data=wide(drop=_name_ rename=(col1=_name_)) out=want(drop=_name_ _label_);
   var col:;
   id _name_;
run;

Reeza · Posted 02-22-2022 05:58 PM

My approach doesn't modify the data set at all, it just renames the data making more efficient but the best solution is always one you understand and can modify.

General idea is:

data driven code - as data set changes it changes
PROC DATASETS will rename variables in a data set by modifying the metadata rather than re-creating the data set. For large data sets this is much more efficient.
Create an array to reference all the variables
Use VNAME to get the variable current/old name from the array
Use the value of the array as the new name
Create a string that has the format of oldName = newName to pass to proc datasets
Use CALL EXECUTE to run the generated code

data headings;
   input f1:$20. f2:$20. f3:$20. f1000:$20.;
datalines;
account_id name age zip_code
;
run;

data actualdata;
   input f1:$20. f2 $10. f3:$20. f1000:$20.;
datalines;
10000 John White 34 M8DO0U
10001 Amy Power 12 A7DG2E
;
run; 


data rename;
set headings;

array f(*) f:;;

*start proc datasets process to update data set;
call execute('proc datasets lib=work; modify actualdata; rename ');

do i=1 to dim(f);
str = catt(vname(f(i)), " = ", f(i));
call execute(str);
end;

call execute(";quit;");

run;

proc contents data=actualdata;
run;

Log (selected) is generated from the data step dynamically it creates this text that is then executed via SAS:

 NOTE: CALL EXECUTE generated line.
 1         + proc datasets lib=work;
 1         +                         modify actualdata;
 1         +                                            rename
 2         + f1 =account_id    
 3         + f2 =name
 4         + f3 =age
 5         + f1000 =zip_code
 6         + ;
 NOTE: Renaming variable f1 to account_id.
 NOTE: Renaming variable f2 to name.
 NOTE: Renaming variable f3 to age.
 NOTE: Renaming variable f1000 to zip_code.
 6         +  quit;
 
 NOTE: MODIFY was successful for WORK.ACTUALDATA.DATA.
 NOTE: PROCEDURE DATASETS used (Total process time):
       real time           0.03 seconds
       user cpu time       0.04 seconds
       system cpu time     0.00 seconds
       memory              1421.31k
       OS Memory           29612.00k
       Timestamp           02/22/2022 10:52:29 PM
       Step Count                        37  Switch Count  0
       Page Faults                       0
       Page Reclaims                     130
       Page Swaps                        0
       Voluntary Context Switches        0
       Involuntary Context Switches      0
       Block Input Operations            0
       Block Output Operations           272

FYI - code in this post is modified slightly to be more dynamic and account for varying amounts of variables (F variables).

Tom · Posted 02-22-2022 05:55 PM

Take your dataset with the new names and transpose it.

data have;
  input (F1-F3 F1000) (:$32.);
cards;
account_id name age zip_code
;

proc transpose data=have(obs=1) out=names;
  var _all_;
run;

Now you instead of a dataset with 1 observation and N variables you one with N observations and 2 variables, _NAME_ and COL1.

So use that to generate the OLD=NEW pairs needed by the RENAME statement.

Let's create a text file with the RENAME statement in it.

filename rename temp;
data _null_;
  set names end=eof;
  length old new $51 ;
  old=nliteral(_name_);
  new=nliteral(col1);
  file rename ;
  if _n_=1 then put 'rename';
  put old '=' new ;
  if eof then put ';' ;
run;

Now we can use %INCLUDE to add the RENAME statement to our code.

So if you wanted to make new dataset named NEW that is a copy of the existing dataset named B with the new names you would use.

data new;
  set b;
%include rename / source2;
run;

If you wanted to just modify B then use PROC DATASETS instead

proc datasets lib=WORK nolist;
  modify b; 
%include rename / source2;
  run;
quit;

ballardw · Posted 02-23-2022 12:29 PM

How did you read that data into SAS in the first place?

The appearance for both data sets is odd, at least to me. If you were using a data step to read the information then you need to think ahead a bit more in the code. If using Proc Import I have a hard time seeing where you get sequentially numbered F variables, at least from most sources.

If you are going to have multiple data sets of the same structure I would address the reading step to avoid later "fixes".

ghosh · Posted 02-23-2022 03:19 PM

Could do it with proc IML.

To use mixed data types in a matrix you would need SAS/IML 15.1 that was released as part of SAS 9.4m6.

data headings;
	input f1:$20. f2:$20. f3:$20. f1000:$20.;
	datalines;
account_id name age zip_code
;

data actualdata;
	input f1:$20. f2 $10. f3:$20. f1000:$20.;
	datalines;
10000 John White 34 M8DO0U
10001 Amy Power 12 A7DG2E
;

proc iml;
	use headings;
	read all var _CHAR_ into varNames;
	close headings;
	
	use actualdata;
	read all var _ALL_ into mat;	 
	close actualdata;
	
	print varNames, mat;
	create want from mat [colname=varNames];
	append from mat;
	close;
	quit;

proc print data=want;
run;

HB · Posted 02-23-2022 06:31 PM

Holy crap, that's slick @ghosh. Going to have to look at that.

Registration is open

SAS Training: Just a Click Away