BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
newboy1218
Quartz | Level 8

Hi, I have 2 datasets where dataset A contains the column headers and dataset B contains the raw data. Is there an efficient way to change the current (old) column header in dataset B according to the value in dataset A?

 

HAVE-Dataset A:

F1F2F3...F1000
account_idnameage...zip_code

 

HAVE-Dataset B:

F1F2F3...F1000
10000John White34...M8DO0U
10001Amy Power12...A7DG2E

 

WANT-Dataset B:

account_idnameage...zip_code
10000John White34...M8DO0U
10001Amy Power12...A7DG2E

 

1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User

My approach doesn't modify the data set at all, it just renames the data making more efficient but the best solution is always one you understand and can modify. 

 

General idea is:

  • data driven code - as data set changes it changes
  • PROC DATASETS will rename variables in a data set by modifying the metadata rather than re-creating the data set. For large data sets this is much more efficient. 
  • Create an array to reference all the variables
  • Use VNAME to get the variable current/old name from the array
  • Use the value of the array as the new name
  • Create a string that has the format of oldName = newName to pass to proc datasets
  • Use CALL EXECUTE to run the generated code

 

data headings;
   input f1:$20. f2:$20. f3:$20. f1000:$20.;
datalines;
account_id name age zip_code
;
run;

data actualdata;
   input f1:$20. f2 $10. f3:$20. f1000:$20.;
datalines;
10000 John White 34 M8DO0U
10001 Amy Power 12 A7DG2E
;
run; 


data rename;
set headings;

array f(*) f:;;

*start proc datasets process to update data set; call execute('proc datasets lib=work; modify actualdata; rename '); do i=1 to dim(f); str = catt(vname(f(i)), " = ", f(i)); call execute(str); end; call execute(";quit;"); run; proc contents data=actualdata; run;

Log (selected) is generated from the data step dynamically it creates this text that is then executed via SAS:

 NOTE: CALL EXECUTE generated line.
 1         + proc datasets lib=work;
 1         +                         modify actualdata;
 1         +                                            rename
 2         + f1 =account_id    
 3         + f2 =name
 4         + f3 =age
 5         + f1000 =zip_code
 6         + ;
 NOTE: Renaming variable f1 to account_id.
 NOTE: Renaming variable f2 to name.
 NOTE: Renaming variable f3 to age.
 NOTE: Renaming variable f1000 to zip_code.
 6         +  quit;
 
 NOTE: MODIFY was successful for WORK.ACTUALDATA.DATA.
 NOTE: PROCEDURE DATASETS used (Total process time):
       real time           0.03 seconds
       user cpu time       0.04 seconds
       system cpu time     0.00 seconds
       memory              1421.31k
       OS Memory           29612.00k
       Timestamp           02/22/2022 10:52:29 PM
       Step Count                        37  Switch Count  0
       Page Faults                       0
       Page Reclaims                     130
       Page Swaps                        0
       Voluntary Context Switches        0
       Involuntary Context Switches      0
       Block Input Operations            0
       Block Output Operations           272
       

FYI - code in this post is modified slightly to be more dynamic and account for varying amounts of variables (F variables).

 

 

View solution in original post

7 REPLIES 7
Reeza
Super User
data rename;
set datasetA;

array f(1000) f1-f1000;

call execute('proc datasets lib=work; modify datasetB; rename ');

do i=1 to dim(f);
str = catt(vname(f(i)), " = ", f(i));
call execute(str);
end;

call execute(";quit;");

run;

Dynamically rename using PROC DATASETS and CALL EXECUTE. 

 

A proc datasets typically looks like this:

 

proc datasets library=inputLibrary;
modify dataset2change;
rename originalVarName = NewVariableName 
......
.....

list of variables;
quit;

Using your first data set you can dynamically create that code by looping through the variables and generating the code and CALL EXECUTE executes the code for you. 

 

Untested but general idea is there for you - if you need help debugging post the exact code and log please. 

 


@newboy1218 wrote:

Hi, I have 2 datasets where dataset A contains the column headers and dataset B contains the raw data. Is there an efficient way to change the current (old) column header in dataset B according to the value in dataset A?

 

HAVE-Dataset A:

F1 F2 F3 ... F1000
account_id name age ... zip_code

 

HAVE-Dataset B:

F1 F2 F3 ... F1000
10000 John White 34 ... M8DO0U
10001 Amy Power 12 ... A7DG2E

 

WANT-Dataset B:

account_id name age ... zip_code
10000 John White 34 ... M8DO0U
10001 Amy Power 12 ... A7DG2E

 


 

HB
Barite | Level 11 HB
Barite | Level 11

If you don't understand/get Reeza's answer (I don't) try this:

 

data headings;
   input f1:$20. f2:$20. f3:$20. f1000:$20.;
datalines;
account_id name age zip_code
;
run;
data actualdata; input f1:$20. f2 $10. f3:$20. f1000:$20.; datalines; 10000 John White 34 M8DO0U 10001 Amy Power 12 A7DG2E ; run; *smash the datasets together; proc datasets; append base=headings data=actualdata; run; * cheat with a double transpose; proc transpose data=headings out=wide ; var _all_; run; proc transpose data=wide(drop=_name_ rename=(col1=_name_)) out=want(drop=_name_ _label_); var col:; id _name_; run;

 

 

 

Reeza
Super User

My approach doesn't modify the data set at all, it just renames the data making more efficient but the best solution is always one you understand and can modify. 

 

General idea is:

  • data driven code - as data set changes it changes
  • PROC DATASETS will rename variables in a data set by modifying the metadata rather than re-creating the data set. For large data sets this is much more efficient. 
  • Create an array to reference all the variables
  • Use VNAME to get the variable current/old name from the array
  • Use the value of the array as the new name
  • Create a string that has the format of oldName = newName to pass to proc datasets
  • Use CALL EXECUTE to run the generated code

 

data headings;
   input f1:$20. f2:$20. f3:$20. f1000:$20.;
datalines;
account_id name age zip_code
;
run;

data actualdata;
   input f1:$20. f2 $10. f3:$20. f1000:$20.;
datalines;
10000 John White 34 M8DO0U
10001 Amy Power 12 A7DG2E
;
run; 


data rename;
set headings;

array f(*) f:;;

*start proc datasets process to update data set; call execute('proc datasets lib=work; modify actualdata; rename '); do i=1 to dim(f); str = catt(vname(f(i)), " = ", f(i)); call execute(str); end; call execute(";quit;"); run; proc contents data=actualdata; run;

Log (selected) is generated from the data step dynamically it creates this text that is then executed via SAS:

 NOTE: CALL EXECUTE generated line.
 1         + proc datasets lib=work;
 1         +                         modify actualdata;
 1         +                                            rename
 2         + f1 =account_id    
 3         + f2 =name
 4         + f3 =age
 5         + f1000 =zip_code
 6         + ;
 NOTE: Renaming variable f1 to account_id.
 NOTE: Renaming variable f2 to name.
 NOTE: Renaming variable f3 to age.
 NOTE: Renaming variable f1000 to zip_code.
 6         +  quit;
 
 NOTE: MODIFY was successful for WORK.ACTUALDATA.DATA.
 NOTE: PROCEDURE DATASETS used (Total process time):
       real time           0.03 seconds
       user cpu time       0.04 seconds
       system cpu time     0.00 seconds
       memory              1421.31k
       OS Memory           29612.00k
       Timestamp           02/22/2022 10:52:29 PM
       Step Count                        37  Switch Count  0
       Page Faults                       0
       Page Reclaims                     130
       Page Swaps                        0
       Voluntary Context Switches        0
       Involuntary Context Switches      0
       Block Input Operations            0
       Block Output Operations           272
       

FYI - code in this post is modified slightly to be more dynamic and account for varying amounts of variables (F variables).

 

 

Tom
Super User Tom
Super User

Take your dataset with the new names and transpose it.

data have;
  input (F1-F3 F1000) (:$32.);
cards;
account_id name age zip_code
;

proc transpose data=have(obs=1) out=names;
  var _all_;
run;

Now you instead of a dataset with 1 observation and N variables you one with N observations and 2 variables, _NAME_ and COL1.

So use that to generate the OLD=NEW pairs needed by the RENAME statement.

Let's create a text file with the RENAME statement in it.

filename rename temp;
data _null_;
  set names end=eof;
  length old new $51 ;
  old=nliteral(_name_);
  new=nliteral(col1);
  file rename ;
  if _n_=1 then put 'rename';
  put old '=' new ;
  if eof then put ';' ;
run;

Now we can use %INCLUDE to add the RENAME statement to our code.

So if you wanted to make new dataset named NEW that is a copy of the existing dataset named B with the new names you would use.

data new;
  set b;
%include rename / source2;
run;

If you wanted to just modify B then use PROC DATASETS instead

proc datasets lib=WORK nolist;
  modify b; 
%include rename / source2;
  run;
quit;
ballardw
Super User

How did you read that data into SAS in the first place?

 

The appearance for both data sets is odd, at least to me. If you were using a data step to read the information then you need to think ahead a bit more in the code. If using Proc Import I have a hard time seeing where you get sequentially numbered F variables, at least from most sources.

 

If you are going to have multiple data sets of the same structure I would address the reading step to avoid later "fixes".

ghosh
Barite | Level 11

Could do it with proc IML. 

To use mixed data types in a matrix you would need SAS/IML 15.1 that was released as part of SAS 9.4m6.

data headings;
	input f1:$20. f2:$20. f3:$20. f1000:$20.;
	datalines;
account_id name age zip_code
;

data actualdata;
	input f1:$20. f2 $10. f3:$20. f1000:$20.;
	datalines;
10000 John White 34 M8DO0U
10001 Amy Power 12 A7DG2E
;

proc iml;
	use headings;
	read all var _CHAR_ into varNames;
	close headings;
	
	use actualdata;
	read all var _ALL_ into mat;	 
	close actualdata;
	
	print varNames, mat;
	create want from mat [colname=varNames];
	append from mat;
	close;
	quit;

proc print data=want;
run;

ghosh_0-1645647407695.png

 

HB
Barite | Level 11 HB
Barite | Level 11
Holy crap, that's slick @ghosh. Going to have to look at that.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 2343 views
  • 14 likes
  • 6 in conversation