SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop and others

Recoding multiple data files at once how

Accepted Solution Solved
Reply
Frequent Contributor
Posts: 81
Accepted Solution

Recoding multiple data files at once how

[ Edited ]

Hi,

I am trying to delete the first 3 rows and rename variables from my data sets. How can I use 1 statement that applies to all my data sets, I have 42 files.

 

All my filename start with "_"

I tried something like this but you can tell I don't know how it can be done:

 

data _: ;
set _: ;
rename first = First_name;
run;

 

 

Thank you

 


Accepted Solutions
Solution
‎05-04-2018 05:46 PM
Super User
Posts: 13,347

Re: Recoding multiple data files at once how

Generally a question like this has a cause that should have been addressed earlier, such as when the data were read into SAS.

 

If you read the original files with a data step you can tell SAS that the first actual row of data is on the 4th row of the file and SAS will skip those you don't want: infile "your file name" firstobs=4; Though if there is a row of column headers then you want to skip that as well and would likely want firstobs=5.

 

The input statement in a data step would set the variable names. If the files have the same structure, apparently from the request to do the same thing to multiple files you could read the data and change the output data set name and the file read with the remaining program staying the same.

 

Data new;

   set old;

   if _n_ > 3;

run;

 

is one way to remove the first three rows. But you would have to run that for each data set (changing the data set names).

 

You will want to be careful of using Set _n: ; that will attempt to bring ALL the data sets of that name into a single data set.

 

Do you have list of the desired variable names? Is it the same for each data set? Are they always replacing a variable with the same order?

A first step would be to create a data set with the ORDER and Newname. and use that with information from your data sets.

Here's a rough example.

data work.class;
   set sashelp.class;
run;

data work.mynamelist;
   informat newname $32.;
   input NewName;
   order = _n_;
datalines;
StudentName
Gender
AgeAtMeasure
Inches
Pounds
;
run;

proc sql;
   create table rename as
   select a.*,b.name as oldname
   from work.mynamelist as a
        left join
        (select name,varnum from dictionary.columns 
         where libname='WORK' and memname='CLASS') as b
        on a.order=b.varnum
  ;
quit;

data _null_;
   set rename end=last;
   if _n_=1 then do;
      call execute('proc datasets library=work nodetails nolist;');
       /* class is the name of the data set to rename variables in*/
      call execute('modify CLASS;');
      call execute('rename '); 
   end;
   call execute( catx('=',oldname,newname));
   if last then do;
      /* ends the block of rename statements*/
      Call execute(';'); 
      /* ends the Proc Datasets procedure which uses quit instead of run*/
      call execute('quit;');
   end;
run;

Dictionary.columns is special data view that SAS maintains with a list of all the variables and their characteristics for each library and dataset (memname). The library name is in the LIBNAME variable and is stored in uppercase and MEMNAME is also uppercase. So you need to have those values in the Proc SQL code in uppercase to get the right information.

 

 

Proc SQL does not recognize variable or dataset lists at all though you might be able to get the data set name with the LIKE or a substring comparison.

I will leave as an exercise for the interested reader on bringing in the memname variable and all the variables, getting the order correct and using BY group processing in the data _null_ to create a modify and rename block for each memname.

 

Call Execute in effect writes lines of code into a stack that executes after the data step ends. You would see something similar to this in the log to indicate the generated code:

NOTE: CALL EXECUTE generated line.
1   + proc datasets library=work nodetails nolist;
2   + modify CLASS;
3   + rename
4   + Name=StudentName
5   + Sex=Gender
6   + Age=AgeAtMeasure
7   + Height=Inches
8   + Weight=Pounds
9   + ;
NOTE: Renaming variable Name to StudentName.
NOTE: Renaming variable Sex to Gender.
NOTE: Renaming variable Age to AgeAtMeasure.
NOTE: Renaming variable Height to Inches.
NOTE: Renaming variable Weight to Pounds.
10  + quit;

NOTE: MODIFY was successful for WORK.CLASS.DATA.

View solution in original post


All Replies
Solution
‎05-04-2018 05:46 PM
Super User
Posts: 13,347

Re: Recoding multiple data files at once how

Generally a question like this has a cause that should have been addressed earlier, such as when the data were read into SAS.

 

If you read the original files with a data step you can tell SAS that the first actual row of data is on the 4th row of the file and SAS will skip those you don't want: infile "your file name" firstobs=4; Though if there is a row of column headers then you want to skip that as well and would likely want firstobs=5.

 

The input statement in a data step would set the variable names. If the files have the same structure, apparently from the request to do the same thing to multiple files you could read the data and change the output data set name and the file read with the remaining program staying the same.

 

Data new;

   set old;

   if _n_ > 3;

run;

 

is one way to remove the first three rows. But you would have to run that for each data set (changing the data set names).

 

You will want to be careful of using Set _n: ; that will attempt to bring ALL the data sets of that name into a single data set.

 

Do you have list of the desired variable names? Is it the same for each data set? Are they always replacing a variable with the same order?

A first step would be to create a data set with the ORDER and Newname. and use that with information from your data sets.

Here's a rough example.

data work.class;
   set sashelp.class;
run;

data work.mynamelist;
   informat newname $32.;
   input NewName;
   order = _n_;
datalines;
StudentName
Gender
AgeAtMeasure
Inches
Pounds
;
run;

proc sql;
   create table rename as
   select a.*,b.name as oldname
   from work.mynamelist as a
        left join
        (select name,varnum from dictionary.columns 
         where libname='WORK' and memname='CLASS') as b
        on a.order=b.varnum
  ;
quit;

data _null_;
   set rename end=last;
   if _n_=1 then do;
      call execute('proc datasets library=work nodetails nolist;');
       /* class is the name of the data set to rename variables in*/
      call execute('modify CLASS;');
      call execute('rename '); 
   end;
   call execute( catx('=',oldname,newname));
   if last then do;
      /* ends the block of rename statements*/
      Call execute(';'); 
      /* ends the Proc Datasets procedure which uses quit instead of run*/
      call execute('quit;');
   end;
run;

Dictionary.columns is special data view that SAS maintains with a list of all the variables and their characteristics for each library and dataset (memname). The library name is in the LIBNAME variable and is stored in uppercase and MEMNAME is also uppercase. So you need to have those values in the Proc SQL code in uppercase to get the right information.

 

 

Proc SQL does not recognize variable or dataset lists at all though you might be able to get the data set name with the LIKE or a substring comparison.

I will leave as an exercise for the interested reader on bringing in the memname variable and all the variables, getting the order correct and using BY group processing in the data _null_ to create a modify and rename block for each memname.

 

Call Execute in effect writes lines of code into a stack that executes after the data step ends. You would see something similar to this in the log to indicate the generated code:

NOTE: CALL EXECUTE generated line.
1   + proc datasets library=work nodetails nolist;
2   + modify CLASS;
3   + rename
4   + Name=StudentName
5   + Sex=Gender
6   + Age=AgeAtMeasure
7   + Height=Inches
8   + Weight=Pounds
9   + ;
NOTE: Renaming variable Name to StudentName.
NOTE: Renaming variable Sex to Gender.
NOTE: Renaming variable Age to AgeAtMeasure.
NOTE: Renaming variable Height to Inches.
NOTE: Renaming variable Weight to Pounds.
10  + quit;

NOTE: MODIFY was successful for WORK.CLASS.DATA.

Frequent Contributor
Posts: 81

Re: Recoding multiple data files at once how

Thank you so much. It worked
☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 2 replies
  • 369 views
  • 2 likes
  • 2 in conversation