DATA Step, Macro, Functions and more

Replace all dataset variables using loop

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 5
Accepted Solution

Replace all dataset variables using loop

Hi,

 

I am quite new to SAS programming.

I have a dataset from which I wish to create a new dataset where the variables and observations are replaced with the first difference of the orginal dataset values, while renaming the variable names. Sample below. 

I wish to do this with ALL variables (I have 20 columns), so I think a do-loop should be managable. Anyone got any suggestions?

 

My current code is hardcoded and doesn't change the variable names:

 

DATA outputset;

set inputset;

Food = dif(Food);

House = dif(House);

Material=dif(Mat

....

RUN;

 

 

INPUT

DataperiodFoodHouseMaterial
Year1158
Year2839
Year3327

 

WANTED OUTPUT

The variable names (Food, House etc) have no pattern, but I wish to replace ALL the variable names and observations with the below ones, using a loop. 

Dataperioddif(Food)dif(House)dif(Material)
Year1   
Year27-21
Year3-5-1-2

Accepted Solutions
Solution
‎10-27-2017 05:47 AM
Super User
Super User
Posts: 9,397

Re: Replace all dataset variables using loop

This shows one way to do it. 

data have;
  input dataperiod $ food	house	material;
datalines;
Year1	1	5	8	
Year2	8	3	8
Year3	3	2	7
;
run;

data want (drop=i);
  set have;
  array v{3} food house material;
  array r{3} diff_food diff_house diff_material;
  do i=1 to 3;
    r{i}=dif(v{i});
  end;
run;

This shows arrays and looping over arrays.  Its the simplest, but if you have lots of variables then it maybe worth considering a transpose approach like below which doesn't require knowing all the variables up front:

 

proc transpose data=have out=inter;
  by dataperiod;
  var _numeric_;
run;
proc sort data=inter;
  by _name_ dataperiod;
run;
data inter;
  set inter;
  by _name_;
  diff=dif(col1);
  if first._name_ then diff=.;
run;
proc sort data=inter;
  by dataperiod _name_;
run;
proc transpose data=inter out=want;
  by dataperiod;
  var diff;
  id _name_;
run;

Note how I put the test data, please do this in future.

View solution in original post


All Replies
Solution
‎10-27-2017 05:47 AM
Super User
Super User
Posts: 9,397

Re: Replace all dataset variables using loop

This shows one way to do it. 

data have;
  input dataperiod $ food	house	material;
datalines;
Year1	1	5	8	
Year2	8	3	8
Year3	3	2	7
;
run;

data want (drop=i);
  set have;
  array v{3} food house material;
  array r{3} diff_food diff_house diff_material;
  do i=1 to 3;
    r{i}=dif(v{i});
  end;
run;

This shows arrays and looping over arrays.  Its the simplest, but if you have lots of variables then it maybe worth considering a transpose approach like below which doesn't require knowing all the variables up front:

 

proc transpose data=have out=inter;
  by dataperiod;
  var _numeric_;
run;
proc sort data=inter;
  by _name_ dataperiod;
run;
data inter;
  set inter;
  by _name_;
  diff=dif(col1);
  if first._name_ then diff=.;
run;
proc sort data=inter;
  by dataperiod _name_;
run;
proc transpose data=inter out=want;
  by dataperiod;
  var diff;
  id _name_;
run;

Note how I put the test data, please do this in future.

Trusted Advisor
Posts: 1,831

Re: Replace all dataset variables using loop

Try next code (not tested):

 

 

data want;
  set have;
      food = food - lag(food);
      house = house - lag(house);
      material = material = lag(material);
run;

What do you mean by LOOP? 

Loop accross observations is implicit.

Loop accross all variables need more preparing code:

- are there many variables?

- shall the loop deal witth all of them? no exceptions ???

 

Occasional Contributor
Posts: 5

Re: Replace all dataset variables using loop

Thanks for your quick reply.

What I mean is that I figure there should be a way of saying

"For ALL variables/column headers X1, X2, X3 etc, with no exception, add the word "dif_" to the variable name so that they are called dif_X1, dif_X2, dif_X3 etc, and calculate first difference for each value. Put all of this in a new dataset called "outputset"".

 

BR

Hermina

Super User
Super User
Posts: 9,397

Re: Replace all dataset variables using loop

So which part of my code does not do what you expect?

Occasional Contributor
Posts: 5

Re: Replace all dataset variables using loop

Sorry, it did work just fine! Thanks!

Valued Guide
Posts: 570

Re: Replace all dataset variables using loop

I wouldn't do this in a loop - you can create a macro variable holding the rename string and then execute this with Proc Datasets as in the code below NB correct capitalisation is vital in Proc SQL

 

data inputset;
	length dataperiod $5 food 8 house 8 material 8;
	infile datalines dlm=",";
	input dataperiod $ food house material $;
	datalines;
Year1,1,5,8
Year2,8,3,2
Year3,3,2,7
;
run;


DATA outputset;
	set inputset;
	Food = dif(Food);
	House = dif(House);
	Material=dif(Material);
run;

proc sql noprint;
	select cats(name,'=','dif',name)
	into :varlist
	separated by ' '
	from dictionary.columns
	where libname= 'WORK' and memname= 'INPUTSET' and name ne 'dataperiod';
quit;

%put &varlist;

proc datasets lib=work;
modify outputset;
rename &varlist;
quit;
Respected Advisor
Posts: 4,665

Re: Replace all dataset variables using loop

[ Edited ]

@hermina

What you're asking for can be done but it's on the advanced side and involves SAS Macro coding.

If you are a beginner then don't go there yet. It doesn't hurt to write a bit more code but then to end up with something that you can easily understand and maintain.

 

The most "basic" code version could look like:

data want1;
  infile datalines truncover dlm=' ';
  input Dataperiod $ Food House Material;
  dif_food = food - lag(food);
  dif_house = house - lag(house);
  dif_material = material - lag(material);
  datalines;
Year1 1 5 8
Year2 8 3 9
Year3 3 2 7
;
run;

That's basically what @Shmuel proposed and if your'e new to SAS then that's what you should be doing in my opinion.

 

If you want to take it one step further you could use array processing as done below:

data want2(drop=_:);
  infile datalines truncover dlm=' ';
  input Dataperiod $ Food House Material;
  array source {*} Food House Material;
  array target {*} dif_food dif_house dif_material;
  do _i=1 to dim(source);
    target[_i]= source[_i]-lag(source[_i]);
  end;
  datalines;
Year1 1 5 8
Year2 8 3 9
Year3 3 2 7
;
run;

The next step would be SAS macro coding which would allow you to dynamically create your dif_ variables. But if you're rather new to SAS then I strongly recommend that you don't go there as it will create more harm than good for you and I won't even post sample code for such an option.

 

Should you be an experienced programmer and just new to the SAS language then let us know. That's another situation and happy to provide you with the necessary hints how to generate dynamic code using SAS. But if you don't have this background then get first expertise in Base SAS language before you dive into SAS Macro language or you'll end up "very confused and in a mess".

PROC Star
Posts: 253

Re: Replace all dataset variables using loop

@hermina, @RW9:

RW9's solution looks good to me, except that you want the new variable names prefixed with "diff_". Which is easily accomplished with a prefix= option in the final PROC TRANSPOSE:

data have;
  input dataperiod $ food      house      material;
datalines;
Year1      1      5      8
Year2      8      3      8
Year3      3      2      7
;
run;

proc transpose data=have out=inter;
  by dataperiod;
  var _numeric_;
run;
proc sort data=inter;
  by _name_ dataperiod;
run;
data inter;
  set inter;
  by _name_;
  diff=dif(col1);
  if first._name_ then diff=.;
run;
proc sort data=inter;
  by dataperiod _name_;
run;
proc transpose data=inter out=want(drop=name) prefix=diff_;
  by dataperiod;
  var diff;
  id _name_;
run;

And then I put in a drop=name option on the final output, you do not need the name of the temporary variable repeated.

PROC Star
Posts: 253

Re: Replace all dataset variables using loop

Sorry, should have been drop=_name_, not drop=name
Trusted Advisor
Posts: 1,309

Re: Replace all dataset variables using loop

Here is a way to achieve your task of (1) renaming selected vars to have DIF_  prefixed to their name, and (2) getting a difference series, without any direct macro programming:

 

proc transpose data=sashelp.stocks (drop=date obs=0) out=t;
  var _numeric_;
run;

filename tmp temp;
data _null_;
  set t;
  file tmp;
  put _name_ '=dif_' _name_;
run;

data need/view=need;
  set sashelp.stocks;
  rename  %include tmp ;;
run;

data want;
  set need;
  by stock;
  array x {*} dif_: ;
  do I=1 to dim(x);   x{I}=dif(x{I}); end;
  if first.stock then call missing (of x{*});
  drop I;
run;
  

 

  1. The first proc transpose does nothing but get the varnames of interest in dataset T, stored in variable _NAME_.  That's why it has the "obs=0" data set name parameter.
  2. The data _null_ step creates a temporary raw file with a list of   X=dif_X  expressions for every variable name X found in T.
  3. The data view step renames the variables by %INCLUDing the raw file as part of a rename statement.  The rename statement  has two semicolons - one to terminate the %include, and one to terminate the RENAME statement that has the %include as a component.  It's a data set view, not a data set file, so virtually no resources are required.  It looks like a separated step, but is not activated (instantiated in object programming terms) until it is referred to in the next data step.  Having a bunch of variables now named "DIF_..." makes array declaration easier in the subsequent data step.
  4. The DATA want step loops over all vars with names starting as "DIF_", creating your desired values.
Trusted Advisor
Posts: 1,309

Re: Replace all dataset variables using loop

Here is a very intriguing technique.  Instead of a SET statement this program uses the MODIFY statement.  This means that a data set is modified in place (much like PROC APPEND, or the APPEND statement in PROC DATASETS).  

 

One of the benefits of the MODIFY statement is than an OUTPUT statement appends a records.  So when MODIFY CLASS (below) reads record 1, the OUTPUT statement appends a copy of record 1 to the end of the data set.  The MODIFY statement will progress from record 1 through the (original) end of the data set.  It will then encounter the newly appended records.  The output statement will append those new records, in the order encountered.  So here is a very compact way to make 4,003 copies of the datasets:

 

data class;
  set sashelp.class;
run;

%let copies=4003;
data class;
  modify class nobs=nrecs;
  if _n_> nrecs*(%eval(&copies-1)) then stop;
  output;  /* Append a record */
run;
☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 11 replies
  • 216 views
  • 1 like
  • 7 in conversation