Solved: Re: Replace all dataset variables using loop

hermina · Posted 10-24-2017 04:10 AM

Hi,

I am quite new to SAS programming.

I have a dataset from which I wish to create a new dataset where the variables and observations are replaced with the first difference of the orginal dataset values, while renaming the variable names. Sample below.

I wish to do this with ALL variables (I have 20 columns), so I think a do-loop should be managable. Anyone got any suggestions?

My current code is hardcoded and doesn't change the variable names:

DATA outputset;

set inputset;

Food = dif(Food);

House = dif(House);

Material=dif(Mat

....

RUN;

INPUT

Dataperiod	Food	House	Material	…
Year1	1	5	8	…
Year2	8	3	9	…
Year3	3	2	7	…
…	…	…	…	…

WANTED OUTPUT

The variable names (Food, House etc) have no pattern, but I wish to replace ALL the variable names and observations with the below ones, using a loop.

Dataperiod	dif(Food)	dif(House)	dif(Material)	…
Year1				…
Year2	7	-2	1	…
Year3	-5	-1	-2	…
…	…	…	…	…

RW9 · Posted 10-24-2017 04:41 AM

This shows one way to do it.

data have;
  input dataperiod $ food	house	material;
datalines;
Year1	1	5	8	
Year2	8	3	8
Year3	3	2	7
;
run;

data want (drop=i);
  set have;
  array v{3} food house material;
  array r{3} diff_food diff_house diff_material;
  do i=1 to 3;
    r{i}=dif(v{i});
  end;
run;

This shows arrays and looping over arrays. Its the simplest, but if you have lots of variables then it maybe worth considering a transpose approach like below which doesn't require knowing all the variables up front:

proc transpose data=have out=inter;
  by dataperiod;
  var _numeric_;
run;
proc sort data=inter;
  by _name_ dataperiod;
run;
data inter;
  set inter;
  by _name_;
  diff=dif(col1);
  if first._name_ then diff=.;
run;
proc sort data=inter;
  by dataperiod _name_;
run;
proc transpose data=inter out=want;
  by dataperiod;
  var diff;
  id _name_;
run;

Note how I put the test data, please do this in future.

View solution in original post

RW9 · Posted 10-24-2017 04:41 AM

This shows one way to do it.

data have;
  input dataperiod $ food	house	material;
datalines;
Year1	1	5	8	
Year2	8	3	8
Year3	3	2	7
;
run;

data want (drop=i);
  set have;
  array v{3} food house material;
  array r{3} diff_food diff_house diff_material;
  do i=1 to 3;
    r{i}=dif(v{i});
  end;
run;

This shows arrays and looping over arrays. Its the simplest, but if you have lots of variables then it maybe worth considering a transpose approach like below which doesn't require knowing all the variables up front:

proc transpose data=have out=inter;
  by dataperiod;
  var _numeric_;
run;
proc sort data=inter;
  by _name_ dataperiod;
run;
data inter;
  set inter;
  by _name_;
  diff=dif(col1);
  if first._name_ then diff=.;
run;
proc sort data=inter;
  by dataperiod _name_;
run;
proc transpose data=inter out=want;
  by dataperiod;
  var diff;
  id _name_;
run;

Note how I put the test data, please do this in future.

Shmuel · Posted 10-24-2017 04:44 AM

Try next code (not tested):

data want;
  set have;
      food = food - lag(food);
      house = house - lag(house);
      material = material = lag(material);
run;

What do you mean by LOOP?

Loop accross observations is implicit.

Loop accross all variables need more preparing code:

- are there many variables?

- shall the loop deal witth all of them? no exceptions ???

hermina · Posted 10-24-2017 05:07 AM

Thanks for your quick reply.

What I mean is that I figure there should be a way of saying

"For ALL variables/column headers X1, X2, X3 etc, with no exception, add the word "dif_" to the variable name so that they are called dif_X1, dif_X2, dif_X3 etc, and calculate first difference for each value. Put all of this in a new dataset called "outputset"".

BR

Hermina

RW9 · Posted 10-24-2017 05:22 AM

So which part of my code does not do what you expect?

hermina · Posted 10-27-2017 05:47 AM

Sorry, it did work just fine! Thanks!

ChrisBrooks · Posted 10-24-2017 04:51 AM

I wouldn't do this in a loop - you can create a macro variable holding the rename string and then execute this with Proc Datasets as in the code below NB correct capitalisation is vital in Proc SQL

data inputset;
	length dataperiod $5 food 8 house 8 material 8;
	infile datalines dlm=",";
	input dataperiod $ food house material $;
	datalines;
Year1,1,5,8
Year2,8,3,2
Year3,3,2,7
;
run;


DATA outputset;
	set inputset;
	Food = dif(Food);
	House = dif(House);
	Material=dif(Material);
run;

proc sql noprint;
	select cats(name,'=','dif',name)
	into :varlist
	separated by ' '
	from dictionary.columns
	where libname= 'WORK' and memname= 'INPUTSET' and name ne 'dataperiod';
quit;

%put &varlist;

proc datasets lib=work;
modify outputset;
rename &varlist;
quit;

Patrick · Posted 10-24-2017 05:58 AM

@hermina

What you're asking for can be done but it's on the advanced side and involves SAS Macro coding.

If you are a beginner then don't go there yet. It doesn't hurt to write a bit more code but then to end up with something that you can easily understand and maintain.

The most "basic" code version could look like:

data want1;
  infile datalines truncover dlm=' ';
  input Dataperiod $ Food House Material;
  dif_food = food - lag(food);
  dif_house = house - lag(house);
  dif_material = material - lag(material);
  datalines;
Year1 1 5 8
Year2 8 3 9
Year3 3 2 7
;
run;

That's basically what @Shmuel proposed and if your'e new to SAS then that's what you should be doing in my opinion.

If you want to take it one step further you could use array processing as done below:

data want2(drop=_:);
  infile datalines truncover dlm=' ';
  input Dataperiod $ Food House Material;
  array source {*} Food House Material;
  array target {*} dif_food dif_house dif_material;
  do _i=1 to dim(source);
    target[_i]= source[_i]-lag(source[_i]);
  end;
  datalines;
Year1 1 5 8
Year2 8 3 9
Year3 3 2 7
;
run;

The next step would be SAS macro coding which would allow you to dynamically create your dif_ variables. But if you're rather new to SAS then I strongly recommend that you don't go there as it will create more harm than good for you and I won't even post sample code for such an option.

Should you be an experienced programmer and just new to the SAS language then let us know. That's another situation and happy to provide you with the necessary hints how to generate dynamic code using SAS. But if you don't have this background then get first expertise in Base SAS language before you dive into SAS Macro language or you'll end up "very confused and in a mess".

s_lassen · Posted 10-25-2017 06:27 AM

@hermina, @RW9:

RW9's solution looks good to me, except that you want the new variable names prefixed with "diff_". Which is easily accomplished with a prefix= option in the final PROC TRANSPOSE:

data have;
  input dataperiod $ food      house      material;
datalines;
Year1      1      5      8
Year2      8      3      8
Year3      3      2      7
;
run;

proc transpose data=have out=inter;
  by dataperiod;
  var _numeric_;
run;
proc sort data=inter;
  by _name_ dataperiod;
run;
data inter;
  set inter;
  by _name_;
  diff=dif(col1);
  if first._name_ then diff=.;
run;
proc sort data=inter;
  by dataperiod _name_;
run;
proc transpose data=inter out=want(drop=name) prefix=diff_;
  by dataperiod;
  var diff;
  id _name_;
run;

And then I put in a drop=name option on the final output, you do not need the name of the temporary variable repeated.

s_lassen · Posted 10-25-2017 06:29 AM

Sorry, should have been drop=_name_, not drop=name

mkeintz · Posted 10-25-2017 10:01 AM

Here is a way to achieve your task of (1) renaming selected vars to have DIF_ prefixed to their name, and (2) getting a difference series, without any direct macro programming:

proc transpose data=sashelp.stocks (drop=date obs=0) out=t;
  var _numeric_;
run;

filename tmp temp;
data _null_;
  set t;
  file tmp;
  put _name_ '=dif_' _name_;
run;

data need/view=need;
  set sashelp.stocks;
  rename  %include tmp ;;
run;

data want;
  set need;
  by stock;
  array x {*} dif_: ;
  do I=1 to dim(x);   x{I}=dif(x{I}); end;
  if first.stock then call missing (of x{*});
  drop I;
run;

The first proc transpose does nothing but get the varnames of interest in dataset T, stored in variable _NAME_. That's why it has the "obs=0" data set name parameter.
The data _null_ step creates a temporary raw file with a list of X=dif_X expressions for every variable name X found in T.
The data view step renames the variables by %INCLUDing the raw file as part of a rename statement. The rename statement has two semicolons - one to terminate the %include, and one to terminate the RENAME statement that has the %include as a component. It's a data set view, not a data set file, so virtually no resources are required. It looks like a separated step, but is not activated (instantiated in object programming terms) until it is referred to in the next data step. Having a bunch of variables now named "DIF_..." makes array declaration easier in the subsequent data step.
The DATA want step loops over all vars with names starting as "DIF_", creating your desired values.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

mkeintz · Posted 10-27-2017 04:31 PM

Here is a very intriguing technique. Instead of a SET statement this program uses the MODIFY statement. This means that a data set is modified in place (much like PROC APPEND, or the APPEND statement in PROC DATASETS).

One of the benefits of the MODIFY statement is than an OUTPUT statement appends a records. So when MODIFY CLASS (below) reads record 1, the OUTPUT statement appends a copy of record 1 to the end of the data set. The MODIFY statement will progress from record 1 through the (original) end of the data set. It will then encounter the newly appended records. The output statement will append those new records, in the order encountered. So here is a very compact way to make 4,003 copies of the datasets:

data class;
  set sashelp.class;
run;

%let copies=4003;
data class;
  modify class nobs=nrecs;
  if _n_> nrecs*(%eval(&copies-1)) then stop;
  output;  /* Append a record */
run;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Classroom Training Available!