Hi SAS expert,
I have a sequential data set, each id have two observations, a sample dataset looks like below:
data have;
input id month var1;
datalines;
1 6 2
1 14 1
2 9 3
2 14 1
;
run;
First I would like to calculate the change of var1, var1_change, defined as the last obs minus the first obs, and calculate the change per month,var1_change_per_month, defined as the change/interval, the dataset I got looks like this:
data cal;
input id month var1 var1_change interval var1_change_per_month;
datalines;
1 6 2 . . .
1 14 1 -1 8 -0.125
2 9 3 . . .
2 14 1 -2 5 -0.4
;
run;
Now I would like to fill in the gaps between two observations with the calculated var1_change_per_month, the dataset i want would look like this:
data want;
input id month var1_change_per_month;
datalines;
1 6 -0.125
1 7 -0.125
1 8 -0.125
1 9 -0.125
1 10 -0.125
1 11 -0.125
1 12 -0.125
1 13 -0.125
1 14 -0.125
2 9 -0.4
2 10 -0.4
2 11 -0.4
2 12 -0.4
2 13 -0.4
2 14 -0.4
;
run;
Any ideas would be appreciated. Thanks!
Have you tried anything?
Do you have to calculate the difference of var1 only, or are there more variables in your data? Assuming: only one variable.
Are there always exactly two observation per id? Assuming: yes
Is the data always sorted? Assuming: yes
Do you need the dataset "cal" for anything or could we jump from "have" to "want" without any intermediate steps? Assuming: yes
This is a clumsy two-step solution, the second step could be integrated into the first one.
data cal;
set have;
by id;
length
var1_change interval var1_change_per_month 8
last_month last_var1 8
;
retain last:;
drop last:;
if first.id then do;
last_month = month;
last_var1 = var1;
end;
else do;
var1_change = var1 - last_var1;
interval = month - last_month;
var1_change_per_month = var1_change / interval;
end;
run;
data want;
set cal(keep=id month interval var1_change_per_month);
by id;
length start_month end_month 8;
drop start_month end_month interval;
if last.id;
start_month = month - interval;
end_month = month;
do month = start_month to end_month;
output;
end;
run;
@ncy wrote:
I have a sequential data set, each id have two observations, a sample dataset looks like below:
data have;
input id month var1;
datalines;
1 6 2
1 14 1
2 9 3
2 14 1
;
run;
data want;
do _n_=1 by 1 until(last.id);
set have;
by id;
if _n_=1 then do;
k=2;
set have(keep=month var1 rename=(var1=_var1 month=_m)) point=k;
var1_change_per_month=(_var1-var1)/(_m-month);
_m=month;
output;
end;
else do month=_m+1 to month;
output;
end;
end;
keep id month var1_change_per_month;
run;
Have you tried anything?
Do you have to calculate the difference of var1 only, or are there more variables in your data? Assuming: only one variable.
Are there always exactly two observation per id? Assuming: yes
Is the data always sorted? Assuming: yes
Do you need the dataset "cal" for anything or could we jump from "have" to "want" without any intermediate steps? Assuming: yes
This is a clumsy two-step solution, the second step could be integrated into the first one.
data cal;
set have;
by id;
length
var1_change interval var1_change_per_month 8
last_month last_var1 8
;
retain last:;
drop last:;
if first.id then do;
last_month = month;
last_var1 = var1;
end;
else do;
var1_change = var1 - last_var1;
interval = month - last_month;
var1_change_per_month = var1_change / interval;
end;
run;
data want;
set cal(keep=id month interval var1_change_per_month);
by id;
length start_month end_month 8;
drop start_month end_month interval;
if last.id;
start_month = month - interval;
end_month = month;
do month = start_month to end_month;
output;
end;
run;
Thanks andreas_ids. the solution works very well.
To clarify your assumptions.
Yes, there are exactly two observations per id, and i have more than var1 to calculate. But I can apply this technique to other variables. The 'cal' is a dataset I already got, I put this intermediate dataset here to help formulate my question about getting dataset 'want'.
No matter what you have in mind, you will need 2xpass to the data. So 2XDOW seems logical, and it can deal more than 2 rows per id.
data have;
input id month var1;
datalines;
1 6 2
1 14 1
2 9 3
2 14 1
;
run;
data want;
do until (last.id);
set have;
by id;
if first.id then
do;
_from=month;
_d1=var1;
end;
if last.id then
do;
_to=month;
_d2=var1;
end;
end;
RateOfChange=(_d2-_d1)/(_to-_from);
do until (last.id);
set have;
by id;
if first.id then
do;
do month=_from to _to;
output;
end;
end;
end;
keep id month RateOfChange;
run;
Well, had to take my word back. One don't seem to need 2 pass if it is fixed number of rows per ID. So for instance 2rows/id:
data have;
input id month var1;
datalines;
1 6 2
1 14 1
2 9 3
2 14 1
;
run;
data want;
set have;
by id;
_dm=dif(month);
_dv=dif(var1);
if last.id then do;
RateOfChange=_dv/_dm;
do month=month-_dm to month;
output;
end;
end;
keep id month RateOfChange;
run;
If you do a "look ahead" by using a self-merge with firstobs=2 you can accommodate more than one record-per-id in a single data step:
data have;
input id month var1;
datalines;
1 6 2
1 14 1
2 9 3
2 14 1
;
run;
data want (drop=_: var1);
set have;
by id;
set have (firstobs=2 keep=month var1 rename=(month=_nxt_month var1=_nxt_var1));
var1_change_per_month=(_nxt_var1-var1)/(_nxt_month-month);
if not last.id then do month=month to _nxt_month;
output;
end;
run;
Now I wrote "merge" even though the program doesn't use a MERGE statement. Instead it uses two SET's, one starting out at the second observation. Ordinarily this technique would be a problem because when the first SET reads the last obs, the second SET attempts to read beyond the last obs, causing the data step to immediately stop. For this task, however, that's just fine.
The reason I chose to use two SETs instead of a MERGE is to take advantage of the BY statement, which of course allows use of the "if not last.id" test. A merge with BY would not preserve the offset initially provided by firstobs=2.
Thanks mkeintz for the solution and explanation. A good technique!
If it were up to me ....
data want;
set have;
by id;
interval = dif(month);
var1_change = dif(var1);
if last.id;
var1_change_per_month = var1_change / interval;
do month = month - interval to month;
output;
end;
keep id month var1_change_per_month;
run;
With apologies to @Haikuo since I belatedly noticed how similar this is to your solution!
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.