BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
ncy
Fluorite | Level 6 ncy
Fluorite | Level 6

Hi SAS expert, 

 

I have a sequential data set, each id have two observations, a sample dataset looks like below:

 

data have;
	input id month var1;
	datalines;
	1 6  2  
	1 14 1 
	2 9  3 
	2 14 1 
	;
run;

First I would like to calculate the change of var1,  var1_change, defined as the last obs minus the first obs, and calculate the change per month,var1_change_per_month, defined as the change/interval, the dataset I got looks like this:

 

data cal;
	input id month var1 var1_change interval var1_change_per_month;
	datalines;
	1 6 2 . . .
	1 14 1 -1 8 -0.125
	2 9 3 . . .
	2 14 1 -2 5 -0.4
	;
run;

Now I would like to fill in the gaps between two observations with the calculated var1_change_per_month, the dataset i want would look like this:

 

data want;
	input id month var1_change_per_month;
	datalines;
	1 6 -0.125
	1 7 -0.125
	1 8 -0.125
	1 9 -0.125
	1 10 -0.125
	1 11 -0.125
	1 12 -0.125
	1 13 -0.125
	1 14 -0.125
	2 9 -0.4
	2 10 -0.4
	2 11 -0.4
	2 12 -0.4
	2 13 -0.4
	2 14 -0.4
;
run;

Any ideas would be appreciated. Thanks!

 

 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
andreas_lds
Jade | Level 19

Have you tried anything?

Do you have to calculate the difference of var1 only, or are there more variables in your data? Assuming: only one variable.

Are there always exactly two observation per id?   Assuming: yes

Is the data always sorted? Assuming: yes

Do you need the dataset "cal" for anything or could we jump from "have" to "want" without any intermediate steps? Assuming: yes

 

This is a clumsy two-step solution, the second step could be integrated into the first one.

data cal;
   set have;
   by id;

   length 
      var1_change interval var1_change_per_month 8
      last_month last_var1 8
   ;
   retain last:;
   drop last:;

   if first.id then do;
      last_month = month;
      last_var1 = var1;
   end;
   else do;
      var1_change = var1 - last_var1;
      interval = month - last_month;
      var1_change_per_month = var1_change / interval;
   end;
run;

data want;
   set cal(keep=id month interval var1_change_per_month);
   by id;

   length start_month end_month 8;
   drop start_month end_month interval;

   if last.id;

   start_month = month - interval;
   end_month = month;

   do month = start_month to end_month;
      output;
   end;
run;

View solution in original post

8 REPLIES 8
novinosrin
Tourmaline | Level 20

@ncy wrote:

 

 

I have a sequential data set, each id have two observations, a sample dataset looks like below:

  

 


 

data have;
	input id month var1;
	datalines;
	1 6  2  
	1 14 1 
	2 9  3 
	2 14 1 
	;
run;

data want;
do _n_=1 by 1 until(last.id);
set have;
by id;
if _n_=1 then do;
k=2;
set have(keep=month var1 rename=(var1=_var1 month=_m)) point=k;
var1_change_per_month=(_var1-var1)/(_m-month);
_m=month;
output;
end;
else do month=_m+1 to month;
output;
end;
end;
keep id month var1_change_per_month;
run;
andreas_lds
Jade | Level 19

Have you tried anything?

Do you have to calculate the difference of var1 only, or are there more variables in your data? Assuming: only one variable.

Are there always exactly two observation per id?   Assuming: yes

Is the data always sorted? Assuming: yes

Do you need the dataset "cal" for anything or could we jump from "have" to "want" without any intermediate steps? Assuming: yes

 

This is a clumsy two-step solution, the second step could be integrated into the first one.

data cal;
   set have;
   by id;

   length 
      var1_change interval var1_change_per_month 8
      last_month last_var1 8
   ;
   retain last:;
   drop last:;

   if first.id then do;
      last_month = month;
      last_var1 = var1;
   end;
   else do;
      var1_change = var1 - last_var1;
      interval = month - last_month;
      var1_change_per_month = var1_change / interval;
   end;
run;

data want;
   set cal(keep=id month interval var1_change_per_month);
   by id;

   length start_month end_month 8;
   drop start_month end_month interval;

   if last.id;

   start_month = month - interval;
   end_month = month;

   do month = start_month to end_month;
      output;
   end;
run;
ncy
Fluorite | Level 6 ncy
Fluorite | Level 6

Thanks andreas_ids. the solution works very well. 

 

To clarify your assumptions. 

 

Yes, there are exactly two observations per id, and i have more than var1 to calculate. But I can apply this technique to other variables. The 'cal' is a dataset I already got, I put this intermediate dataset here to help formulate my question about getting dataset 'want'.

 

 

Haikuo
Onyx | Level 15

No matter what you have in mind, you will need 2xpass to the data. So 2XDOW seems logical, and it can deal more than 2 rows per id.

 

data have;
	input id month var1;
	datalines;
	1 6  2  
	1 14 1 
	2 9  3 
	2 14 1 
	;
run;

data want;
	do until (last.id);
		set have;
		by id;

		if first.id then
			do;
				_from=month;
				_d1=var1;
			end;

		if last.id then
			do;
				_to=month;
				_d2=var1;
			end;
end;
		RateOfChange=(_d2-_d1)/(_to-_from);

		do until (last.id);
			set have;
			by id;

			if first.id then
				do;
					do month=_from to _to;
						output;
					end;
				end;
		end;
		keep id month RateOfChange;
run;
Haikuo
Onyx | Level 15

Well, had to take my word back.  One don't seem to need 2 pass if it is fixed number of rows per ID. So for instance 2rows/id:

 

data have;
	input id month var1;
	datalines;
	1 6  2  
	1 14 1 
	2 9  3 
	2 14 1 
	;
run;

data want;
  set have;
   by id;
    _dm=dif(month);
	_dv=dif(var1);
	if last.id then do;
	   RateOfChange=_dv/_dm;
	   do month=month-_dm to month;
	     output;
	   end;
	   end;
	   keep id month RateOfChange;
	run;
mkeintz
PROC Star

If you do a "look ahead" by using a self-merge with firstobs=2 you can accommodate more than one record-per-id in a single data step:

 

data have;
	input id month var1;
	datalines;
	1 6  2  
	1 14 1 
	2 9  3 
	2 14 1 
	;
run;

data want (drop=_: var1);
  set have;
  by id;
  set have (firstobs=2 keep=month var1 rename=(month=_nxt_month var1=_nxt_var1));

  var1_change_per_month=(_nxt_var1-var1)/(_nxt_month-month);
  if not last.id then do month=month to _nxt_month;
    output;
  end;
run;

 

Now I wrote "merge" even though the program doesn't use a MERGE statement.  Instead it uses two SET's, one starting out at the second observation.   Ordinarily this technique would be a problem because when the first SET reads the last obs, the second SET attempts to read beyond the last obs, causing the data step to immediately stop.   For this task, however, that's just fine.

 

The reason I chose to use two SETs instead of a MERGE is to take advantage of the BY statement, which of course allows use of the "if not last.id" test.  A merge with BY would not preserve the offset initially provided by firstobs=2.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
ncy
Fluorite | Level 6 ncy
Fluorite | Level 6

Thanks mkeintz for the solution and explanation. A good technique! 

Astounding
PROC Star

If it were up to me ....

 

data want;

set have;

by id;

interval = dif(month);

var1_change = dif(var1);

if last.id;

var1_change_per_month = var1_change / interval;

do month = month - interval to month;

    output;

end;

 

keep id month var1_change_per_month;

run;

 

With apologies to @Haikuo since I belatedly noticed how similar this is to your solution!

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 1320 views
  • 8 likes
  • 6 in conversation