10-01-2013 11:26 AM
Hello, I have the following data set:
Every time the status of an ID changes, a new record for that ID is being created and a timestamp of when the change occured is added to the record. That is how old statuses get historicized, as can be seen in the table.
I would like to add a column that measures how long a certain status has been assigned to an ID. The outcome would look like this:
|Status||Timestamp||Duration in days|
The last value (the cell that contains "...") would be created through additional logic, like
if last.id then duration=today()-timestamp
or something like that.
Your help is much appreciated!
10-01-2013 01:35 PM
You can do it like this:
set have; by id;
lastId = last.id;
if not endNext then
set have(firstobs=2 keep=timeStamp rename=(timeStamp=nextTimeStamp))
then duration= intck("DAY", timeStamp, today());
else duration = intck('DAY', timeStamp, nextTimeStamp);
drop lastId nextTimeStamp;
Message was edited by: PG : Added END= condition to prevent premature datastep ending.
10-01-2013 03:21 PM
A proc sql way:
select a.*, b.timestamp-a.timestamp as duration
from have a
left join have b
and a.timestamp < b.timestamp
group by a.id, a.status, a.timestamp
order by id,timestamp;
10-02-2013 08:21 AM
If your dataset is already sorted by ID TIMESTAMP, then you can use the following:
if not last.id then do;
set have(rename=(timestamp=nextstamp) keep=nextstamp) point=i;
duration=intck('day', timestamp, nextstamp);
*drop nextstamp i;
you may have to change the keep= set option to keep=timestamp. I always forget if rename applies before or after keep for set options. This effectively reads in the next TIMESTAMP into a new variable NEXTSTAMP using direct addressing of _N_+1 since your data is presorted.
edit the logic is very similar to that of PG
*edit corrected after PGs' comment. That is the kind of syntax restrictions that I always figure out debugging after my logic is established!
10-02-2013 03:04 PM
10-03-2013 02:35 AM
This involves double sorting, but if your data is not too big it might work....
length id 8 status $10 timestamp 8;
input id status timestamp anydtdte.;
format timestamp yymmdd10.;
1122 idle 10may2013
1122 active 15may2013
1122 idle 25may2013
1122 closed 01jun2013
proc sort data=test;
by id descending timestamp;
format nextts yymmdd10.;
if _n_ = 1 then tmp = .;
nextts = tmp;
if timestamp ne . then tmp = timestamp;
duration=intck('day', timestamp, nextts);
proc sort data=new;
by id timestamp;