- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hello, I have the following data set:
ID | Status | Timestamp |
---|---|---|
1122 | idle | 10MAY2013 |
1122 | active | 15MAY2013 |
1122 | idle | 25MAY2013 |
1122 | closed | 01JUN2013 |
Every time the status of an ID changes, a new record for that ID is being created and a timestamp of when the change occured is added to the record. That is how old statuses get historicized, as can be seen in the table.
I would like to add a column that measures how long a certain status has been assigned to an ID. The outcome would look like this:
ID | Status | Timestamp | Duration in days | ||
---|---|---|---|---|---|
1122 | idle | 10MAY2013 | 5 | ||
1122 | active | 15MAY2013 | 10 | ||
1122 | idle | 25MAY2013 | 7 | ||
1122 | closed | 01JUN2013 | ... |
The last value (the cell that contains "...") would be created through additional logic, like
if last.id then duration=today()-timestamp
or something like that.
Your help is much appreciated!
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
You can do it like this:
data want;
set have; by id;
lastId = last.id;
if not endNext then
set have(firstobs=2 keep=timeStamp rename=(timeStamp=nextTimeStamp))
end=endNext;
if lastId
then duration= intck("DAY", timeStamp, today());
else duration = intck('DAY', timeStamp, nextTimeStamp);
drop lastId nextTimeStamp;
run;
PG
Message was edited by: PG : Added END= condition to prevent premature datastep ending.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
A proc sql way:
proc sql;
select a.*, b.timestamp-a.timestamp as duration
from have a
left join have b
on a.id=b.id
and a.timestamp < b.timestamp
group by a.id, a.status, a.timestamp
having b.timestamp=min(b.timestamp)
order by id,timestamp;
quit;
Haikuo
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
If your dataset is already sorted by ID TIMESTAMP, then you can use the following:
data want;
set have;
by id;
if not last.id then do;
i=_N_+1;
set have(rename=(timestamp=nextstamp) keep=nextstamp) point=i;
duration=intck('day', timestamp, nextstamp);
end;
*drop nextstamp i;
run;
you may have to change the keep= set option to keep=timestamp. I always forget if rename applies before or after keep for set options. This effectively reads in the next TIMESTAMP into a new variable NEXTSTAMP using direct addressing of _N_+1 since your data is presorted.
edit the logic is very similar to that of PG
*edit corrected after PGs' comment. That is the kind of syntax restrictions that I always figure out debugging after my logic is established!
Vince
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
This involves double sorting, but if your data is not too big it might work....:)
data test;
length id 8 status $10 timestamp 8;
input id status timestamp anydtdte.;
format timestamp yymmdd10.;
datalines;
1122 idle 10may2013
1122 active 15may2013
1122 idle 25may2013
1122 closed 01jun2013
;
run;
proc sort data=test;
by id descending timestamp;
run;
data new;
set test;
retain tmp;
format nextts yymmdd10.;
if _n_ = 1 then tmp = .;
nextts = tmp;
if timestamp ne . then tmp = timestamp;
duration=intck('day', timestamp, nextts);
run;
proc sort data=new;
by id timestamp;
run;