Hello, I have the following data set:
ID | Status | Timestamp |
---|---|---|
1122 | idle | 10MAY2013 |
1122 | active | 15MAY2013 |
1122 | idle | 25MAY2013 |
1122 | closed | 01JUN2013 |
Every time the status of an ID changes, a new record for that ID is being created and a timestamp of when the change occured is added to the record. That is how old statuses get historicized, as can be seen in the table.
I would like to add a column that measures how long a certain status has been assigned to an ID. The outcome would look like this:
ID | Status | Timestamp | Duration in days | ||
---|---|---|---|---|---|
1122 | idle | 10MAY2013 | 5 | ||
1122 | active | 15MAY2013 | 10 | ||
1122 | idle | 25MAY2013 | 7 | ||
1122 | closed | 01JUN2013 | ... |
The last value (the cell that contains "...") would be created through additional logic, like
if last.id then duration=today()-timestamp
or something like that.
Your help is much appreciated!
You can do it like this:
data want;
set have; by id;
lastId = last.id;
if not endNext then
set have(firstobs=2 keep=timeStamp rename=(timeStamp=nextTimeStamp))
end=endNext;
if lastId
then duration= intck("DAY", timeStamp, today());
else duration = intck('DAY', timeStamp, nextTimeStamp);
drop lastId nextTimeStamp;
run;
PG
Message was edited by: PG : Added END= condition to prevent premature datastep ending.
A proc sql way:
proc sql;
select a.*, b.timestamp-a.timestamp as duration
from have a
left join have b
on a.id=b.id
and a.timestamp < b.timestamp
group by a.id, a.status, a.timestamp
having b.timestamp=min(b.timestamp)
order by id,timestamp;
quit;
Haikuo
If your dataset is already sorted by ID TIMESTAMP, then you can use the following:
data want;
set have;
by id;
if not last.id then do;
i=_N_+1;
set have(rename=(timestamp=nextstamp) keep=nextstamp) point=i;
duration=intck('day', timestamp, nextstamp);
end;
*drop nextstamp i;
run;
you may have to change the keep= set option to keep=timestamp. I always forget if rename applies before or after keep for set options. This effectively reads in the next TIMESTAMP into a new variable NEXTSTAMP using direct addressing of _N_+1 since your data is presorted.
edit the logic is very similar to that of PG
*edit corrected after PGs' comment. That is the kind of syntax restrictions that I always figure out debugging after my logic is established!
Vince
This involves double sorting, but if your data is not too big it might work....:)
data test;
length id 8 status $10 timestamp 8;
input id status timestamp anydtdte.;
format timestamp yymmdd10.;
datalines;
1122 idle 10may2013
1122 active 15may2013
1122 idle 25may2013
1122 closed 01jun2013
;
run;
proc sort data=test;
by id descending timestamp;
run;
data new;
set test;
retain tmp;
format nextts yymmdd10.;
if _n_ = 1 then tmp = .;
nextts = tmp;
if timestamp ne . then tmp = timestamp;
duration=intck('day', timestamp, nextts);
run;
proc sort data=new;
by id timestamp;
run;
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.