Hi all,
I have table with 3 columns (id;datetime;status) and I need to create new table with periods between two statuses: CLOSE-REOPEN.
Have:
id | datetime | status |
1 | 06SEP2020:12:00:00 | open |
1 | 06SEP2020:12:00:01 | CLOSE |
1 | 06SEP2020:12:00:02 | s1 |
1 | 06SEP2020:12:00:03 | s2 |
1 | 06SEP2020:12:00:04 | REOPEN |
1 | 06SEP2020:12:00:05 | s1 |
1 | 06SEP2020:12:00:06 | CLOSE |
1 | 06SEP2020:12:00:07 | s1 |
1 | 06SEP2020:12:00:08 | REOPEN |
2 | 06SEP2020:12:00:00 | open |
2 | 06SEP2020:12:00:01 | CLOSE |
2 | 06SEP2020:12:00:02 | s1 |
Want:
id | status1 | status2 | datetime_from | datetime_to |
1 | CLOSE | REOPEN | 06SEP2020:12:00:01 | 06SEP2020:12:00:03 |
1 | CLOSE | REOPEN | 06SEP2020:12:00:06 | 06SEP2020:12:00:07 |
Is it possible to achieve this with SQL Query? I can use only PROC SQL. It would be great to do this with one query but if it is not possible then more queries are also fine.
Sure. It can be achieved with a SQL query:
proc sql;
select
a.id,
a.status,
b.status,
a.datetime as datetimeFrom,
b.datetime as datetimeTo
from
have as a inner join
have as b on a.id=b.id and b.datetime>a.datetime
where a.status = "CLOSE" and b.status="REOPEN"
group by a.id, a.datetime
having b.datetime-a.datetime = min(b.datetime-a.datetime);
quit;
@Lucas wrote:
I can use only PROC SQL.
In my opinion, an unfortunate restriction. SQL doesn't really a good tool for finding the values immediately above a specific record. I'm sure someone will come along and write wizard-like SQL code that will be difficult to understand, that will get the job done.
I would use a DATA step, followed by PROC TRANSPOSE. So for anyone reading along who might not have this restriction, here is the code:
data times;
set have;
prev_time=lag(datetime);
if status='CLOSE' then do;
sequence+1;
output_time=datetime;
output;
end;
else if status='REOPEN' then do;
output_time=prev_time;
output;
end;
drop prev_time datetime;
run;
proc transpose data=times out=want prefix=datetime_;
var output_time;
by id sequence;
id status;
run;
data want;
set want(drop=sequence _name_);
if missing(datetime_reopen) then delete;
format datetime: datetime16.;
run;
Sure. It can be achieved with a SQL query:
proc sql;
select
a.id,
a.status,
b.status,
a.datetime as datetimeFrom,
b.datetime as datetimeTo
from
have as a inner join
have as b on a.id=b.id and b.datetime>a.datetime
where a.status = "CLOSE" and b.status="REOPEN"
group by a.id, a.datetime
having b.datetime-a.datetime = min(b.datetime-a.datetime);
quit;
This doesn't give the requested output in the column named datetimeTo
Oops. You are right @PaigeMiller.
But then, what should be the value of datetimeTo? The most recent recorded timestamp before REOPEN (whatever the status) or the REOPEN timestamp minus 1 second? What happens if CLOSE and REOPEN are consecutive?
Those are questions only @Lucas can answer.
This illustrates the importance of coming up with a realistic example to illustrate the problem.
Yes, it can be done with SQL, but AFAICT SQL has to do a cartesian crossing of all CLOSE vs REOPEN rows to determine the reopen most closely following close. It doesn't appear to take advantage of the fact that the data are ordered - ready made for a DATA step, which should be much faster.
The user is being asked to show proficiency in SQL rather than efficiency (and relative simplicity) in data processing.
If this is an assignment to help someone become proficient at SQL, then I hate the idea behind this assignment. Part of becoming proficient at SQL is understanding when SQL is not a good tool to use.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.