Solved: Re: time period determination using PROC SQL

Lucas · Posted 09-06-2020 09:05 AM

Hi all,

I have table with 3 columns (id;datetime;status) and I need to create new table with periods between two statuses: CLOSE-REOPEN.

Have:

id	datetime	status
1	06SEP2020:12:00:00	open
1	06SEP2020:12:00:01	CLOSE
1	06SEP2020:12:00:02	s1
1	06SEP2020:12:00:03	s2
1	06SEP2020:12:00:04	REOPEN
1	06SEP2020:12:00:05	s1
1	06SEP2020:12:00:06	CLOSE
1	06SEP2020:12:00:07	s1
1	06SEP2020:12:00:08	REOPEN
2	06SEP2020:12:00:00	open
2	06SEP2020:12:00:01	CLOSE
2	06SEP2020:12:00:02	s1

Want:

id	status1	status2	datetime_from	datetime_to
1	CLOSE	REOPEN	06SEP2020:12:00:01	06SEP2020:12:00:03
1	CLOSE	REOPEN	06SEP2020:12:00:06	06SEP2020:12:00:07

Is it possible to achieve this with SQL Query? I can use only PROC SQL. It would be great to do this with one query but if it is not possible then more queries are also fine.

PGStats · Posted 09-06-2020 12:47 PM

Sure. It can be achieved with a SQL query:

proc sql;
select 
    a.id,
    a.status,
    b.status,
    a.datetime as datetimeFrom,
    b.datetime as datetimeTo
from 
    have as a inner join
    have as b on a.id=b.id and b.datetime>a.datetime
where a.status = "CLOSE" and b.status="REOPEN"
group by a.id, a.datetime
having b.datetime-a.datetime = min(b.datetime-a.datetime);
quit;

PG

View solution in original post

PaigeMiller · Posted 09-06-2020 09:25 AM

@Lucas wrote:

I can use only PROC SQL.

In my opinion, an unfortunate restriction. SQL doesn't really a good tool for finding the values immediately above a specific record. I'm sure someone will come along and write wizard-like SQL code that will be difficult to understand, that will get the job done.

I would use a DATA step, followed by PROC TRANSPOSE. So for anyone reading along who might not have this restriction, here is the code:

data times;
    set have;
    prev_time=lag(datetime);
    if status='CLOSE' then do;
    	sequence+1;
    	output_time=datetime;
    	output;
	end;
	else if status='REOPEN' then do;
	    output_time=prev_time;
	    output;
    end;
    drop prev_time datetime;
run;
proc transpose data=times out=want prefix=datetime_;
    var output_time;
    by id sequence;
    id status;
run;
data want;
    set want(drop=sequence _name_);
    if missing(datetime_reopen) then delete;
    format datetime: datetime16.;
run;

--
Paige Miller

PGStats · Posted 09-06-2020 12:47 PM

Sure. It can be achieved with a SQL query:

proc sql;
select 
    a.id,
    a.status,
    b.status,
    a.datetime as datetimeFrom,
    b.datetime as datetimeTo
from 
    have as a inner join
    have as b on a.id=b.id and b.datetime>a.datetime
where a.status = "CLOSE" and b.status="REOPEN"
group by a.id, a.datetime
having b.datetime-a.datetime = min(b.datetime-a.datetime);
quit;

PG

PaigeMiller · Posted 09-06-2020 01:11 PM

This doesn't give the requested output in the column named datetimeTo

--
Paige Miller

PGStats · Posted 09-06-2020 02:03 PM

Oops. You are right @PaigeMiller.

But then, what should be the value of datetimeTo? The most recent recorded timestamp before REOPEN (whatever the status) or the REOPEN timestamp minus 1 second? What happens if CLOSE and REOPEN are consecutive?

PG

PaigeMiller · Posted 09-06-2020 02:34 PM

Those are questions only @Lucas can answer.

This illustrates the importance of coming up with a realistic example to illustrate the problem.

--
Paige Miller

mkeintz · Posted 09-06-2020 05:07 PM

Yes, it can be done with SQL, but AFAICT SQL has to do a cartesian crossing of all CLOSE vs REOPEN rows to determine the reopen most closely following close. It doesn't appear to take advantage of the fact that the data are ordered - ready made for a DATA step, which should be much faster.

The user is being asked to show proficiency in SQL rather than efficiency (and relative simplicity) in data processing.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

PaigeMiller · Posted 09-06-2020 05:58 PM

If this is an assignment to help someone become proficient at SQL, then I hate the idea behind this assignment. Part of becoming proficient at SQL is understanding when SQL is not a good tool to use.

--
Paige Miller

Lucas · Posted 09-07-2020 06:12 AM

Briliant solution in easy way. Values datetimeTo are different than in my example but it's fine for me. Thank you very much @PGStats!

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away