Hi all,
Below is the original dataset that I want to split it into 3 different datasets as pass, open, and exception. Could someone kindly show me how to go about doing this? Any help is greatly appreciated.
1. Pass will contain all the ones with status = success and the difference between finished date and started date is equal or less than 1 day for the same client.
2. Open will contain all the ones with status=failed and no success for the same client.
3. Exception will contain all the ones with mix of failed and success status, but the finished date of last success is more than 1 day from the started date of first failure.
See the example output below.
Original dataset:
Client | Status | Started | Finished |
adfs02.ccnb.com | failed | 2/27/2022 | 2/27/2022 |
adfs02.ccnb.com | success | 2/28/2022 | 2/28/2022 |
devcsssql01.ccnb.com | failed | 2/27/2022 | 2/27/2022 |
devcsssql01.ccnb.com | success | 2/28/2022 | 2/28/2022 |
devdbasql01.ccnb.com | failed | 2/14/2022 | 2/14/2022 |
devdbasql01.ccnb.com | failed | 2/15/2022 | 2/15/2022 |
devdbasql01.ccnb.com | failed | 2/15/2022 | 2/15/2022 |
devdbasql01.ccnb.com | failed | 2/15/2022 | 2/15/2022 |
devdlakesql01.ccnb.com | failed | 2/8/2022 | 2/9/2022 |
devdlakesql01.ccnb.com | success | 2/9/2022 | 2/9/2022 |
devdlakesql01.ccnb.com | failed | 2/9/2022 | 2/9/2022 |
devdlakesql01.ccnb.com | failed | 2/10/2022 | 2/10/2022 |
devdlakesql01.ccnb.com | success | 2/10/2022 | 2/10/2022 |
devdlakesql01.ccnb.com | failed | 2/12/2022 | 2/13/2022 |
devdlakesql01.ccnb.com | failed | 2/13/2022 | 2/13/2022 |
devdlakesql01.ccnb.com | failed | 2/13/2022 | 2/14/2022 |
devdlakesql01.ccnb.com | failed | 2/14/2022 | 2/14/2022 |
devdlakesql01.ccnb.com | success | 2/14/2022 | 2/15/2022 |
devdlakesql01.ccnb.com | failed | 2/28/2022 | 2/28/2022 |
devspdb01.ccnb.com | failed | 2/28/2022 | 3/1/2022 |
prddlakesql01.ccnb.com | failed | 2/28/2022 | 2/28/2022 |
Pass:
Client | Status | Started | Finished |
adfs02.ccnb.com | failed | 2/27/2022 | 2/27/2022 |
adfs02.ccnb.com | success | 2/28/2022 | 2/28/2022 |
devcsssql01.ccnb.com | failed | 2/27/2022 | 2/27/2022 |
devcsssql01.ccnb.com | success | 2/28/2022 | 2/28/2022 |
devdlakesql01.ccnb.com | failed | 2/8/2022 | 2/9/2022 |
devdlakesql01.ccnb.com | success | 2/9/2022 | 2/9/2022 |
devdlakesql01.ccnb.com | failed | 2/9/2022 | 2/9/2022 |
devdlakesql01.ccnb.com | failed | 2/10/2022 | 2/10/2022 |
devdlakesql01.ccnb.com | success | 2/10/2022 | 2/10/2022 |
Open:
Client | Status | Started | Finished |
devdbasql01.ccnb.com | failed | 2/14/2022 | 2/14/2022 |
devdbasql01.ccnb.com | failed | 2/15/2022 | 2/15/2022 |
devdbasql01.ccnb.com | failed | 2/15/2022 | 2/15/2022 |
devdbasql01.ccnb.com | failed | 2/15/2022 | 2/15/2022 |
devdlakesql01.ccnb.com | failed | 2/28/2022 | 2/28/2022 |
devspdb01.ccnb.com | failed | 2/28/2022 | 3/1/2022 |
prddlakesql01.ccnb.com | failed | 2/28/2022 | 2/28/2022 |
Exception:
Client | Status | Started | Finished |
devdlakesql01.ccnb.com | failed | 2/12/2022 | 2/13/2022 |
devdlakesql01.ccnb.com | failed | 2/13/2022 | 2/13/2022 |
devdlakesql01.ccnb.com | failed | 2/13/2022 | 2/14/2022 |
devdlakesql01.ccnb.com | failed | 2/14/2022 | 2/14/2022 |
devdlakesql01.ccnb.com | success | 2/14/2022 | 2/15/2022 |
It seems like you are looking at the rows in groups of all the failures for a certain client up until one success. I was able to accomplish your desired output by adding a group variable and then using the having clause in proc sql. I also had to add a sort order variable in order to keep the rows in the original order. Let me know if this is what you are looking for, or if I misunderstand.
data original;
length client $50 status $10;
input Client $ Status $ Started Finished;
informat started finished mmddyy10.;
format started finished mmddyy10.;
datalines;
adfs02.ccnb.com failed 2/27/2022 2/27/2022
adfs02.ccnb.com success 2/28/2022 2/28/2022
devcsssql01.ccnb.com failed 2/27/2022 2/27/2022
devcsssql01.ccnb.com success 2/28/2022 2/28/2022
devdbasql01.ccnb.com failed 2/14/2022 2/14/2022
devdbasql01.ccnb.com failed 2/15/2022 2/15/2022
devdbasql01.ccnb.com failed 2/15/2022 2/15/2022
devdbasql01.ccnb.com failed 2/15/2022 2/15/2022
devdlakesql01.ccnb.com failed 2/8/2022 2/9/2022
devdlakesql01.ccnb.com success 2/9/2022 2/9/2022
devdlakesql01.ccnb.com failed 2/9/2022 2/9/2022
devdlakesql01.ccnb.com failed 2/10/2022 2/10/2022
devdlakesql01.ccnb.com success 2/10/2022 2/10/2022
devdlakesql01.ccnb.com failed 2/12/2022 2/13/2022
devdlakesql01.ccnb.com failed 2/13/2022 2/13/2022
devdlakesql01.ccnb.com failed 2/13/2022 2/14/2022
devdlakesql01.ccnb.com failed 2/14/2022 2/14/2022
devdlakesql01.ccnb.com success 2/14/2022 2/15/2022
devdlakesql01.ccnb.com failed 2/28/2022 2/28/2022
devspdb01.ccnb.com failed 2/28/2022 3/1/2022
prddlakesql01.ccnb.com failed 2/28/2022 2/28/2022
;
run;
data grouped;
set original;
by client;
sortorder = _n_;
if first.client then group = 1;
output;
if status = "success" then group + 1;
run;
proc sql;
create table pass as
select client, status, started, finished
from grouped
group by client, group
having sum(status = "success") > 0 and max(finished) - min(started) <= 1
order by sortorder
;
create table open as
select client, status, started, finished
from grouped
group by client, group
having sum(status = "success") = 0
order by sortorder
;
create table exception as
select client, status, started, finished
from grouped
group by client, group
having sum(status = "success") > 0 and max(finished) - min(started) > 1
order by sortorder
;
run;
It seems like you are looking at the rows in groups of all the failures for a certain client up until one success. I was able to accomplish your desired output by adding a group variable and then using the having clause in proc sql. I also had to add a sort order variable in order to keep the rows in the original order. Let me know if this is what you are looking for, or if I misunderstand.
data original;
length client $50 status $10;
input Client $ Status $ Started Finished;
informat started finished mmddyy10.;
format started finished mmddyy10.;
datalines;
adfs02.ccnb.com failed 2/27/2022 2/27/2022
adfs02.ccnb.com success 2/28/2022 2/28/2022
devcsssql01.ccnb.com failed 2/27/2022 2/27/2022
devcsssql01.ccnb.com success 2/28/2022 2/28/2022
devdbasql01.ccnb.com failed 2/14/2022 2/14/2022
devdbasql01.ccnb.com failed 2/15/2022 2/15/2022
devdbasql01.ccnb.com failed 2/15/2022 2/15/2022
devdbasql01.ccnb.com failed 2/15/2022 2/15/2022
devdlakesql01.ccnb.com failed 2/8/2022 2/9/2022
devdlakesql01.ccnb.com success 2/9/2022 2/9/2022
devdlakesql01.ccnb.com failed 2/9/2022 2/9/2022
devdlakesql01.ccnb.com failed 2/10/2022 2/10/2022
devdlakesql01.ccnb.com success 2/10/2022 2/10/2022
devdlakesql01.ccnb.com failed 2/12/2022 2/13/2022
devdlakesql01.ccnb.com failed 2/13/2022 2/13/2022
devdlakesql01.ccnb.com failed 2/13/2022 2/14/2022
devdlakesql01.ccnb.com failed 2/14/2022 2/14/2022
devdlakesql01.ccnb.com success 2/14/2022 2/15/2022
devdlakesql01.ccnb.com failed 2/28/2022 2/28/2022
devspdb01.ccnb.com failed 2/28/2022 3/1/2022
prddlakesql01.ccnb.com failed 2/28/2022 2/28/2022
;
run;
data grouped;
set original;
by client;
sortorder = _n_;
if first.client then group = 1;
output;
if status = "success" then group + 1;
run;
proc sql;
create table pass as
select client, status, started, finished
from grouped
group by client, group
having sum(status = "success") > 0 and max(finished) - min(started) <= 1
order by sortorder
;
create table open as
select client, status, started, finished
from grouped
group by client, group
having sum(status = "success") = 0
order by sortorder
;
create table exception as
select client, status, started, finished
from grouped
group by client, group
having sum(status = "success") > 0 and max(finished) - min(started) > 1
order by sortorder
;
run;
Thank you for the response. I think we are almost there. So, what I want to accomplish, and the challenge is to analyze the result of each row as well as the results of the group.
For "Pass" table, it would be
1) If the difference between the finished date of the current "success" and the finished date of the last "success" was less or equal to 1 day. Then
all the preceding failures as well as the current success would be dumped into "Pass" table.
Would it be something like this?
a) If current client <> previous client and current statis = success and current Finished - current Started <= 1
b) If current client = previous client and current statis = success and previous status = success and current Finished - current Started <= 1
For "Open" table, it would be
1) If there was no "success" in the group. I think we got this.
For "Exception" table, it would be
1) If the difference between the finished date of the current "success" and the finished date of the last "success" was greater than 1 day. Then all the preceding failures (could be more than one) as well as the current success would be dumped into "Exception" table.
Hi @NewbieTom,
I'm not sure I totally understand you. You say that you want to analyze each row as well as the whole group. In that case, you could add the group condition to the data as a new column and then filter by it later, something like this:
proc sql;
create table pass1 as
select client, status, started, finished,
sum(status = "success") > 0 and max(finished) - min(started) <= 1 as pass
from grouped
group by client, group
order by sortorder
;
quit;
data pass;
set pass1;
where pass /* and other criteria */;
drop pass;
run;
I'm not sure how that connects to the rest of your post.
For "Pass" table, it would be
1) If the difference between the finished date of the current "success" and the finished date of the last "success" was less or equal to 1 day. Then
all the preceding failures as well as the current success would be dumped into "Pass" table. Do you mean to say that instead of comparing the finished date of the current success and the start date of the first of the preceding series of consecutive failures, you want to compare the finished date of the current success to the finished date of the preceding success? Is that true even if the previous success was for a different client?
Would it be something like this?
a) If current client <> previous client and current statis = success and current Finished - current Started <= 1 Is this referring to the case where the first row for a certain client is a success? If so, that would be handled by the code I shared. (There would only be one record in the group, so max(finished) would be the finished date of that one success and min(started) would be the started date for the same record.
b) If current client = previous client and current statis = success and previous status = success and current Finished - current Started <= 1 Is this referring to the case of two successes in a row for the same client? If so, the same is true that I said for a).
For "Open" table, it would be
1) If there was no "success" in the group. I think we got this. Great!
For "Exception" table, it would be
1) If the difference between the finished date of the current "success" and the finished date of the last "success" was greater than 1 day. Then all the preceding failures (could be more than one) as well as the current success would be dumped into "Exception" table. This is the same as what I did in my program, except use finished date of previous success instead of start date of first consecutive failure. Correct?
I think it can be done in a single data step, like this:
data pass open exception;
do until(last.client or status='success');
set original;
by client;
if first.client then
date0=started;
end;
if status='failed' then
type='O';
else if finished-date0>1 then
type='E';
else
type='P';
date0=finished;
retain date0;
do until(last.client or status='success');
set original;
by client;
select(type);
when('O') output open;
when('P') output pass;
otherwise output exception;
end;
end;
drop date0 type;
run;
Hi. A possible additional option is as follows. I put the original data in sasuser.original.
Data Success Failed Exception;
set sasuser.Original;
If upcase(status)="SUCCESS" and started-finished=0 then output Success;
Else If upcase(status)="FAILED" and started-finished=0 then output Failed;
Else output Exception;
run;
proc print data=sasuser.original;
title "Original Table";
run;
proc print data=Work.Success;
title "Success Table";
run;
proc print data=Work.Failed;
title "Failled Table";
run;
proc print data=Work.Exception;
title "Exception Table";
run;
Original Table |
Obs | Status | Started | Finished | |
---|---|---|---|---|
1 | adfs02.ccnb.com | failed | 02/27/2022 | 02/27/2022 |
2 | adfs02.ccnb.com | success | 02/28/2022 | 02/28/2022 |
3 | devcsssql01.ccnb.com | failed | 02/27/2022 | 02/27/2022 |
4 | devcsssql01.ccnb.com | success | 02/28/2022 | 02/28/2022 |
5 | devdbasql01.ccnb.com | failed | 02/14/2022 | 02/14/2022 |
6 | devdbasql01.ccnb.com | failed | 02/15/2022 | 02/15/2022 |
7 | devdbasql01.ccnb.com | failed | 02/15/2022 | 02/15/2022 |
8 | devdbasql01.ccnb.com | failed | 02/15/2022 | 02/15/2022 |
9 | devdlakesql01.ccnb.com | failed | 02/08/2022 | 02/09/2022 |
10 | devdlakesql01.ccnb.com | success | 02/09/2022 | 02/09/2022 |
11 | devdlakesql01.ccnb.com | failed | 02/09/2022 | 02/09/2022 |
12 | devdlakesql01.ccnb.com | failed | 02/10/2022 | 02/10/2022 |
13 | devdlakesql01.ccnb.com | success | 02/10/2022 | 02/10/2022 |
14 | devdlakesql01.ccnb.com | failed | 02/12/2022 | 02/13/2022 |
15 | devdlakesql01.ccnb.com | failed | 02/13/2022 | 02/13/2022 |
16 | devdlakesql01.ccnb.com | failed | 02/13/2022 | 02/14/2022 |
17 | devdlakesql01.ccnb.com | failed | 02/14/2022 | 02/14/2022 |
18 | devdlakesql01.ccnb.com | success | 02/14/2022 | 02/15/2022 |
19 | devdlakesql01.ccnb.com | failed | 02/28/2022 | 02/28/2022 |
20 | devspdb01.ccnb.com | failed | 02/28/2022 | 03/01/2022 |
21 | prddlakesql01.ccnb.com | failed | 02/28/2022 | 02/28/2022 |
Success Table |
Obs | Status | Started | Finished | |
---|---|---|---|---|
1 | adfs02.ccnb.com | success | 02/28/2022 | 02/28/2022 |
2 | devcsssql01.ccnb.com | success | 02/28/2022 | 02/28/2022 |
3 | devdlakesql01.ccnb.com | success | 02/09/2022 | 02/09/2022 |
4 | devdlakesql01.ccnb.com | success | 02/10/2022 | 02/10/2022 |
Failled Table |
Obs | Status | Started | Finished | |
---|---|---|---|---|
1 | adfs02.ccnb.com | failed | 02/27/2022 | 02/27/2022 |
2 | devcsssql01.ccnb.com | failed | 02/27/2022 | 02/27/2022 |
3 | devdbasql01.ccnb.com | failed | 02/14/2022 | 02/14/2022 |
4 | devdbasql01.ccnb.com | failed | 02/15/2022 | 02/15/2022 |
5 | devdbasql01.ccnb.com | failed | 02/15/2022 | 02/15/2022 |
6 | devdbasql01.ccnb.com | failed | 02/15/2022 | 02/15/2022 |
7 | devdlakesql01.ccnb.com | failed | 02/09/2022 | 02/09/2022 |
8 | devdlakesql01.ccnb.com | failed | 02/10/2022 | 02/10/2022 |
9 | devdlakesql01.ccnb.com | failed | 02/13/2022 | 02/13/2022 |
10 | devdlakesql01.ccnb.com | failed | 02/14/2022 | 02/14/2022 |
11 | devdlakesql01.ccnb.com | failed | 02/28/2022 | 02/28/2022 |
12 | prddlakesql01.ccnb.com | failed | 02/28/2022 | 02/28/2022 |
Exception Table |
Obs | Status | Started | Finished | |
---|---|---|---|---|
1 | devdlakesql01.ccnb.com | failed | 02/08/2022 | 02/09/2022 |
2 | devdlakesql01.ccnb.com | failed | 02/12/2022 | 02/13/2022 |
3 | devdlakesql01.ccnb.com | failed | 02/13/2022 | 02/14/2022 |
4 | devdlakesql01.ccnb.com | success | 02/14/2022 | 02/15/2022 |
5 | devspdb01.ccnb.com | failed | 02/28/2022 | 03/01/2022 |
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.