Hi all,
Here is my data. Data A has ID with different lines with start and end dates.
ID | LINE | LOSTART | LOEND |
1 | 4 | 11/27/2017 | 1/17/2018 |
1 | 1 | 12/1/2015 | 4/1/2016 |
1 | 2 | 4/5/2016 | 8/16/2016 |
1 | 3 | 8/23/2016 | 11/14/2016 |
2 | 1 | 6/4/2016 | 9/18/2016 |
2 | 2 | 9/19/2016 | 11/11/2016 |
3 | 2 | 5/6/2016 | 7/17/2016 |
3 | 3 | 7/18/2016 | 10/3/2016 |
4 | 1 | 11/2/2015 | 5/30/2016 |
4 | 2 | 5/31/2016 | 12/28/2016 |
4 | 3 | 4/8/2018 | 1/9/2019 |
5 | 1 | 4/3/2015 | 6/4/2015 |
5 | 2 | 6/5/2015 | 12/27/2015 |
5 | 3 | 8/5/2016 | 9/7/2016 |
6 | 1 | 10/12/2015 | 6/2/2016 |
6 | 4 | 11/7/2016 | 4/20/2017 |
6 | 2 | 6/3/2016 | 8/27/2016 |
7 | 1 | 6/5/2016 | 6/15/2016 |
Here is data B:
ID | Order |
1 | 7/16/2016 |
1 | 10/2/2017 |
2 | 7/18/2016 |
3 | 7/18/2016 |
4 | 7/26/2016 |
5 | 7/31/2016 |
6 | 7/31/2016 |
7 | 5/10/2016 |
The end result table is below. I am looking for the line by the order date.
The rule for the line variable is as follows:
Line = the order date is in between LOstart and LOend; This is pretty straight forward. For example: ID#1 row 1: Line 2 falls between LOstart and LOend. i.e. 7/16/2016 falls between 4/5/2016 and 8/16/2016
When order date is not in between LOstart and LOend, then Line = before the start of next line. For example: ID#1 row 2: order date 10/2/2017 is considered line 3 because it occurs before the beginning of line 4 LOstart date of 11/27/2018.
Similarly with ID#5 order date is 7/31/2016 is considered line 2 because it occurs before the beginning of line 3 LOstart date of 8/5/2016
Finally for only 1 order and if this order occurs before LINE 1 then it is considered 0. For Example: ID#7 row 8: Line is considered 0 because the order date of 5/10/2016 occurs before LOSTART date of 6/5/2016.
I tried using SQL and only could get partial results.
Row | ID | Order | LINE |
1 | 1 | 7/16/2016 | 2 |
2 | 1 | 10/2/2017 | 3 |
3 | 2 | 7/18/2016 | 1 |
4 | 3 | 7/18/2016 | 3 |
5 | 4 | 7/26/2016 | 2 |
6 | 5 | 7/31/2016 | 2 |
7 | 6 | 7/31/2016 | 2 |
8 | 7 | 5/10/2016 | 0 |
Thanks a ton!
Okay @rajd1 Keeping it simple-
data a;
input ID LINE (LOSTART LOEND) (:mmddyy10.);
format LOSTART LOEND mmddyy10.;
cards;
1 4 11/27/2017 1/17/2018
1 1 12/1/2015 4/1/2016
1 2 4/5/2016 8/16/2016
1 3 8/23/2016 11/14/2016
2 1 6/4/2016 9/18/2016
2 2 9/19/2016 11/11/2016
3 2 5/6/2016 7/17/2016
3 3 7/18/2016 10/3/2016
4 1 11/2/2015 5/30/2016
4 2 5/31/2016 12/28/2016
4 3 4/8/2018 1/9/2019
5 1 4/3/2015 6/4/2015
5 2 6/5/2015 12/27/2015
5 3 8/5/2016 9/7/2016
6 1 10/12/2015 6/2/2016
6 4 11/7/2016 4/20/2017
6 2 6/3/2016 8/27/2016
7 1 6/5/2016 6/15/2016
;
data b;
input ID Order :mmddyy10.;
format order mmddyy10.;
cards;
1 7/16/2016
1 10/2/2017
2 7/18/2016
3 7/18/2016
4 7/26/2016
5 7/31/2016
6 7/31/2016
7 5/10/2016
;
proc sql;
create table temp(drop=_True_Flag) as
select b.id,order,max(LOSTART<=order<=LOEND) as _True_Flag,line,
min(ifn(lostart>order,line,.))-1 as _line
from b b left join a a
on a.id=b.id
group by a.id,order
having _True_Flag and LOSTART<=order<=LOEND
or (not _True_Flag and _line=line)
or (line=1 and _line=0 and not _True_Flag)
order by a.id,order,line;
quit;
data want;
set temp;
if line=1 and _line=0 then line=_line;
drop _line;
run;
ID Order LINE
1 07/16/2016 2
1 10/02/2017 3
2 07/18/2016 1
3 07/18/2016 3
4 07/26/2016 2
5 07/31/2016 2
6 07/31/2016 2
7 05/10/2016 0
Finally for only 1 order and if this order occurs before LINE 1 then it is considered 0. For Example: ID#7 row 8: Line is considered 0 because the order date of 5/10/2016 occurs before LOSTART date of 6/5/2016?where is this in Data B?
Hi @rajd1 Alright no worries. Please try and test properly as I didn't for the reason I'm extremely tired and am about to doze off. If it doesnt work, Iet me know. I shall try and fix it tomoorow. Oh hey, in any case somebody else would have answered before I wake. Good night from CT
data a;
input ID LINE (LOSTART LOEND) (:mmddyy10.);
format LOSTART LOEND mmddyy10.;
cards;
1 4 11/27/2017 1/17/2018
1 1 12/1/2015 4/1/2016
1 2 4/5/2016 8/16/2016
1 3 8/23/2016 11/14/2016
2 1 6/4/2016 9/18/2016
2 2 9/19/2016 11/11/2016
3 2 5/6/2016 7/17/2016
3 3 7/18/2016 10/3/2016
4 1 11/2/2015 5/30/2016
4 2 5/31/2016 12/28/2016
4 3 4/8/2018 1/9/2019
5 1 4/3/2015 6/4/2015
5 2 6/5/2015 12/27/2015
5 3 8/5/2016 9/7/2016
6 1 10/12/2015 6/2/2016
6 4 11/7/2016 4/20/2017
6 2 6/3/2016 8/27/2016
7 1 6/5/2016 6/15/2016
;
data b;
input ID Order :mmddyy10.;
format order mmddyy10.;
cards;
1 7/16/2016
1 10/2/2017
2 7/18/2016
3 7/18/2016
4 7/26/2016
5 7/31/2016
6 7/31/2016
7 5/10/2016
;
proc sql;
create table want as
select b.* , ifn(max(line)=1 and order<lostart,0,line) as line
from b b left join a a
on a.id=b.id and (LOSTART<=order<=LOEND or LOSTART>order)
group by a.id,order
having LOSTART<=order<=LOEND or
^max(LOSTART<=order<=LOEND) and min(lostart)=lostart
order by a.id,order;
quit;
Hi @novinosrin , Thanks for this. Just a subtle change that i was looking for:
When order date is not in between LOstart and LOend, then Line = before the start of next line. For example: ID#1 row 2: order date 10/2/2017 is considered line 3 because it occurs before the beginning of line 4 LOstart date of 11/27/2018.
Similarly with ID#5 order date is 7/31/2016 is considered line 2 because it occurs before the beginning of line 3 LOstart date of 8/5/2016
Hi @rajd1 Good afternoon. I guess there is some problem in the communication or perhaps I am just lacking the comprehension skill. Let's do in piecemeal-
1. Once you JOIN/LOOK UP using self join, the sql processor see's this in-memory. The true flag =1 denotes the order falls between lostart and loend. So for these ID,Order combinations, it's easy to filter i.e. where true_flag=1. And for those cases, where it isn't true, your statement line= before the start of next line- is kinda confusing. Please explain the comparsion for ID1 10/02/2017 from the below
ID | Order | LINE | True_Flag | LOSTART | LOEND |
---|---|---|---|---|---|
1 | 07/16/2016 | 1 | 0 | 12/01/2015 | 04/01/2016 |
1 | 07/16/2016 | 2 | 1 | 04/05/2016 | 08/16/2016 |
1 | 07/16/2016 | 3 | 0 | 08/23/2016 | 11/14/2016 |
1 | 07/16/2016 | 4 | 0 | 11/27/2017 | 01/17/2018 |
1 | 10/02/2017 | 1 | 0 | 12/01/2015 | 04/01/2016 |
1 | 10/02/2017 | 2 | 0 | 04/05/2016 | 08/16/2016 |
1 | 10/02/2017 | 3 | 0 | 08/23/2016 | 11/14/2016 |
1 | 10/02/2017 | 4 | 0 | 11/27/2017 | 01/17/2018 |
2 | 07/18/2016 | 1 | 1 | 06/04/2016 | 09/18/2016 |
2 | 07/18/2016 | 2 | 0 | 09/19/2016 | 11/11/2016 |
3 | 07/18/2016 | 2 | 0 | 05/06/2016 | 07/17/2016 |
3 | 07/18/2016 | 3 | 1 | 07/18/2016 | 10/03/2016 |
4 | 07/26/2016 | 1 | 0 | 11/02/2015 | 05/30/2016 |
4 | 07/26/2016 | 2 | 1 | 05/31/2016 | 12/28/2016 |
4 | 07/26/2016 | 3 | 0 | 04/08/2018 | 01/09/2019 |
5 | 07/31/2016 | 1 | 0 | 04/03/2015 | 06/04/2015 |
5 | 07/31/2016 | 2 | 0 | 06/05/2015 | 12/27/2015 |
5 | 07/31/2016 | 3 | 0 | 08/05/2016 | 09/07/2016 |
6 | 07/31/2016 | 1 | 0 | 10/12/2015 | 06/02/2016 |
6 | 07/31/2016 | 2 | 1 | 06/03/2016 | 08/27/2016 |
6 | 07/31/2016 | 4 | 0 | 11/07/2016 | 04/20/2017 |
7 | 05/10/2016 | 1 | 0 | 06/05/2016 | 06/15/2016 |
Good afternoon,
This is great! Hope i explain this better this time. The first rule between dates is good
So, say for ID#1 orderdate= 10/02/2017 will be Line = 3; It falls between line 3 LOEND 11/14/2016 and line 4 LOSTART 11/27/2017 and is still considered line 3.
Again, for ID#5, orderdate = 7/31/2016 will be line 2; it falls between line 2 LOEND 12/27/2015 and line 3 LOSTART 08.05/2016 and is still considered line 2.
You had this in the solution and it works great: If order occurs before LINE 1 then it is considered 0. For Example: ID#7 row 8: Line is considered 0 because the order date of 5/10/2016 occurs before LOSTART date of 6/5/2016.
I have the final solution below.
ID | Order | LINE |
1 | 7/16/2016 | 2 |
1 | 10/2/2017 | 3 |
2 | 7/18/2016 | 1 |
3 | 7/18/2016 | 3 |
4 | 7/26/2016 | 2 |
5 | 7/31/2016 | 2 |
6 | 7/31/2016 | 2 |
7 | 5/10/2016 | 0 |
Much better Sir @rajd1 . lol. Thank you!!!. Hmm, you "have"-'I have the final solution below.'
Do you still want the solution or you have done writing it?
@novinosrin Can I please get your modified code? Thank you sir 🙂
Okay @rajd1 Keeping it simple-
data a;
input ID LINE (LOSTART LOEND) (:mmddyy10.);
format LOSTART LOEND mmddyy10.;
cards;
1 4 11/27/2017 1/17/2018
1 1 12/1/2015 4/1/2016
1 2 4/5/2016 8/16/2016
1 3 8/23/2016 11/14/2016
2 1 6/4/2016 9/18/2016
2 2 9/19/2016 11/11/2016
3 2 5/6/2016 7/17/2016
3 3 7/18/2016 10/3/2016
4 1 11/2/2015 5/30/2016
4 2 5/31/2016 12/28/2016
4 3 4/8/2018 1/9/2019
5 1 4/3/2015 6/4/2015
5 2 6/5/2015 12/27/2015
5 3 8/5/2016 9/7/2016
6 1 10/12/2015 6/2/2016
6 4 11/7/2016 4/20/2017
6 2 6/3/2016 8/27/2016
7 1 6/5/2016 6/15/2016
;
data b;
input ID Order :mmddyy10.;
format order mmddyy10.;
cards;
1 7/16/2016
1 10/2/2017
2 7/18/2016
3 7/18/2016
4 7/26/2016
5 7/31/2016
6 7/31/2016
7 5/10/2016
;
proc sql;
create table temp(drop=_True_Flag) as
select b.id,order,max(LOSTART<=order<=LOEND) as _True_Flag,line,
min(ifn(lostart>order,line,.))-1 as _line
from b b left join a a
on a.id=b.id
group by a.id,order
having _True_Flag and LOSTART<=order<=LOEND
or (not _True_Flag and _line=line)
or (line=1 and _line=0 and not _True_Flag)
order by a.id,order,line;
quit;
data want;
set temp;
if line=1 and _line=0 then line=_line;
drop _line;
run;
ID Order LINE
1 07/16/2016 2
1 10/02/2017 3
2 07/18/2016 1
3 07/18/2016 3
4 07/26/2016 2
5 07/31/2016 2
6 07/31/2016 2
7 05/10/2016 0
The rule for ID#5 as well as ID#1 row2, so is the line of ID#5 line 3?I am not sure that I understood what you said. Here is my code.
proc sort data=a ;
by id lostart;
run;
data a1;
set a;
by id lostart;
lostart1=lag(loend)+1;
if first.id then lostart1=lostart;
format lagstdat mmddyy10.;
run;
proc sql;
create table b1 as
select b.id,b.order
,case when not missing(a.line) then a.line
else 0
end as line
from b
left join a1 as a
on b.id=a.id and b.order between a.lostart1 and a.loend
order by b.id,b.order
;
quit;
According to your rule, the result for ID 5 should be 3, not 2. Please revise your intended result.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.