Hi everyone.
I have a problem about vertical summation by some conditions.
Main dataset is below;
ID | COHORT | ALLOCATE | USAGE | |
1 | 1001 | 201601 | 15.049 | 6.766 |
2 | 1001 | 201602 | 13.507 | 5.088 |
3 | 1001 | 201603 | 11.872 | 9.352 |
4 | 1001 | 201604 | 13.791 | 7.404 |
5 | 1001 | 201605 | 13.618 | 8.866 |
6 | 1001 | 201606 | 16.906 | 8.897 |
7 | 1001 | 201607 | 13.706 | 6.988 |
8 | 1001 | 201608 | 11.543 | 9.294 |
9 | 1001 | 201609 | 12.457 | 7.308 |
10 | 1001 | 201610 | 14.250 | 8.934 |
11 | 1001 | 201611 | 11.148 | 7.726 |
12 | 1001 | 201612 | 19.648 | 6.331 |
Conditions;
If these conditions are met then 1, else 0 in FINAL column.
The table I want to create is below;
| ID | COHORT | ALLOCATE | USAGE | FINAL |
1 | 1001 | 201601 | 15.049 | 6.766 | 1 |
2 | 1001 | 201602 | 13.507 | 5.088 | 1 |
3 | 1001 | 201603 | 11.872 | 9.352 | 1 |
4 | 1001 | 201604 | 13.791 | 7.404 | 1 |
5 | 1001 | 201605 | 0 | 8.866 | 0 |
6 | 1001 | 201606 | 16.906 | 8.897 | 1 |
7 | 1001 | 201607 | 13.706 | 0 | 0 |
8 | 1001 | 201608 | 11.543 | 0 | 0 |
9 | 1001 | 201609 | 12.457 | 0 | 0 |
10 | 1001 | 201610 | 14.250 | 0 | 0 |
11 | 1001 | 201611 | 11.148 | 0 | 0 |
12 | 1001 | 201612 | 19.648 | 0 | 0 |
How can I do that in datastep procedure?
Thanks a lot in advance.
Instead of looking forward to a late row, you can sort the data by descending cohort (seems to be a date of yyyymm format)
then use lag function to peak at a previous rows (lag1 or lag2 or lag3 ...).
Try it and post your code in case of any issue.
@gibsonsg: I have no idea how, given the conditions you describe and your sample input, you arrive at zeros anywhere in the sample output. Before you logic can be translated into a DATA step (or any other program), it must be clearly understood. I admit that I totally fail to understand it.
Paul D.
Hi hasman,
You are right, I have to correct the conditions. I want to check the ID with the following sixth ID.
Thanks for warning.
For first row;
ID1 = ID6 and
Allocate1 > 0 and
(Usage1 + Usage2 + Usage3 + Usage4 + Usage5 + Usage6) > 0
For second row;
ID2 = ID7 and
Allocate2 > 0 and
(Usage2 + Usage3 + Usage4 + Usage5 + Usage6 + Usage7) > 0
For thirth row;
ID3 = ID8 and
Allocate3 > 0 and
(Usage3 + Usage4 + Usage5 + Usage6 + Usage7 + Usage8) > 0
and so on.
Thank you, it's much clearer. Yet it still doesn't explain your proposed output because all of these conditions are met for every row:
For starters, could you explain what combination of your conditions causes FINAL to become 0 in output row 5?
Paul D.
Instead of looking forward to a late row, you can sort the data by descending cohort (seems to be a date of yyyymm format)
then use lag function to peak at a previous rows (lag1 or lag2 or lag3 ...).
Try it and post your code in case of any issue.
Hi again.
After some research, I found that intnx function is the another handy method to solve the problem.
I hope it would be useful.
Input Table
| ID | COHORT | ALLOCATE | USAGE |
1 | 1001 | 201601 | 15.049 | 6.766 |
2 | 1001 | 201602 | 13.507 | 5.088 |
3 | 1001 | 201603 | 11.872 | 9.352 |
4 | 1001 | 201604 | 13.791 | 7.404 |
5 | 1001 | 201605 | 0 | 8.866 |
6 | 1001 | 201606 | 16.906 | 8.897 |
7 | 1001 | 201607 | 13.706 | 0 |
8 | 1001 | 201608 | 11.543 | 0 |
9 | 1001 | 201609 | 12.457 | 0 |
10 | 1001 | 201610 | 14.250 | 0 |
11 | 1001 | 201611 | 11.148 | 0 |
12 | 1001 | 201612 | 19.648 | 0 |
Codes
For the solution, DATE column was created for each observation in Cohort column.
(201601=31.01.2016, 201602=29.02.2016 etc.)
proc sql;
create table work.vertical_sum as
select
ID,
COHORT,
DATE,
ALLOCATE,
SUM(USAGE) as TOTAL_USAGE
from
(select t1.ID, t1.COHORT, t1.DATE, t1.ALLOCATE, t2.USAGE from work.vertical as t1
left join work.vertical as t2
on (t1.ID=t2.ID
and
t2.DATE between intnx('month', t1.DATE, 0) and intnx('month', t1.DATE, 6)))
group by ID, COHORT
order by ID, COHORT;
create table work.final_sum as
select
t1.ID,
t1.COHORT,
t1.ALLOCATE,
t2.TOTAL_USAGE,
Case When t1.ALLOCATE > 0 and t2.TOTAL_USAGE > 0 Then 1 Else 0 End as FINAL
from work.vertical_sum as t1
inner join work.vertical_sum as t2
on t1.ID=t2.ID and t1.COHORT=t2.COHORT
order by t1.ID, t1.COHORT;
quit;
proc sort data=work.final_sum out=work.final_sum nodupkey;
by cohort;
run;
Result Table
| ID | COHORT | DATE | ALLOCATE | TOTAL_USAGE | FINAL |
1 | 1001 | 201601 | 31.01.2016 | 15.049 | 46.373 | 1 |
2 | 1001 | 201602 | 29.02.2016 | 13.507 | 39.607 | 1 |
3 | 1001 | 201603 | 31.03.2016 | 11.872 | 34.519 | 1 |
4 | 1001 | 201604 | 30.04.2016 | 13.791 | 25.167 | 1 |
5 | 1001 | 201605 | 31.05.2016 | 0 | 17.763 | 0 |
6 | 1001 | 201606 | 30.06.2016 | 16.906 | 8.897 | 1 |
7 | 1001 | 201607 | 31.07.2016 | 13.706 | 0 | 0 |
8 | 1001 | 201608 | 31.08.2016 | 11.543 | 0 | 0 |
9 | 1001 | 201609 | 30.09.2016 | 12.457 | 0 | 0 |
10 | 1001 | 201610 | 31.10.2016 | 14.25 | 0 | 0 |
11 | 1001 | 201611 | 30.11.2016 | 11.148 | 0 | 0 |
12 | 1001 | 201612 | 31.12.2016 | 19.648 | 0 | 0 |
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.