Hi everyone.
I have a problem about vertical summation by some conditions.
Main dataset is below;
ID | COHORT | ALLOCATE | USAGE | |
1 | 1001 | 201601 | 15.049 | 6.766 |
2 | 1001 | 201602 | 13.507 | 5.088 |
3 | 1001 | 201603 | 11.872 | 9.352 |
4 | 1001 | 201604 | 13.791 | 7.404 |
5 | 1001 | 201605 | 13.618 | 8.866 |
6 | 1001 | 201606 | 16.906 | 8.897 |
7 | 1001 | 201607 | 13.706 | 6.988 |
8 | 1001 | 201608 | 11.543 | 9.294 |
9 | 1001 | 201609 | 12.457 | 7.308 |
10 | 1001 | 201610 | 14.250 | 8.934 |
11 | 1001 | 201611 | 11.148 | 7.726 |
12 | 1001 | 201612 | 19.648 | 6.331 |
Conditions;
If these conditions are met then 1, else 0 in FINAL column.
The table I want to create is below;
| ID | COHORT | ALLOCATE | USAGE | FINAL |
1 | 1001 | 201601 | 15.049 | 6.766 | 1 |
2 | 1001 | 201602 | 13.507 | 5.088 | 1 |
3 | 1001 | 201603 | 11.872 | 9.352 | 1 |
4 | 1001 | 201604 | 13.791 | 7.404 | 1 |
5 | 1001 | 201605 | 0 | 8.866 | 0 |
6 | 1001 | 201606 | 16.906 | 8.897 | 1 |
7 | 1001 | 201607 | 13.706 | 0 | 0 |
8 | 1001 | 201608 | 11.543 | 0 | 0 |
9 | 1001 | 201609 | 12.457 | 0 | 0 |
10 | 1001 | 201610 | 14.250 | 0 | 0 |
11 | 1001 | 201611 | 11.148 | 0 | 0 |
12 | 1001 | 201612 | 19.648 | 0 | 0 |
How can I do that in datastep procedure?
Thanks a lot in advance.
Instead of looking forward to a late row, you can sort the data by descending cohort (seems to be a date of yyyymm format)
then use lag function to peak at a previous rows (lag1 or lag2 or lag3 ...).
Try it and post your code in case of any issue.
@gibsonsg: I have no idea how, given the conditions you describe and your sample input, you arrive at zeros anywhere in the sample output. Before you logic can be translated into a DATA step (or any other program), it must be clearly understood. I admit that I totally fail to understand it.
Paul D.
Hi hasman,
You are right, I have to correct the conditions. I want to check the ID with the following sixth ID.
Thanks for warning.
For first row;
ID1 = ID6 and
Allocate1 > 0 and
(Usage1 + Usage2 + Usage3 + Usage4 + Usage5 + Usage6) > 0
For second row;
ID2 = ID7 and
Allocate2 > 0 and
(Usage2 + Usage3 + Usage4 + Usage5 + Usage6 + Usage7) > 0
For thirth row;
ID3 = ID8 and
Allocate3 > 0 and
(Usage3 + Usage4 + Usage5 + Usage6 + Usage7 + Usage8) > 0
and so on.
Thank you, it's much clearer. Yet it still doesn't explain your proposed output because all of these conditions are met for every row:
For starters, could you explain what combination of your conditions causes FINAL to become 0 in output row 5?
Paul D.
Instead of looking forward to a late row, you can sort the data by descending cohort (seems to be a date of yyyymm format)
then use lag function to peak at a previous rows (lag1 or lag2 or lag3 ...).
Try it and post your code in case of any issue.
Hi again.
After some research, I found that intnx function is the another handy method to solve the problem.
I hope it would be useful.
Input Table
| ID | COHORT | ALLOCATE | USAGE |
1 | 1001 | 201601 | 15.049 | 6.766 |
2 | 1001 | 201602 | 13.507 | 5.088 |
3 | 1001 | 201603 | 11.872 | 9.352 |
4 | 1001 | 201604 | 13.791 | 7.404 |
5 | 1001 | 201605 | 0 | 8.866 |
6 | 1001 | 201606 | 16.906 | 8.897 |
7 | 1001 | 201607 | 13.706 | 0 |
8 | 1001 | 201608 | 11.543 | 0 |
9 | 1001 | 201609 | 12.457 | 0 |
10 | 1001 | 201610 | 14.250 | 0 |
11 | 1001 | 201611 | 11.148 | 0 |
12 | 1001 | 201612 | 19.648 | 0 |
Codes
For the solution, DATE column was created for each observation in Cohort column.
(201601=31.01.2016, 201602=29.02.2016 etc.)
proc sql;
create table work.vertical_sum as
select
ID,
COHORT,
DATE,
ALLOCATE,
SUM(USAGE) as TOTAL_USAGE
from
(select t1.ID, t1.COHORT, t1.DATE, t1.ALLOCATE, t2.USAGE from work.vertical as t1
left join work.vertical as t2
on (t1.ID=t2.ID
and
t2.DATE between intnx('month', t1.DATE, 0) and intnx('month', t1.DATE, 6)))
group by ID, COHORT
order by ID, COHORT;
create table work.final_sum as
select
t1.ID,
t1.COHORT,
t1.ALLOCATE,
t2.TOTAL_USAGE,
Case When t1.ALLOCATE > 0 and t2.TOTAL_USAGE > 0 Then 1 Else 0 End as FINAL
from work.vertical_sum as t1
inner join work.vertical_sum as t2
on t1.ID=t2.ID and t1.COHORT=t2.COHORT
order by t1.ID, t1.COHORT;
quit;
proc sort data=work.final_sum out=work.final_sum nodupkey;
by cohort;
run;
Result Table
| ID | COHORT | DATE | ALLOCATE | TOTAL_USAGE | FINAL |
1 | 1001 | 201601 | 31.01.2016 | 15.049 | 46.373 | 1 |
2 | 1001 | 201602 | 29.02.2016 | 13.507 | 39.607 | 1 |
3 | 1001 | 201603 | 31.03.2016 | 11.872 | 34.519 | 1 |
4 | 1001 | 201604 | 30.04.2016 | 13.791 | 25.167 | 1 |
5 | 1001 | 201605 | 31.05.2016 | 0 | 17.763 | 0 |
6 | 1001 | 201606 | 30.06.2016 | 16.906 | 8.897 | 1 |
7 | 1001 | 201607 | 31.07.2016 | 13.706 | 0 | 0 |
8 | 1001 | 201608 | 31.08.2016 | 11.543 | 0 | 0 |
9 | 1001 | 201609 | 30.09.2016 | 12.457 | 0 | 0 |
10 | 1001 | 201610 | 31.10.2016 | 14.25 | 0 | 0 |
11 | 1001 | 201611 | 30.11.2016 | 11.148 | 0 | 0 |
12 | 1001 | 201612 | 31.12.2016 | 19.648 | 0 | 0 |
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.