Solved: Re: Vertical Summation by Condition

gibsonsg · Posted 08-12-2018 11:18 AM

Hi everyone.

I have a problem about vertical summation by some conditions.

Main dataset is below;

	ID	COHORT	ALLOCATE	USAGE
1	1001	201601	15.049	6.766
2	1001	201602	13.507	5.088
3	1001	201603	11.872	9.352
4	1001	201604	13.791	7.404
5	1001	201605	13.618	8.866
6	1001	201606	16.906	8.897
7	1001	201607	13.706	6.988
8	1001	201608	11.543	9.294
9	1001	201609	12.457	7.308
10	1001	201610	14.250	8.934
11	1001	201611	11.148	7.726
12	1001	201612	19.648	6.331

Conditions;

ID1 = ID6 and
Allocate1 > 0 and
(Usage1 + Usage2 + Usage3 + Usage4 + Usage5 + Usage6) > 0

If these conditions are met then 1, else 0 in FINAL column.

The table I want to create is below;

	ID	COHORT	ALLOCATE	USAGE	FINAL
1	1001	201601	15.049	6.766	1
2	1001	201602	13.507	5.088	1
3	1001	201603	11.872	9.352	1
4	1001	201604	13.791	7.404	1
5	1001	201605	0	8.866	0
6	1001	201606	16.906	8.897	1
7	1001	201607	13.706	0	0
8	1001	201608	11.543	0	0
9	1001	201609	12.457	0	0
10	1001	201610	14.250	0	0
11	1001	201611	11.148	0	0
12	1001	201612	19.648	0	0

How can I do that in datastep procedure?

Thanks a lot in advance.

Shmuel · Posted 08-12-2018 01:42 PM

Instead of looking forward to a late row, you can sort the data by descending cohort (seems to be a date of yyyymm format)

then use lag function to peak at a previous rows (lag1 or lag2 or lag3 ...).

Try it and post your code in case of any issue.

View solution in original post

hashman · Posted 08-12-2018 01:01 PM

@gibsonsg: I have no idea how, given the conditions you describe and your sample input, you arrive at zeros anywhere in the sample output. Before you logic can be translated into a DATA step (or any other program), it must be clearly understood. I admit that I totally fail to understand it.

Paul D.

gibsonsg · Posted 08-12-2018 01:20 PM

Hi hasman,

You are right, I have to correct the conditions. I want to check the ID with the following sixth ID.

Thanks for warning.

For first row;

ID1 = ID6 and

Allocate1 > 0 and

(Usage1 + Usage2 + Usage3 + Usage4 + Usage5 + Usage6) > 0

For second row;

ID2 = ID7 and

Allocate2 > 0 and

(Usage2 + Usage3 + Usage4 + Usage5 + Usage6 + Usage7) > 0

For thirth row;

ID3 = ID8 and

Allocate3 > 0 and

(Usage3 + Usage4 + Usage5 + Usage6 + Usage7 + Usage8) > 0

and so on.

hashman · Posted 08-12-2018 01:51 PM

@gibsonsg:

Thank you, it's much clearer. Yet it still doesn't explain your proposed output because all of these conditions are met for every row:

Your ID values are identical in all rows, so for any two rows i and j ID[i]=ID[j] is always true.
The Allocate variable is positive in every row, hence for every starting row i, Allocate[i]>0 is always true.
The values of Usage are positive for all rows, thus for every range between row i and j>i inclusive, the sum of Usage from Usage[i] to Usage[j] is > 0.

For starters, could you explain what combination of your conditions causes FINAL to become 0 in output row 5?

Paul D.

Shmuel · Posted 08-12-2018 01:42 PM

Instead of looking forward to a late row, you can sort the data by descending cohort (seems to be a date of yyyymm format)

then use lag function to peak at a previous rows (lag1 or lag2 or lag3 ...).

Try it and post your code in case of any issue.

gibsonsg · Posted 08-23-2018 04:07 PM

Hi again.

After some research, I found that intnx function is the another handy method to solve the problem.

I hope it would be useful.

Input Table

	ID	COHORT	ALLOCATE	USAGE
1	1001	201601	15.049	6.766
2	1001	201602	13.507	5.088
3	1001	201603	11.872	9.352
4	1001	201604	13.791	7.404
5	1001	201605	0	8.866
6	1001	201606	16.906	8.897
7	1001	201607	13.706	0
8	1001	201608	11.543	0
9	1001	201609	12.457	0
10	1001	201610	14.250	0
11	1001	201611	11.148	0
12	1001	201612	19.648	0

Codes

For the solution, DATE column was created for each observation in Cohort column.

(201601=31.01.2016, 201602=29.02.2016 etc.)

proc sql;

create table work.vertical_sum as

select

ID,

COHORT,

DATE,

ALLOCATE,

SUM(USAGE) as TOTAL_USAGE

from

(select t1.ID, t1.COHORT, t1.DATE, t1.ALLOCATE, t2.USAGE from work.vertical as t1

left join work.vertical as t2

on (t1.ID=t2.ID

and

t2.DATE between intnx('month', t1.DATE, 0) and intnx('month', t1.DATE, 6)))

group by ID, COHORT

order by ID, COHORT;

create table work.final_sum as

select

t1.ID,

t1.COHORT,

t1.ALLOCATE,

t2.TOTAL_USAGE,

Case When t1.ALLOCATE > 0 and t2.TOTAL_USAGE > 0 Then 1 Else 0 End as FINAL

from work.vertical_sum as t1

inner join work.vertical_sum as t2

on t1.ID=t2.ID and t1.COHORT=t2.COHORT

order by t1.ID, t1.COHORT;

quit;

proc sort data=work.final_sum out=work.final_sum nodupkey;

by cohort;

run;

Result Table

	ID	COHORT	DATE	ALLOCATE	TOTAL_USAGE	FINAL
1	1001	201601	31.01.2016	15.049	46.373	1
2	1001	201602	29.02.2016	13.507	39.607	1
3	1001	201603	31.03.2016	11.872	34.519	1
4	1001	201604	30.04.2016	13.791	25.167	1
5	1001	201605	31.05.2016	0	17.763	0
6	1001	201606	30.06.2016	16.906	8.897	1
7	1001	201607	31.07.2016	13.706	0	0
8	1001	201608	31.08.2016	11.543	0	0
9	1001	201609	30.09.2016	12.457	0	0
10	1001	201610	31.10.2016	14.25	0	0
11	1001	201611	30.11.2016	11.148	0	0
12	1001	201612	31.12.2016	19.648	0	0

SAS Innovate 2025: Call for Content

Classroom Training Available!