BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
gibsonsg
Calcite | Level 5

Hi everyone.

 

I have a problem about vertical summation by some conditions.

 

Main dataset is below;

 

 

ID

COHORT

ALLOCATE

USAGE

1

1001

201601

15.049

6.766

2

1001

201602

13.507

5.088

3

1001

201603

11.872

9.352

4

1001

201604

13.791

7.404

5

1001

201605

13.618

8.866

6

1001

201606

16.906

8.897

7

1001

201607

13.706

6.988

8

1001

201608

11.543

9.294

9

1001

201609

12.457

7.308

10

1001

201610

14.250

8.934

11

1001

201611

11.148

7.726

12

1001

201612

19.648

6.331

 

Conditions;

  • ID1 = ID6 and
  • Allocate1 > 0 and
  • (Usage1 + Usage2 + Usage3 + Usage4 + Usage5 + Usage6) > 0

If these conditions are met then 1, else 0 in FINAL column.

 

The table I want to create is below;

 

 

ID

COHORT

ALLOCATE

USAGE

FINAL

1

1001

201601

15.049

6.766

1

2

1001

201602

13.507

5.088

1

3

1001

201603

11.872

9.352

1

4

1001

201604

13.791

7.404

1

5

1001

201605

0

8.866

0

6

1001

201606

16.906

8.897

1

7

1001

201607

13.706

0

0

8

1001

201608

11.543

0

0

9

1001

201609

12.457

0

0

10

1001

201610

14.250

0

0

11

1001

201611

11.148

0

0

12

1001

201612

19.648

0

0

 

How can I do that in datastep procedure?

 

Thanks a lot in advance.

1 ACCEPTED SOLUTION

Accepted Solutions
Shmuel
Garnet | Level 18

Instead of looking forward to a late row, you can sort the data by descending cohort (seems to be a date of yyyymm format)

then use lag function to peak at a previous rows (lag1 or lag2 or lag3 ...).

Try it and post your code in case of any issue.

View solution in original post

5 REPLIES 5
hashman
Ammonite | Level 13

@gibsonsg: I have no idea how, given the conditions you describe and your sample input, you arrive at zeros anywhere in the sample output. Before you logic can be translated into a DATA step (or any other program), it must be clearly understood. I admit that I totally fail to understand it.

 

Paul D.

gibsonsg
Calcite | Level 5

Hi hasman,

You are right, I have to correct the conditions. I want to check the ID with the following sixth ID.

Thanks for warning.

 

For first row;

ID1 = ID6 and

Allocate1 > 0 and

(Usage1 + Usage2 + Usage3 + Usage4 + Usage5 + Usage6) > 0

 

For second row;

ID2 = ID7 and

Allocate2 > 0 and

(Usage2 + Usage3 + Usage4 + Usage5 + Usage6 + Usage7) > 0

 

For thirth row;

ID3 = ID8 and

Allocate3 > 0 and

(Usage3 + Usage4 + Usage5 + Usage6 + Usage7 + Usage8) > 0

 

and so on.

hashman
Ammonite | Level 13

@gibsonsg

 

Thank you, it's much clearer. Yet it still doesn't explain your proposed output because all of these conditions are met for every row:

 

  • Your ID values are identical in all rows, so for any two rows i and j ID[i]=ID[j] is always true.
  • The Allocate variable is positive in every row, hence for every starting row i, Allocate[i]>0 is always true.
  • The values of Usage are positive for all rows, thus for every range between row i and j>i inclusive, the sum of Usage from Usage[i] to Usage[j] is > 0. 

 

For starters, could you explain what combination of your conditions causes FINAL to become 0 in output row 5?

 

Paul D.

Shmuel
Garnet | Level 18

Instead of looking forward to a late row, you can sort the data by descending cohort (seems to be a date of yyyymm format)

then use lag function to peak at a previous rows (lag1 or lag2 or lag3 ...).

Try it and post your code in case of any issue.

gibsonsg
Calcite | Level 5

Hi again.

After some research, I found that intnx function is the another handy method to solve the problem.

I hope it would be useful.

 

Input Table

 

ID

COHORT

ALLOCATE

USAGE

1

1001

201601

15.049

6.766

2

1001

201602

13.507

5.088

3

1001

201603

11.872

9.352

4

1001

201604

13.791

7.404

5

1001

201605

0

8.866

6

1001

201606

16.906

8.897

7

1001

201607

13.706

0

8

1001

201608

11.543

0

9

1001

201609

12.457

0

10

1001

201610

14.250

0

11

1001

201611

11.148

0

12

1001

201612

19.648

0

 

Codes

For the solution, DATE column was created for each observation in Cohort column.

(201601=31.01.2016, 201602=29.02.2016 etc.)

 

proc sql;

create table work.vertical_sum as

select

ID,

COHORT,

DATE,

ALLOCATE,

SUM(USAGE) as TOTAL_USAGE

from

(select t1.ID, t1.COHORT, t1.DATE, t1.ALLOCATE, t2.USAGE from work.vertical as t1

left join work.vertical as t2

on (t1.ID=t2.ID

and

t2.DATE between intnx('month', t1.DATE, 0)  and  intnx('month', t1.DATE, 6)))

group by ID, COHORT

order by ID, COHORT;

 

create table work.final_sum as

select

t1.ID,

t1.COHORT,

t1.ALLOCATE,

t2.TOTAL_USAGE,

Case When t1.ALLOCATE > 0 and t2.TOTAL_USAGE > 0 Then 1 Else 0 End as FINAL

from work.vertical_sum as t1

inner join work.vertical_sum as t2

on t1.ID=t2.ID and t1.COHORT=t2.COHORT

order by t1.ID, t1.COHORT;

quit;

 

proc sort data=work.final_sum out=work.final_sum nodupkey;

by cohort; 

run;

 

Result Table

 

ID

COHORT

DATE

ALLOCATE

TOTAL_USAGE

FINAL

1

1001

201601

31.01.2016

15.049

46.373

1

2

1001

201602

29.02.2016

13.507

39.607

1

3

1001

201603

31.03.2016

11.872

34.519

1

4

1001

201604

30.04.2016

13.791

25.167

1

5

1001

201605

31.05.2016

0

17.763

0

6

1001

201606

30.06.2016

16.906

8.897

1

7

1001

201607

31.07.2016

13.706

0

0

8

1001

201608

31.08.2016

11.543

0

0

9

1001

201609

30.09.2016

12.457

0

0

10

1001

201610

31.10.2016

14.25

0

0

11

1001

201611

30.11.2016

11.148

0

0

12

1001

201612

31.12.2016

19.648

0

0

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 837 views
  • 3 likes
  • 3 in conversation