DATA Step, Macro, Functions and more

Conditional lag

Accepted Solution Solved
Reply
New Contributor
Posts: 4
Accepted Solution

Conditional lag

Hi there,

I would need your help in restructuring a database based on a time series.

 

I have a database similar to this one (row-values):

Row

UserID

Category

Day

Values

PosVal(d-1)

PosVal(d-2)

PosVal(d-3)

1

970

1

1

3

0

0

0

2

444

1

1

4

0

0

0

3

333

1

1

5

0

0

0

4

234

1

1

1

0

0

0

5

970

1

2

2

0

0

0

6

555

1

2

3

0

0

0

7

444

1

2

42

4

0

0

8

456

1

2

7

0

0

0

9

970

1

3

5

2

0

0

10

333

1

3

31

0

5

0

11

123

1

3

4

0

0

0

12

321

1

3

63

0

0

0

13

6557

1

3

2

0

0

0

14

970

1

4

4

5

2

0

15

234

1

4

5

0

0

1

16

213

1

4

3

0

0

0

17

456

1

4

2

0

7

0

18

233

1

4

1

0

0

0

19

444

2

1

4

0

0

0

20

333

2

1

63

0

0

0

21

444

2

2

2

4

0

0

22

654

2

2

4

0

0

0

23

444

2

3

7

2

4

0

 

And PosVal(d-1), PosVal(d-2) and PosVal(d-3) are the variables that I need to create.

 

 

Basically these are conditional lags of the “values” depending on the userid, category and day where d-1 stands for the first lag, d-2 2 lags etc.  I believe that would require loops but I am very much unable to create these lags… I’d very much appreciate your help!

 

Thanks you

 

Jo


Accepted Solutions
Solution
‎02-14-2016 03:03 PM
Respected Advisor
Posts: 3,777

Re: Conditional lag

data lags;
   infile cards expandtabs firstobs=2;
   input row userid:$4. cat day value l1-l3;
   drop row;
   cards;
Row	UserID	Category	Day	Values	PosVal(d-1)	PosVal(d-2)	PosVal(d-3)
1	970	1	1	3	0	0	0
2	444	1	1	4	0	0	0
3	333	1	1	5	0	0	0
4	234	1	1	1	0	0	0
5	970	1	2	2	0	0	0
6	555	1	2	3	0	0	0
7	444	1	2	42	4	0	0
8	456	1	2	7	0	0	0
9	970	1	3	5	2	0	0
10	333	1	3	31	0	5	0
11	123	1	3	4	0	0	0
12	321	1	3	63	0	0	0
13	6557	1	3	2	0	0	0
14	970	1	4	4	5	2	0
15	234	1	4	5	0	0	1
16	213	1	4	3	0	0	0
17	456	1	4	2	0	7	0
18	233	1	4	1	0	0	0
19	444	2	1	4	0	0	0
20	333	2	1	63	0	0	0
21	444	2	2	2	4	0	0
22	654	2	2	4	0	0	0
23	444	2	3	7	2	4	0
;;;;
   run;
proc sort;
   by userid cat day;
   run;
proc print;
   run;
data lags2;
   do until(last.cat);
      set lags;
      by userid cat;
      array t[4] _temporary_;
      t[day]=value;
      array x[3];
      if day gt 1 then x[1]=t[day-1];
      if day gt 2 then x[2]=t[day-2];
      if day gt 3 then x[3]=t[day-3];
      output;
      end;
   call missing(of t[*]);
   run;
proc print;
   run;

This would be easier if you expanded the data to fill in the missing days.

 

Capture.PNG

View solution in original post


All Replies
Super User
Posts: 5,255

Re: Conditional lag

It's unclear to me how the lagged values are calculated.
Also, try to get started with a program. Even if it doesn't work it might help us understand what you are trying to do.
Data never sleeps
New Contributor
Posts: 4

Re: Conditional lag

I'll  try to explain it by userid=444. 

 

Userid=444 has the following values (from the above table):

Row

UserID

Category

Day

Values

PosVal(d-1)

PosVal(d-2)

1

444

1

1

4

0

0

2

444

1

2

42

4

0

3

444

2

1

4

0

0

4

444

2

2

2

4

0

5

444

2

3

7

2

4

 

Values is a column and the content of it should be lagged.

 

PosVal(d-1)

  • is the content of “ Values”  from the day before (d-1) of userid  X in category X
  • If there is no row of UserIDX on the previous day then PosVal(d-1)=0
    • Example row 1: PosVal(d-1) = content of “ Values”   of userid444 on day=0 in category 1 -> day 0 does not exist, hence=0
    • Example row2= PostVal(d-1) content of “ Values”    of userid444 on day=1 (day before day 2) in category 1 is 4;

 

PosVal(d-2)

  • Is the content of “ Values”   from two days before (d-2) of userid 444 in category
  • If there is no row of UserIDx two days before then PosVal(d-2)=0
    • Example row 1: PosVal(d-2) = content of “ Values”    of userid444 on day= -1 in category 1 -> day -1 does not exist, hence=0
    • Example row 5: PosVal(d-2) = content of “ Values”    of userid444 on day= 1 (two days before day3) in category1 is 4

 

I am pretty much stuck in this misery...and appreciate any further help....

Super User
Posts: 5,079

Re: Conditional lag

You will need a three-step approach.  Here's an overview.

 

The first step is already done in your second example:  sort the data set by UserID Category.

 

Step 2:  Compute the lags every time.

 

PosVal_d1 = lag(value);

PosVal_d2 = lag2(value);

 

Step 3:  Re-set the assigned values of PosVal_d1 and PosVal_d2 to 0 when appropriate.  This means adding a BY statement (BY UserID Category) to create some of the tools that IF/THEN statements will require to reset the calculated variables.  If this becomes complex, we can always re-visit the final parts of the logic.

Super User
Posts: 17,784

Re: Conditional lag

Have you looked into proc expand? Look at the last example and your BY groups would be userid and category. 

New Contributor
Posts: 4

Re: Conditional lag

oh my! it was as simple as that. Thanks for your help

Solution
‎02-14-2016 03:03 PM
Respected Advisor
Posts: 3,777

Re: Conditional lag

data lags;
   infile cards expandtabs firstobs=2;
   input row userid:$4. cat day value l1-l3;
   drop row;
   cards;
Row	UserID	Category	Day	Values	PosVal(d-1)	PosVal(d-2)	PosVal(d-3)
1	970	1	1	3	0	0	0
2	444	1	1	4	0	0	0
3	333	1	1	5	0	0	0
4	234	1	1	1	0	0	0
5	970	1	2	2	0	0	0
6	555	1	2	3	0	0	0
7	444	1	2	42	4	0	0
8	456	1	2	7	0	0	0
9	970	1	3	5	2	0	0
10	333	1	3	31	0	5	0
11	123	1	3	4	0	0	0
12	321	1	3	63	0	0	0
13	6557	1	3	2	0	0	0
14	970	1	4	4	5	2	0
15	234	1	4	5	0	0	1
16	213	1	4	3	0	0	0
17	456	1	4	2	0	7	0
18	233	1	4	1	0	0	0
19	444	2	1	4	0	0	0
20	333	2	1	63	0	0	0
21	444	2	2	2	4	0	0
22	654	2	2	4	0	0	0
23	444	2	3	7	2	4	0
;;;;
   run;
proc sort;
   by userid cat day;
   run;
proc print;
   run;
data lags2;
   do until(last.cat);
      set lags;
      by userid cat;
      array t[4] _temporary_;
      t[day]=value;
      array x[3];
      if day gt 1 then x[1]=t[day-1];
      if day gt 2 then x[2]=t[day-2];
      if day gt 3 then x[3]=t[day-3];
      output;
      end;
   call missing(of t[*]);
   run;
proc print;
   run;

This would be easier if you expanded the data to fill in the missing days.

 

Capture.PNG

New Contributor
Posts: 4

Re: Conditional lag

Thanks a lot!!!!!
Super User
Posts: 17,784

Re: Conditional lag

Please mark the question as answered if you have a solution.
☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 8 replies
  • 449 views
  • 0 likes
  • 5 in conversation