Solved
New Contributor
Posts: 4

# Conditional lag

Hi there,

I would need your help in restructuring a database based on a time series.

I have a database similar to this one (row-values):

 Row UserID Category Day Values PosVal(d-1) PosVal(d-2) PosVal(d-3) 1 970 1 1 3 0 0 0 2 444 1 1 4 0 0 0 3 333 1 1 5 0 0 0 4 234 1 1 1 0 0 0 5 970 1 2 2 0 0 0 6 555 1 2 3 0 0 0 7 444 1 2 42 4 0 0 8 456 1 2 7 0 0 0 9 970 1 3 5 2 0 0 10 333 1 3 31 0 5 0 11 123 1 3 4 0 0 0 12 321 1 3 63 0 0 0 13 6557 1 3 2 0 0 0 14 970 1 4 4 5 2 0 15 234 1 4 5 0 0 1 16 213 1 4 3 0 0 0 17 456 1 4 2 0 7 0 18 233 1 4 1 0 0 0 19 444 2 1 4 0 0 0 20 333 2 1 63 0 0 0 21 444 2 2 2 4 0 0 22 654 2 2 4 0 0 0 23 444 2 3 7 2 4 0

And PosVal(d-1), PosVal(d-2) and PosVal(d-3) are the variables that I need to create.

Basically these are conditional lags of the “values” depending on the userid, category and day where d-1 stands for the first lag, d-2 2 lags etc.  I believe that would require loops but I am very much unable to create these lags… I’d very much appreciate your help!

Thanks you

Jo

Accepted Solutions
Solution
‎02-14-2016 03:03 PM
Posts: 3,852

## Re: Conditional lag

data lags;
infile cards expandtabs firstobs=2;
input row userid:\$4. cat day value l1-l3;
drop row;
cards;
Row	UserID	Category	Day	Values	PosVal(d-1)	PosVal(d-2)	PosVal(d-3)
1	970	1	1	3	0	0	0
2	444	1	1	4	0	0	0
3	333	1	1	5	0	0	0
4	234	1	1	1	0	0	0
5	970	1	2	2	0	0	0
6	555	1	2	3	0	0	0
7	444	1	2	42	4	0	0
8	456	1	2	7	0	0	0
9	970	1	3	5	2	0	0
10	333	1	3	31	0	5	0
11	123	1	3	4	0	0	0
12	321	1	3	63	0	0	0
13	6557	1	3	2	0	0	0
14	970	1	4	4	5	2	0
15	234	1	4	5	0	0	1
16	213	1	4	3	0	0	0
17	456	1	4	2	0	7	0
18	233	1	4	1	0	0	0
19	444	2	1	4	0	0	0
20	333	2	1	63	0	0	0
21	444	2	2	2	4	0	0
22	654	2	2	4	0	0	0
23	444	2	3	7	2	4	0
;;;;
run;
proc sort;
by userid cat day;
run;
proc print;
run;
data lags2;
do until(last.cat);
set lags;
by userid cat;
array t[4] _temporary_;
t[day]=value;
array x[3];
if day gt 1 then x[1]=t[day-1];
if day gt 2 then x[2]=t[day-2];
if day gt 3 then x[3]=t[day-3];
output;
end;
call missing(of t[*]);
run;
proc print;
run;

This would be easier if you expanded the data to fill in the missing days.

All Replies
Super User
Posts: 5,876

## Re: Conditional lag

It's unclear to me how the lagged values are calculated.
Also, try to get started with a program. Even if it doesn't work it might help us understand what you are trying to do.
Data never sleeps
New Contributor
Posts: 4

## Re: Conditional lag

I'll  try to explain it by userid=444.

Userid=444 has the following values (from the above table):

 Row UserID Category Day Values PosVal(d-1) PosVal(d-2) 1 444 1 1 4 0 0 2 444 1 2 42 4 0 3 444 2 1 4 0 0 4 444 2 2 2 4 0 5 444 2 3 7 2 4

Values is a column and the content of it should be lagged.

PosVal(d-1)

• is the content of “ Values”  from the day before (d-1) of userid  X in category X
• If there is no row of UserIDX on the previous day then PosVal(d-1)=0
• Example row 1: PosVal(d-1) = content of “ Values”   of userid444 on day=0 in category 1 -> day 0 does not exist, hence=0
• Example row2= PostVal(d-1) content of “ Values”    of userid444 on day=1 (day before day 2) in category 1 is 4;

PosVal(d-2)

• Is the content of “ Values”   from two days before (d-2) of userid 444 in category
• If there is no row of UserIDx two days before then PosVal(d-2)=0
• Example row 1: PosVal(d-2) = content of “ Values”    of userid444 on day= -1 in category 1 -> day -1 does not exist, hence=0
• Example row 5: PosVal(d-2) = content of “ Values”    of userid444 on day= 1 (two days before day3) in category1 is 4

I am pretty much stuck in this misery...and appreciate any further help....

Super User
Posts: 6,762

## Re: Conditional lag

You will need a three-step approach.  Here's an overview.

The first step is already done in your second example:  sort the data set by UserID Category.

Step 2:  Compute the lags every time.

PosVal_d1 = lag(value);

PosVal_d2 = lag2(value);

Step 3:  Re-set the assigned values of PosVal_d1 and PosVal_d2 to 0 when appropriate.  This means adding a BY statement (BY UserID Category) to create some of the tools that IF/THEN statements will require to reset the calculated variables.  If this becomes complex, we can always re-visit the final parts of the logic.

Super User
Posts: 23,700

## Re: Conditional lag

Have you looked into proc expand? Look at the last example and your BY groups would be userid and category.

New Contributor
Posts: 4

## Re: Conditional lag

oh my! it was as simple as that. Thanks for your help

Solution
‎02-14-2016 03:03 PM
Posts: 3,852

## Re: Conditional lag

data lags;
infile cards expandtabs firstobs=2;
input row userid:\$4. cat day value l1-l3;
drop row;
cards;
Row	UserID	Category	Day	Values	PosVal(d-1)	PosVal(d-2)	PosVal(d-3)
1	970	1	1	3	0	0	0
2	444	1	1	4	0	0	0
3	333	1	1	5	0	0	0
4	234	1	1	1	0	0	0
5	970	1	2	2	0	0	0
6	555	1	2	3	0	0	0
7	444	1	2	42	4	0	0
8	456	1	2	7	0	0	0
9	970	1	3	5	2	0	0
10	333	1	3	31	0	5	0
11	123	1	3	4	0	0	0
12	321	1	3	63	0	0	0
13	6557	1	3	2	0	0	0
14	970	1	4	4	5	2	0
15	234	1	4	5	0	0	1
16	213	1	4	3	0	0	0
17	456	1	4	2	0	7	0
18	233	1	4	1	0	0	0
19	444	2	1	4	0	0	0
20	333	2	1	63	0	0	0
21	444	2	2	2	4	0	0
22	654	2	2	4	0	0	0
23	444	2	3	7	2	4	0
;;;;
run;
proc sort;
by userid cat day;
run;
proc print;
run;
data lags2;
do until(last.cat);
set lags;
by userid cat;
array t[4] _temporary_;
t[day]=value;
array x[3];
if day gt 1 then x[1]=t[day-1];
if day gt 2 then x[2]=t[day-2];
if day gt 3 then x[3]=t[day-3];
output;
end;
call missing(of t[*]);
run;
proc print;
run;

This would be easier if you expanded the data to fill in the missing days.

New Contributor
Posts: 4

## Re: Conditional lag

Thanks a lot!!!!!
Super User
Posts: 23,700