Hi there,
I would need your help in restructuring a database based on a time series.
I have a database similar to this one (row-values):
Row | UserID | Category | Day | Values | PosVal(d-1) | PosVal(d-2) | PosVal(d-3) |
1 | 970 | 1 | 1 | 3 | 0 | 0 | 0 |
2 | 444 | 1 | 1 | 4 | 0 | 0 | 0 |
3 | 333 | 1 | 1 | 5 | 0 | 0 | 0 |
4 | 234 | 1 | 1 | 1 | 0 | 0 | 0 |
5 | 970 | 1 | 2 | 2 | 0 | 0 | 0 |
6 | 555 | 1 | 2 | 3 | 0 | 0 | 0 |
7 | 444 | 1 | 2 | 42 | 4 | 0 | 0 |
8 | 456 | 1 | 2 | 7 | 0 | 0 | 0 |
9 | 970 | 1 | 3 | 5 | 2 | 0 | 0 |
10 | 333 | 1 | 3 | 31 | 0 | 5 | 0 |
11 | 123 | 1 | 3 | 4 | 0 | 0 | 0 |
12 | 321 | 1 | 3 | 63 | 0 | 0 | 0 |
13 | 6557 | 1 | 3 | 2 | 0 | 0 | 0 |
14 | 970 | 1 | 4 | 4 | 5 | 2 | 0 |
15 | 234 | 1 | 4 | 5 | 0 | 0 | 1 |
16 | 213 | 1 | 4 | 3 | 0 | 0 | 0 |
17 | 456 | 1 | 4 | 2 | 0 | 7 | 0 |
18 | 233 | 1 | 4 | 1 | 0 | 0 | 0 |
19 | 444 | 2 | 1 | 4 | 0 | 0 | 0 |
20 | 333 | 2 | 1 | 63 | 0 | 0 | 0 |
21 | 444 | 2 | 2 | 2 | 4 | 0 | 0 |
22 | 654 | 2 | 2 | 4 | 0 | 0 | 0 |
23 | 444 | 2 | 3 | 7 | 2 | 4 | 0 |
And PosVal(d-1), PosVal(d-2) and PosVal(d-3) are the variables that I need to create.
Basically these are conditional lags of the “values” depending on the userid, category and day where d-1 stands for the first lag, d-2 2 lags etc. I believe that would require loops but I am very much unable to create these lags… I’d very much appreciate your help!
Thanks you
Jo
data lags;
infile cards expandtabs firstobs=2;
input row userid:$4. cat day value l1-l3;
drop row;
cards;
Row UserID Category Day Values PosVal(d-1) PosVal(d-2) PosVal(d-3)
1 970 1 1 3 0 0 0
2 444 1 1 4 0 0 0
3 333 1 1 5 0 0 0
4 234 1 1 1 0 0 0
5 970 1 2 2 0 0 0
6 555 1 2 3 0 0 0
7 444 1 2 42 4 0 0
8 456 1 2 7 0 0 0
9 970 1 3 5 2 0 0
10 333 1 3 31 0 5 0
11 123 1 3 4 0 0 0
12 321 1 3 63 0 0 0
13 6557 1 3 2 0 0 0
14 970 1 4 4 5 2 0
15 234 1 4 5 0 0 1
16 213 1 4 3 0 0 0
17 456 1 4 2 0 7 0
18 233 1 4 1 0 0 0
19 444 2 1 4 0 0 0
20 333 2 1 63 0 0 0
21 444 2 2 2 4 0 0
22 654 2 2 4 0 0 0
23 444 2 3 7 2 4 0
;;;;
run;
proc sort;
by userid cat day;
run;
proc print;
run;
data lags2;
do until(last.cat);
set lags;
by userid cat;
array t[4] _temporary_;
t[day]=value;
array x[3];
if day gt 1 then x[1]=t[day-1];
if day gt 2 then x[2]=t[day-2];
if day gt 3 then x[3]=t[day-3];
output;
end;
call missing(of t[*]);
run;
proc print;
run;
This would be easier if you expanded the data to fill in the missing days.
I'll try to explain it by userid=444.
Userid=444 has the following values (from the above table):
Row | UserID | Category | Day | Values | PosVal(d-1) | PosVal(d-2) |
1 | 444 | 1 | 1 | 4 | 0 | 0 |
2 | 444 | 1 | 2 | 42 | 4 | 0 |
3 | 444 | 2 | 1 | 4 | 0 | 0 |
4 | 444 | 2 | 2 | 2 | 4 | 0 |
5 | 444 | 2 | 3 | 7 | 2 | 4 |
Values is a column and the content of it should be lagged.
PosVal(d-1)
PosVal(d-2)
I am pretty much stuck in this misery...and appreciate any further help....
You will need a three-step approach. Here's an overview.
The first step is already done in your second example: sort the data set by UserID Category.
Step 2: Compute the lags every time.
PosVal_d1 = lag(value);
PosVal_d2 = lag2(value);
Step 3: Re-set the assigned values of PosVal_d1 and PosVal_d2 to 0 when appropriate. This means adding a BY statement (BY UserID Category) to create some of the tools that IF/THEN statements will require to reset the calculated variables. If this becomes complex, we can always re-visit the final parts of the logic.
Have you looked into proc expand? Look at the last example and your BY groups would be userid and category.
oh my! it was as simple as that. Thanks for your help
data lags;
infile cards expandtabs firstobs=2;
input row userid:$4. cat day value l1-l3;
drop row;
cards;
Row UserID Category Day Values PosVal(d-1) PosVal(d-2) PosVal(d-3)
1 970 1 1 3 0 0 0
2 444 1 1 4 0 0 0
3 333 1 1 5 0 0 0
4 234 1 1 1 0 0 0
5 970 1 2 2 0 0 0
6 555 1 2 3 0 0 0
7 444 1 2 42 4 0 0
8 456 1 2 7 0 0 0
9 970 1 3 5 2 0 0
10 333 1 3 31 0 5 0
11 123 1 3 4 0 0 0
12 321 1 3 63 0 0 0
13 6557 1 3 2 0 0 0
14 970 1 4 4 5 2 0
15 234 1 4 5 0 0 1
16 213 1 4 3 0 0 0
17 456 1 4 2 0 7 0
18 233 1 4 1 0 0 0
19 444 2 1 4 0 0 0
20 333 2 1 63 0 0 0
21 444 2 2 2 4 0 0
22 654 2 2 4 0 0 0
23 444 2 3 7 2 4 0
;;;;
run;
proc sort;
by userid cat day;
run;
proc print;
run;
data lags2;
do until(last.cat);
set lags;
by userid cat;
array t[4] _temporary_;
t[day]=value;
array x[3];
if day gt 1 then x[1]=t[day-1];
if day gt 2 then x[2]=t[day-2];
if day gt 3 then x[3]=t[day-3];
output;
end;
call missing(of t[*]);
run;
proc print;
run;
This would be easier if you expanded the data to fill in the missing days.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.