BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
JoMa
Fluorite | Level 6

Hi there,

I would need your help in restructuring a database based on a time series.

 

I have a database similar to this one (row-values):

Row

UserID

Category

Day

Values

PosVal(d-1)

PosVal(d-2)

PosVal(d-3)

1

970

1

1

3

0

0

0

2

444

1

1

4

0

0

0

3

333

1

1

5

0

0

0

4

234

1

1

1

0

0

0

5

970

1

2

2

0

0

0

6

555

1

2

3

0

0

0

7

444

1

2

42

4

0

0

8

456

1

2

7

0

0

0

9

970

1

3

5

2

0

0

10

333

1

3

31

0

5

0

11

123

1

3

4

0

0

0

12

321

1

3

63

0

0

0

13

6557

1

3

2

0

0

0

14

970

1

4

4

5

2

0

15

234

1

4

5

0

0

1

16

213

1

4

3

0

0

0

17

456

1

4

2

0

7

0

18

233

1

4

1

0

0

0

19

444

2

1

4

0

0

0

20

333

2

1

63

0

0

0

21

444

2

2

2

4

0

0

22

654

2

2

4

0

0

0

23

444

2

3

7

2

4

0

 

And PosVal(d-1), PosVal(d-2) and PosVal(d-3) are the variables that I need to create.

 

 

Basically these are conditional lags of the “values” depending on the userid, category and day where d-1 stands for the first lag, d-2 2 lags etc.  I believe that would require loops but I am very much unable to create these lags… I’d very much appreciate your help!

 

Thanks you

 

Jo

1 ACCEPTED SOLUTION

Accepted Solutions
data_null__
Jade | Level 19
data lags;
   infile cards expandtabs firstobs=2;
   input row userid:$4. cat day value l1-l3;
   drop row;
   cards;
Row	UserID	Category	Day	Values	PosVal(d-1)	PosVal(d-2)	PosVal(d-3)
1	970	1	1	3	0	0	0
2	444	1	1	4	0	0	0
3	333	1	1	5	0	0	0
4	234	1	1	1	0	0	0
5	970	1	2	2	0	0	0
6	555	1	2	3	0	0	0
7	444	1	2	42	4	0	0
8	456	1	2	7	0	0	0
9	970	1	3	5	2	0	0
10	333	1	3	31	0	5	0
11	123	1	3	4	0	0	0
12	321	1	3	63	0	0	0
13	6557	1	3	2	0	0	0
14	970	1	4	4	5	2	0
15	234	1	4	5	0	0	1
16	213	1	4	3	0	0	0
17	456	1	4	2	0	7	0
18	233	1	4	1	0	0	0
19	444	2	1	4	0	0	0
20	333	2	1	63	0	0	0
21	444	2	2	2	4	0	0
22	654	2	2	4	0	0	0
23	444	2	3	7	2	4	0
;;;;
   run;
proc sort;
   by userid cat day;
   run;
proc print;
   run;
data lags2;
   do until(last.cat);
      set lags;
      by userid cat;
      array t[4] _temporary_;
      t[day]=value;
      array x[3];
      if day gt 1 then x[1]=t[day-1];
      if day gt 2 then x[2]=t[day-2];
      if day gt 3 then x[3]=t[day-3];
      output;
      end;
   call missing(of t[*]);
   run;
proc print;
   run;

This would be easier if you expanded the data to fill in the missing days.

 

Capture.PNG

View solution in original post

8 REPLIES 8
LinusH
Tourmaline | Level 20
It's unclear to me how the lagged values are calculated.
Also, try to get started with a program. Even if it doesn't work it might help us understand what you are trying to do.
Data never sleeps
JoMa
Fluorite | Level 6

I'll  try to explain it by userid=444. 

 

Userid=444 has the following values (from the above table):

Row

UserID

Category

Day

Values

PosVal(d-1)

PosVal(d-2)

1

444

1

1

4

0

0

2

444

1

2

42

4

0

3

444

2

1

4

0

0

4

444

2

2

2

4

0

5

444

2

3

7

2

4

 

Values is a column and the content of it should be lagged.

 

PosVal(d-1)

  • is the content of “ Values”  from the day before (d-1) of userid  X in category X
  • If there is no row of UserIDX on the previous day then PosVal(d-1)=0
    • Example row 1: PosVal(d-1) = content of “ Values”   of userid444 on day=0 in category 1 -> day 0 does not exist, hence=0
    • Example row2= PostVal(d-1) content of “ Values”    of userid444 on day=1 (day before day 2) in category 1 is 4;

 

PosVal(d-2)

  • Is the content of “ Values”   from two days before (d-2) of userid 444 in category
  • If there is no row of UserIDx two days before then PosVal(d-2)=0
    • Example row 1: PosVal(d-2) = content of “ Values”    of userid444 on day= -1 in category 1 -> day -1 does not exist, hence=0
    • Example row 5: PosVal(d-2) = content of “ Values”    of userid444 on day= 1 (two days before day3) in category1 is 4

 

I am pretty much stuck in this misery...and appreciate any further help....

Astounding
PROC Star

You will need a three-step approach.  Here's an overview.

 

The first step is already done in your second example:  sort the data set by UserID Category.

 

Step 2:  Compute the lags every time.

 

PosVal_d1 = lag(value);

PosVal_d2 = lag2(value);

 

Step 3:  Re-set the assigned values of PosVal_d1 and PosVal_d2 to 0 when appropriate.  This means adding a BY statement (BY UserID Category) to create some of the tools that IF/THEN statements will require to reset the calculated variables.  If this becomes complex, we can always re-visit the final parts of the logic.

Reeza
Super User

Have you looked into proc expand? Look at the last example and your BY groups would be userid and category. 

JoMa
Fluorite | Level 6

oh my! it was as simple as that. Thanks for your help

data_null__
Jade | Level 19
data lags;
   infile cards expandtabs firstobs=2;
   input row userid:$4. cat day value l1-l3;
   drop row;
   cards;
Row	UserID	Category	Day	Values	PosVal(d-1)	PosVal(d-2)	PosVal(d-3)
1	970	1	1	3	0	0	0
2	444	1	1	4	0	0	0
3	333	1	1	5	0	0	0
4	234	1	1	1	0	0	0
5	970	1	2	2	0	0	0
6	555	1	2	3	0	0	0
7	444	1	2	42	4	0	0
8	456	1	2	7	0	0	0
9	970	1	3	5	2	0	0
10	333	1	3	31	0	5	0
11	123	1	3	4	0	0	0
12	321	1	3	63	0	0	0
13	6557	1	3	2	0	0	0
14	970	1	4	4	5	2	0
15	234	1	4	5	0	0	1
16	213	1	4	3	0	0	0
17	456	1	4	2	0	7	0
18	233	1	4	1	0	0	0
19	444	2	1	4	0	0	0
20	333	2	1	63	0	0	0
21	444	2	2	2	4	0	0
22	654	2	2	4	0	0	0
23	444	2	3	7	2	4	0
;;;;
   run;
proc sort;
   by userid cat day;
   run;
proc print;
   run;
data lags2;
   do until(last.cat);
      set lags;
      by userid cat;
      array t[4] _temporary_;
      t[day]=value;
      array x[3];
      if day gt 1 then x[1]=t[day-1];
      if day gt 2 then x[2]=t[day-2];
      if day gt 3 then x[3]=t[day-3];
      output;
      end;
   call missing(of t[*]);
   run;
proc print;
   run;

This would be easier if you expanded the data to fill in the missing days.

 

Capture.PNG

JoMa
Fluorite | Level 6
Thanks a lot!!!!!
Reeza
Super User
Please mark the question as answered if you have a solution.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 8 replies
  • 2487 views
  • 0 likes
  • 5 in conversation