Hello,
I am trying to make a column that will be added to an existing data set. I need it to Multiply its number by the 6 previous entries (including its own for a total of 7) but only if the previous entries are under the same categories:
I have looked up Rolling lag "by group" but i have had no luck creating my result column.
Category A | Category B | Numbers | **Column I am looking to Add** |
---|---|---|---|
A | B | 2 | 2 |
A | B | 3 | 6 |
A | B | 3 | 18 |
A | B | 4 | 72 |
A | B | 2 | 144 |
A | B | 2 | 288 |
A | B | 2 | 576 |
A | B | 1 | 288 |
B | B | 2 | 2 |
B | B | 3 | 6 |
B | A | 2 | 2 |
B | A | 4 | 8 |
All help is appreciated. Thank you.
Sorry, I missed that part about the last 7 entries only. Here's a modification:
data want;
set have;
by category_a category_b notsorted;
if numbers=. then factor=1;
else factor=numbers;
drop factor;
back7 = lag7(factor);
if first.category_b then do;
new_column=factor;
counter = 1;
end;
else do;
new_column = new_column * factor;
counter + 1;
if counter > 7 then new_column = new_column / back7;
end;
retain new_column;
drop back7;
run;
Could use retain:
data have (drop=lsta lstb lstnum);
attrib CatA catB format=$1. Numbers format=best.;
infile cards;
input cata $ catb $ numbers;
cards;
A B 2
A B 3
A B 3
A B 4
B B 2
B B 3
B A 2
;
run;
data want;
set have;
attrib result format=best.;
retain lsta lstb lstnum;
if _n_=1 or lsta ne cata or lstb ne catb then do;
lsta=cata;
lstb=catb;
lstnum=numbers;
result=numbers;
end;
else do;
result=lstnum * numbers;
lstnum=result;
end;
run;
Thank you for your swift response. A few things,
1) It ended up creating new columns for result, CatA, CatB, 1st number and result, is there a way to avoid this and just have the result be the remaining column?
2) I forgot to mention this, but if there is nothing in the cells, then to treat the value in them as "1" ..... IE
A B 3 3
A B - 3
A B 2 6
Thanks!
You can simplify some of the logic using BY group processing:
data want;
set have;
by category_a category_b notsorted;
if numbers=. then factor=1;
else factor=numbers;
drop factor;
if first.category_b then new_column=factor;
else new_column = new_column * factor;
retain new_column;
run;
Good luck.
Can this be applied to also the last 7 Numbers? Yours worked in the way I wanted to except the new column kept compounding as opposed to keep a "rolling" product of only the last 7 entries.
Sorry, I missed that part about the last 7 entries only. Here's a modification:
data want;
set have;
by category_a category_b notsorted;
if numbers=. then factor=1;
else factor=numbers;
drop factor;
back7 = lag7(factor);
if first.category_b then do;
new_column=factor;
counter = 1;
end;
else do;
new_column = new_column * factor;
counter + 1;
if counter > 7 then new_column = new_column / back7;
end;
retain new_column;
drop back7;
run;
Thank you very much. Works Perfectly. Appreciate everyone else's efforts as well. Great community here.
Yep, just modify as:
data want (drop=lsta lstb lstnum);
set have;
attrib result format=best.;
retain lsta lstb lstnum;
if _n_=1 or lsta ne cata or lstb ne catb then do;
lsta=cata;
lstb=catb;
lstnum=numbers;
result=numbers;
end;
else do;
if numbers=. then result=lstnum;
else result=lstnum * numbers;
lstnum=result;
end;
run;
data have;
input cata $ catb $ numbers;
cards;
A B 2
A B 3
A B .
A B 4
B B 2
B B 3
B A 2
;
run;
data want;
set have;
by cata;
retain temp;
if first.cata then add=numbers;
else add=coalesce(numbers,1)*temp;
temp=add;
drop temp;
run;
The following code works even if you have more then 7 observations in one by group:
data have;
attrib CatA catB format=$1. Numbers format=best.;
infile cards;
input cata $ catb $ numbers;
cards;
A B 2
A B 3
A B 3
A B 4
A B 2
A B 2
A B 2
A B 1
B B 2
B B 3
B A 2
B A 4
;
run;
data want;
set have;
keep catA catB result Numbers; by catA catB notsorted;
array lastNums[7] _temporary_;
retain pointer;
if first.catB then do;
do i=1 to dim(lastNums);
lastNums=1;
end;
pointer=1;
end;
lastNums[pointer]=coalesce(numbers,1);
pointer=mod(pointer,dim(lastNums))+1;
result=1;
do i=1 to dim(lastNums);
result=result*lastNums;
end;
run;
Message was edited by: Gergely Bathó - modified the keep statement.
Try this !!
data y1;
input cat_1 $ cat_2 $ numbers;
cards;
A B 2
A B 3
A B 3
A B 4
A B 2
A B 2
A B 2
A B 1
B B 2
B B 3
B A 2
B A 4
;
run;
proc sort data=y1;by cat_1 cat_2;run;
data y2;
set y1;
by cat_1 cat_2;
if first.cat_2=1 then do;
retain x 1;
x=numbers-1;
end;
x=(numbers)*x;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.