Hi,
I have a dataset that contains a variable for year (2010-2013), a variable for unitID (LT0100 to LT2400), a variable "code" ranging from 0-6, and some other variables (a_1-a_8;b_1-b_8;etc...).
At the moment, what I have is all the columns in the dataset below, except for f_7 which I want to create. f_7 should be calculated using a formula like this: " f_5*(mean(f_6) where Cod=0) ". In words, I want to multiply f_5 by the single value of f_6 (only 1 value, where code=0, for each year and unitId).
The code below, and variations of it, is my first attempts. But it fails to either "lock" the multiplication to each single value, or to find values in column f_6.
proc sql;
create table Dataset1 as
select *, f_5*(mean(f_6)) as f_7
from Dataset
group by unitID ,year
order by year,unitID ,code
;
quit;
| UnitId | Year | Code | a_1 | a_2 | a_3 | f_5 | f_6 | f_7 |
| LT0100 | 2010 | 0 | 2415 | |||||
| LT0100 | 2010 | 1 | 2959139 | 320212 | 1600244 | 0,137518 | 332,106 | |
| LT0100 | 2010 | 2 | 4565010 | 142715 | 996557 | 0,06129 | 148,0154 | |
| LT0100 | 2010 | 3 | 3301774 | 401540 | 2041513 | 0,172445 | 416,4547 | |
| LT0100 | 2010 | 4 | 7384528 | 1163174 | 5855854 | 0,499536 | 1206,379 | |
| LT0100 | 2010 | 5 | 1357620 | 33371 | 1125468 | 0,014332 | 34,61178 | |
| LT0100 | 2010 | 6 | 1454282 | 267495 | 1110066 | 0,114878 | 277,4304 | |
| LT0300 | 2010 | 0 | 1684 | |||||
| LT0300 | 2010 | 1 | 649706 | 171904 | 326910 | 0,15413 | 259,5549 | |
| LT0300 | 2010 | 2 | 1058760 | 97291 | 167824 | 0,087232 | 146,8987 | |
| LT0300 | 2010 | 3 | 1156493 | 348738 | 733397 | 0,31268 | 526,5531 | |
| LT0300 | 2010 | 4 | 2047786 | 415344 | 1444201 | 0,372399 | 627,1199 | |
| LT0300 | 2010 | 5 | 257839 | 76602 | 174777 | 0,068682 | 115,6605 | |
| LT0300 | 2010 | 6 | 248166 | 5440 | 241026 | 0,004878 | 8,214552 | |
| LT0400 | 2010 | 0 | 3104 | |||||
| LT0400 | 2010 | 1 | 478329 | 37400 | 224111 | 0,091693 | 284,6151 | |
| LT0400 | 2010 | 2 | 745501 | 16167 | 65280 | 0,039637 | 123,0332 | |
| LT0400 | 2010 | 3 | 705415 | 133909 | 542181 | 0,328304 | 1019,056 | |
| LT0400 | 2010 | 4 | 901986 | 182750 | 712861 | 0,448047 | 1390,738 | |
| LT0400 | 2010 | 5 | 175015 | 37655 | 112965 | 0,092319 | 286,5582 | |
| LT0400 | 2010 | 6 | 185470 | 185470 | ||||
| LT0100 | 2011 | 0 | 1235 | |||||
| LT0100 | 2011 | 1 | 2959139 | 352233,2 | 1600244 | 0,137518 | 169,8347 | |
| LT0100 | 2011 | 2 | 4565010 | 156986,5 | 996557 | 0,06129 | 75,69315 | |
| LT0100 | 2011 | 3 | 3301774 | 441694 | 2041513 | 0,172445 | 212,9696 | |
| LT0100 | 2011 | 4 | 7384528 | 1279491 | 5855854 | 0,499536 | 616,927 | |
| LT0100 | 2011 | 5 | 1357620 | 36708,1 | 1125468 | 0,014332 | 17,70002 | |
| LT0100 | 2011 | 6 | 1454282 | 294244,5 | 1110066 | 0,114878 | 141,8743 | |
| LT0300 | 2011 | 0 | 2214 | |||||
| LT0300 | 2011 | 1 | 649706 | 189094,4 | 326910 | 0,15413 | 341,2438 | |
| LT0300 | 2011 | 2 | 1058760 | 107020,1 | 167824 | 0,087232 | 193,1316 | |
| LT0300 | 2011 | 3 | 1156493 | 383611,8 | 733397 | 0,31268 | 692,2735 | |
| LT0300 | 2011 | 4 | 2047786 | 456878,4 | 1444201 | 0,372399 | 824,4914 | |
| LT0300 | 2011 | 5 | 257839 | 84262,2 | 174777 | 0,068682 | 152,0619 | |
| LT0300 | 2011 | 6 | 248166 | 5984 | 241026 | 0,004878 | 10,79989 | |
| LT0400 | 2011 | 0 | 2610 | |||||
| LT0400 | 2011 | 1 | 478329 | 41140 | 224111 | 0,091693 | 239,3187 | |
| LT0400 | 2011 | 2 | 745501 | 17783,7 | 65280 | 0,039637 | 103,4526 | |
| LT0400 | 2011 | 3 | 705415 | 147299,9 | 542181 | 0,328304 | 856,8734 | |
| LT0400 | 2011 | 4 | 901986 | 201025 | 712861 | 0,448047 | 1169,403 | |
| LT0400 | 2011 | 5 | 175015 | 41420,5 | 112965 | 0,092319 | 240,9526 | |
| LT0400 | 2011 | 6 | 185470 | 185470 |
I can´t find any sources on this problem online, and maybe I need to structure the variable f_6 differently but I don´t know exactly how to, in order to get the multiplication done correctly.
Any suggestions on how to solve this problem?
Best regards,
Hank
A data step is easier, use retain to hold the multiplier and make sure to set the value when code=0. Then drop the multiplier column at the end, I've left it in for display purposes. This assumes your data is sorted as above.
data want;
set have;
by unitid year;
retain multiplier;
if code=0 then multiplier=f_6;
f_7=f_5*multiplier;
*drop mulitplier;
run;
A data step is easier, use retain to hold the multiplier and make sure to set the value when code=0. Then drop the multiplier column at the end, I've left it in for display purposes. This assumes your data is sorted as above.
data want;
set have;
by unitid year;
retain multiplier;
if code=0 then multiplier=f_6;
f_7=f_5*multiplier;
*drop mulitplier;
run;
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.