Dear users,
i would like to replicate the result ( column want ) of my Excel example below .
any help on it i tried many ways , maybe the code below is close to get the right results.
age | country | region | x | want |
1 | FR | Europe | 0,5 | 1 |
2 | FR | Europe | 0,6 | 0,5 |
3 | FR | Europe | 0,7 | 0,2 |
4 | FR | Europe | 0,8 | 0,06 |
5 | FR | Europe | 0,9 | 0,012 |
1 | US | America | 0,1 | 1 |
2 | US | America | 0,2 | 0,9 |
3 | US | America | 0,3 | 0,72 |
4 | US | America | 0,4 | 0,504 |
5 | US | America | 0,5 | 0,3024 |
data want;
set have;
by country region age;
if first.country then y=1;
else y=lag(y)*(1-lag(x));
run;
Why are you calling LAG(Y)? That neither makes any sense if Y is not one of your source variables nor if you want to use the retained value in the calculation.
data want;
set have;
by country region age;
y=y*(1-lag(x));
if first.country then y=1;
retain y;
run;
@bebess wrote:
...
data want; set have; by country region age; if first.country then y=1; else y=lag(y)*(1-lag(x)); run;
That is not going to work right. By not calling the LAG() function for the first value of each COUNTRY the values of X and Y from that observation never make it into the list of values it has available to return.
Do this instead so that LAG() runs on every observation.
y=lag(y)*(1-lag(x));
if first.country then y=1;
thanks for your help ,
the proposed code give me the following result
age | country | region | x | want | y |
1 | FR | Europe | 0,5 | 1 | 1 |
2 | FR | Europe | 0,6 | 0,5 | |
3 | FR | Europe | 0,7 | 0,2 | |
4 | FR | Europe | 0,8 | 0,06 | |
5 | FR | Europe | 0,9 | 0,012 | |
1 | US | America | 0,1 | 1 | 1 |
2 | US | America | 0,2 | 0,9 | |
3 | US | America | 0,3 | 0,72 | |
4 | US | America | 0,4 | 0,504 | |
5 | US | America | 0,5 | 0,3024 |
data want;
set have;
by country region age;
y=lag(y)*(1-lag(x));
if first.country then y=1;
run;
when i tried retain as below i've got wrong result comparing to column want
age | country | region | x | want | y |
1 | FR | Europe | 0,5 | 1 | 1 |
2 | FR | Europe | 0,6 | 0,5 | 0,5 |
3 | FR | Europe | 0,7 | 0,2 | 0,4 |
4 | FR | Europe | 0,8 | 0,06 | 0,15 |
5 | FR | Europe | 0,9 | 0,012 | 0,08 |
1 | US | America | 0,1 | 1 | 1 |
2 | US | America | 0,2 | 0,9 | 0,072 |
3 | US | America | 0,3 | 0,72 | 0,8 |
4 | US | America | 0,4 | 0,504 | 0,0504 |
5 | US | America | 0,5 | 0,3024 | 0,48 |
data want;
set have;
by country region age;
retain y 1;
y=lag(y)*(1-lag(x));
if first.country then y=1;
run;
Why are you calling LAG(Y)? That neither makes any sense if Y is not one of your source variables nor if you want to use the retained value in the calculation.
data want;
set have;
by country region age;
y=y*(1-lag(x));
if first.country then y=1;
retain y;
run;
data have;
input age country $ region $ x;
datalines;
1 FR Europe 0.5
2 FR Europe 0.6
3 FR Europe 0.7
4 FR Europe 0.8
5 FR Europe 0.9
1 US America 0.1
2 US America 0.2
3 US America 0.3
4 US America 0.4
5 US America 0.5
;
proc sort data=have;
by country region age;
data want (drop=prev_y prev_x);
set have;
by country region;
retain prev_x prev_y;
if first.region then y = 1;
else y = prev_y * (1 - prev_x);
prev_y = y;
prev_x = x;
run;
proc print data=want noobs;
** Results **;
age country region x y
1 FR Europe 0.5 1.0000
2 FR Europe 0.6 0.5000
3 FR Europe 0.7 0.2000
4 FR Europe 0.8 0.0600
5 FR Europe 0.9 0.0120
1 US America 0.1 1.0000
2 US America 0.2 0.9000
3 US America 0.3 0.7200
4 US America 0.4 0.5040
5 US America 0.5 0.3024
This code explicitly uses retain to keep the previous values. I find it simpler to know exactly which variables are being kept.
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.