Hello,
I have dataset example illustrated below:
row# | YEAR | SEX | CITY | STATISTICS | VALUE |
1 | 2021 | . | . | 1 | 500 |
2 | 2021 | 1 | 1 | 1 | 0 |
3 | 2021 | 2 | 1 | 1 | 499 |
4 | 2021 | . | 1 | 2 | 750 |
5 | 2021 | 1 | 1 | 2 | 0 |
6 | 2021 | 2 | 1 | 2 | 750 |
7 | 2021 | . | 2 | 1 | 300 |
8 | 2021 | 1 | 2 | 1 | 150 |
9 | 2021 | 2 | 2 | 1 | 150 |
10 | 2021 | . | 2 | 2 | 550 |
11 | 2021 | 1 | 2 | 2 | 549 |
12 | 2021 | 2 | 2 | 2 | 0 |
'.' refers to the total
I would like to replace the value from row 3 to match with the row 1 value (500) in order to keep consistency on the total. Also, I would like to replace the value row 11 to match it with the row 10 value (550). Is there any code rather than inputting manually to replace values in this situation?
Thank you,
Hello @tmdgus ,
Since you have told me (in a private mail) that one of the 2 sexes always has a value of 0 if the sexes don't add up to the total, it's even easier.
Here's the code:
(the code can be more compact and elegant than below but I didn't want to make it too complicated)
PROC DATASETS library=WORK NoList;
delete have: want: / memtype=DATA; run;
QUIT;
data have(drop=CITY STATISTICS);
LENGTH rownr YEAR 8
SEX CITY STATISTICS $ 1 citystat $ 3
VALUE 8;
input rownr YEAR SEX $ CITY $ STATISTICS $ VALUE;
citystat=CITY !! '|' !! STATISTICS;
if missing(sex) then sex='0';
cards;
1 2021 . 1 1 500
2 2021 1 1 1 0
3 2021 2 1 1 499
4 2021 . 1 2 750
5 2021 1 1 2 0
6 2021 2 1 2 750
7 2021 . 2 1 300
8 2021 1 2 1 150
9 2021 2 2 1 150
10 2021 . 2 2 550
11 2021 1 2 2 549
12 2021 2 2 2 0
;
run;
PROC TRANSPOSE data=have
out=have_trp1(drop=_NAME_)
prefix=sex_;
by YEAR citystat;
id sex;
var value;
run;
data have_trp2(drop=deviance);
set have_trp1;
deviance=(sex_0 - SUM(sex_1,sex_2));
if deviance ^= 0 then do;
if sex_1=0 then sex_2=(sex_2 + deviance);
else if sex_2=0 then sex_1=(sex_1 + deviance);
else;
end;
run;
data want(drop=i sex_0 sex_1 sex_2);
LENGTH YEAR 8
SEX $ 1 citystat $ 3
VALUE 8;
set have_trp2;
array sexarray{3} sex_0 sex_1 sex_2;
do i=1 to dim(sexarray);
sex=compress(vname(sexarray(i)),'sex_');
if sex='0' then sex='.';
value=sexarray(i);
output;
end;
run;
/* end of program */
Koen
Hello,
See below for two possible solutions.
These are, of course, 2 hard-coded solutions!!
Ideally, you would programmatically look at what the discrepancy is with the total and then proportionally increase (or decrease) the women and men until the total for both is reached. ****That is also easy to do.****
data have;
input rownr YEAR SEX CITY STATISTICS VALUE;
cards;
1 2021 . . 1 500
2 2021 1 1 1 0
3 2021 2 1 1 499
4 2021 . 1 2 750
5 2021 1 1 2 0
6 2021 2 1 2 750
7 2021 . 2 1 300
8 2021 1 2 1 150
9 2021 2 2 1 150
10 2021 . 2 2 550
11 2021 1 2 2 549
12 2021 2 2 2 0
;
run;
data want1;
set have;
by rownr year sex;
if _N_=3 then value=500;
else if _N_=11 then value=550;
else;
run;
data want2(drop=valueretain);
set have;
by rownr year sex;
retain valueretain .;
if _N_=1 then valueretain=value;
else if _N_=3 then do; value=valueretain; valueretain=.; end;
else if _N_=10 then valueretain=value;
else if _N_=11 then do; value=valueretain; valueretain=.; end;
else;
run;
proc compare data=want1 compare=want2;
run;
/* end of program */
Cheers,
Koen
Thanks for the code. Would you share the sample of code for the last sentence you mentioned?
(Ideally, you would programmatically look at what the discrepancy is with the total and then proportionally increase (or decrease) the women and men until the total for both is reached. ****That is also easy to do.****)
Thank you,
Hello @tmdgus ,
Since you have told me (in a private mail) that one of the 2 sexes always has a value of 0 if the sexes don't add up to the total, it's even easier.
Here's the code:
(the code can be more compact and elegant than below but I didn't want to make it too complicated)
PROC DATASETS library=WORK NoList;
delete have: want: / memtype=DATA; run;
QUIT;
data have(drop=CITY STATISTICS);
LENGTH rownr YEAR 8
SEX CITY STATISTICS $ 1 citystat $ 3
VALUE 8;
input rownr YEAR SEX $ CITY $ STATISTICS $ VALUE;
citystat=CITY !! '|' !! STATISTICS;
if missing(sex) then sex='0';
cards;
1 2021 . 1 1 500
2 2021 1 1 1 0
3 2021 2 1 1 499
4 2021 . 1 2 750
5 2021 1 1 2 0
6 2021 2 1 2 750
7 2021 . 2 1 300
8 2021 1 2 1 150
9 2021 2 2 1 150
10 2021 . 2 2 550
11 2021 1 2 2 549
12 2021 2 2 2 0
;
run;
PROC TRANSPOSE data=have
out=have_trp1(drop=_NAME_)
prefix=sex_;
by YEAR citystat;
id sex;
var value;
run;
data have_trp2(drop=deviance);
set have_trp1;
deviance=(sex_0 - SUM(sex_1,sex_2));
if deviance ^= 0 then do;
if sex_1=0 then sex_2=(sex_2 + deviance);
else if sex_2=0 then sex_1=(sex_1 + deviance);
else;
end;
run;
data want(drop=i sex_0 sex_1 sex_2);
LENGTH YEAR 8
SEX $ 1 citystat $ 3
VALUE 8;
set have_trp2;
array sexarray{3} sex_0 sex_1 sex_2;
do i=1 to dim(sexarray);
sex=compress(vname(sexarray(i)),'sex_');
if sex='0' then sex='.';
value=sexarray(i);
output;
end;
run;
/* end of program */
Koen
I don't fully understand this. Why those exact observations?
Hello @PeterClemmensen ,
I see it like this.
For every CITY STATISTICS combination, there's a total (first row with sex=.) and a sex=1 value and a sex=2 value. The sex=1 value + sex=2 value should equal the total.
I think the CITY value for row 1 should be 1 instead of . (missing).
The original questioner can contradict me of course. Do not take my word for it in this matter. 😏😏
Cheers,
Koen
@sbxkoenk Yes, you are correct. The city for the first row should be 1.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.