Hello,
I am using a dataset with 12 numeric variables. I need to use a total of one variable, x12, to calculate every observation's percentage of the whole, and I would like to do this within the DATA step.
DATA example;
INPUT x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11 x12;
x13 = (x12 / total of x12) * 100;
DATALINES;
... ... ... ... ... ... .. .. .. 200
... ... .. .. .. .. .. .. .. .. 300
;
PROC PRINT DATA = example;
SUM(x12)
SUM(x13)
RUN;Now, I need to obtain the sum of all x12 values and use it in the calculation of a new variable, x13 which is a relative percentage. The formula to calculate x13 is as follows:
x13 = (current x12 value / total sum of all x12 values) * 100
Can anyone help me with obtaining the total sum of all x12 values in the DATA step?
DATA example;
INPUT x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11 x12;
array x x1-x12;
array pct pct1-pct12;
rowsum=sum(of x1-x12);
do i=1 to 12;
pct(i) = (x(i) / rowsum) * 100;
end;
datalines;
;
run;
Hello @PaigeMiller ,
Thank you for your swift response. I think you misunderstood my question. I need to obtain the sum of every value in the column x12, not each row.
SQL is faster.
proc sql;
create table want as
select *, sum(x12) as total, x12 / calculated total as x13
from example;
quit;
Otherwise, you need to:
If you really want a data step, see this example on how to merge in averages (SUM) would be the total and you can modify it as needed.
https://github.com/statgeek/SAS-Tutorials/blob/master/add_average_value_to_dataset.sas
@JethrowCatch wrote:
Hello,
I am using a dataset with 12 numeric variables. I need to use a total of one variable, x12, to calculate every observation's percentage of the whole, and I would like to do this within the DATA step.
DATA example; INPUT x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11 x12; x13 = (x12 / total of x12) * 100; DATALINES; ... ... ... ... ... ... .. .. .. 200 ... ... .. .. .. .. .. .. .. .. 300 ; PROC PRINT DATA = example; SUM(x12) SUM(x13) RUN;Now, I need to obtain the sum of all x12 values and use it in the calculation of a new variable, x13 which is a relative percentage. The formula to calculate x13 is as follows:
x13 = (current x12 value / total sum of all x12 values) * 100
Can anyone help me with obtaining the total sum of all x12 values in the DATA step?
Hello @Reeza ,
Thank you for your response. We have not learned about PROC SQL in school yet. I do not want to seem as if I am cheating by using functions I haven't learned yet.
I have given your 3 step method a go, but I am having some trouble understanding how to obtain the total value before calculating the percentage.
DATA example;
INPUT x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11 x12;
RETAIN TOTAL 0;
TOTAL = sum(TOTAL, x12);
DATALINES;
.....
;
DATA PCT;
MERGE example; BY x1;
x13 = (x12/TOTAL) * 100;
RUN;
PROC PRINT DATA = PCT;
RUN:My problem is that the value of TOTAL increases for each observation. I need for TOTAL to be a variable that has one value, which is the value it takes on after adding the x12 value of the last observation to itself.
In other words, what I would like to do is to use a specific index of TOTAL in every x13 calculation. However, indexing with curly or straight brackets does not work.
Essentially, the x13 calculation should look as follows:
x13 = ( x12 / TOTAL[30]) * 100;
30 is my number of observations in the example dataset.
Would anyone happen to know how to index observations in SAS?
Yes, I have. I now obtain the total that I need in the variable TOTALCT. However, the output that I get under the OUTPUT DATA tab is a table showing type, frequency, and total.
The output that I need is a table showing all the original variables x1-x12, with x13 added on. Therefore, I need to use PROC PRINT, but at this moment, nothing shows up under the results tab.
PROC MEANS DATA = EXAMPLE NOPRINT;
OUTPUT OUT = example SUM(x12) = TOTALCT;
run;
DATA PCT;
MERGE example; BY x1;
x13 = (x12 / TOTALCT) * 100;
RUN;
PROC PRINT DATA = PCT;
RUN;
Your merge is not correct. Please review the example to see my use of the double SET statement instead.
FYI - The NOPRINT option is specified on the PROC MEANS which suppresses any results. If you'd like to see them, remove the NOPRINT option.
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Still thinking about your presentation idea? The submission deadline has been extended to Friday, Nov. 14, at 11:59 p.m. ET.
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.