Hey! I'm not the best programmer, nor I've ever post something here as a student.
I need help by a little task I've got at work. I should use the data step for creating a table "durchschnitt2" with the MIN, MAX and AVG für every numeric value for each sex (M and F) using the SASHELP.CLASS.
I already sovled the task by using the proc SQL. This is the code:
proc sql;
create table durchschnitt1 as
select
sex,
min(age) as age_min,
max(age) as age_max,
avg(age) as age_avg format=commax10.2,
min(height) as height_min format=commax10.2,
max(height) as height_max format=commax10.2,
avg(height) as height_avg format=commax10.2,
min(weight) as weight_min format=commax10.2,
max(weight) as weight_max format=commax10.2,
avg(weight) as weight_avg format=commax10.2
from sashelp.class
group by sex;
run;
How should I do this in a data step?
Thank your for helping.
Its quite the faff, but:
proc sort data=sashelp.class out=class; by sex; run; data durchschnitt2 (keep=sex age_min--height_mean); set class; by sex; array results{3,4} age_min age_max age_count age_mean weight_min weight_max weight_count weight_mean height_min height_max height_count height_mean; array vals{3} age weight height; retain age_min--height_mean; if first.sex then call missing(of results{*}); do i=1 to 3; results{i,1}=min(results{i,1},vals{i}); results{i,2}=max(results{i,2},vals{i}); results{i,3}=sum(results{i,3},1); results{i,4}=sum(results{i,4},vals{i}); end; if last.sex then do; do i=1 to 3; results{i,4}=results{i,4} / results{i,3}; end; output; end; run;
Also note the use of the code window (its the {i} above post area)
Why not use PROC MEANS like this?
proc means noprint data=sashelp.class min max mean;
output out=want(keep=_STAT_ age height weight);
var age height weight;
run;
Thaanks for this.
I haven't used it because my instructor says I must use a data step for this.
@sandi98 wrote:
Thaanks for this.
I haven't used it because my instructor says I must use a data step for this.
Well, since everyone here told you not to use a data step, but your instructor says you have to, you're probably going to have to take a try at creating the code on your own to do this in a data step. So show us your code, and we can probably (reluctantly) help.
It's not clear to me what the point of such a homework assignment is anyway ... student must learn inefficient and tedious method of computing simple statistics in SAS, that can be easily computed with 5 lines of code using the proper SAS PROC. You can tell your instructor that I said so.
It's also seems weird for myself.
I spend the last hours by desperately getting informations on google. That's all I have for a (false) data step. The MAX and AVG are missing. Even the MIN doesn't work, because it's not finished.
data durchschnitt2;
set sashelp.class;
by Sex;
retain age_min;
if first.sex then do;
age=age_min;
end;
if last.sex then output;
run;
proc print data=durchschnitt2;
run;
That's a start. Let's see where you went wrong:
data durchschnitt2;
set sashelp.class;
by sex;
retain age_min;
if first.sex
then do;
age_min = 9999; * initialize to absurdly high value;
end;
age_min = min(age_min,age); * the core of calculating the minimum;
if last.sex then output; * you got that right;
keep sex age_min; * other variables are not needed in the output;
run;
For calculating an average, you will want to retain a sum and a counter, and do the calculation in a do-end block at last.sex that also contains the output statement.
Its quite the faff, but:
proc sort data=sashelp.class out=class; by sex; run; data durchschnitt2 (keep=sex age_min--height_mean); set class; by sex; array results{3,4} age_min age_max age_count age_mean weight_min weight_max weight_count weight_mean height_min height_max height_count height_mean; array vals{3} age weight height; retain age_min--height_mean; if first.sex then call missing(of results{*}); do i=1 to 3; results{i,1}=min(results{i,1},vals{i}); results{i,2}=max(results{i,2},vals{i}); results{i,3}=sum(results{i,3},1); results{i,4}=sum(results{i,4},vals{i}); end; if last.sex then do; do i=1 to 3; results{i,4}=results{i,4} / results{i,3}; end; output; end; run;
Also note the use of the code window (its the {i} above post area)
Is this for some homework? The reason being to get summary stats you would normally use proc means or summary as that is what they are for:
proc sort data=sashelp.class out=class; by sex; run; proc means data=class min max mean; by sex; var age height weight; output out=want; run;
You can say "homework".
I was thinking to use the PROC MEAN first, but my instructur said I should do it with a data step. I don't know why...
Thanks for your help.
I would highly discourage using a DATA STEP to do this, because SAS has already created the proper PROC to create these statistics. Use SAS PROCs when available. (It will save you time, and all the debugging has been built into the creation of the PROC)
proc summary nway data=sashelp.class;
class sex;
var age height weight;
output out=stats min= max= mean= /autoname;
run;
The datastep is not the tool of choice to calculate these univariate statistics. Your SQL is quite appropriate.
The better non-SQL way would be to use proc means. This procedure can calculate your stats (and then some) and output them to a dataset with ease. I will give you the code I do not want to withhold an opportunity to read the docs and learn.
Hope this helps,
-- Jan.
Proc summary would you give you the stats as additional columns
proc summary data=sashelp.class nway;
class sex;
var age height weight;
output out=durchschnitt1(drop=_:)
min=age_min height_min weight_min
max=age_max height_max weight_max
mean=age_avg height_avg weight_avg;
run;
Hope this helps,
Ahmed
If the goal of your exercise is learning data step techniques, then you need to first sort the dataset by your group variable.
Then you use by-processing and the first. and last. automatic variables, and retained variables for counters and sums. At the end of each by group, calculate the average and output.
Do google searches for
sas proc sort
sas by statement
sas first. last.
sas retain statement
sas output statement
sas keep statement
sas keep dataset option
Thank you! I will search for this.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.