BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
sandi98
Fluorite | Level 6

Hey! I'm not the best programmer, nor I've ever post something here as a student.

 

I need help by a little task I've got at work. I should use the data step for creating a table "durchschnitt2" with the MIN, MAX and AVG für every numeric value for each sex (M and F) using the SASHELP.CLASS.

 

I already sovled the task by using the proc SQL. This is the code:

 

proc sql;

create table durchschnitt1 as

select

sex,

min(age) as age_min,

max(age) as age_max,

avg(age) as age_avg format=commax10.2,

min(height) as height_min format=commax10.2,

max(height) as height_max format=commax10.2,

avg(height) as height_avg format=commax10.2,

min(weight) as weight_min format=commax10.2,

max(weight) as weight_max format=commax10.2,

avg(weight) as weight_avg format=commax10.2

from sashelp.class

group by sex;

run;

 

How should I do this in a data step?

Thank your for helping.

1 ACCEPTED SOLUTION

Accepted Solutions
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Its quite the faff, but:

proc sort data=sashelp.class out=class;
  by sex;
run;

data durchschnitt2 (keep=sex age_min--height_mean);
  set class;
  by sex;
  array results{3,4} age_min age_max age_count age_mean 
                     weight_min weight_max weight_count weight_mean
                     height_min height_max height_count height_mean;
  array vals{3} age weight height;
  retain age_min--height_mean;
  if first.sex then call missing(of results{*});
  do i=1 to 3;
    results{i,1}=min(results{i,1},vals{i});
    results{i,2}=max(results{i,2},vals{i});
    results{i,3}=sum(results{i,3},1);
    results{i,4}=sum(results{i,4},vals{i});
  end;
  if last.sex then do;
    do i=1 to 3;
      results{i,4}=results{i,4} / results{i,3};
    end;
    output;
  end;
run;

Also note the use of the code window (its the {i} above post area)

View solution in original post

13 REPLIES 13
PeterClemmensen
Tourmaline | Level 20

Why not use PROC MEANS like this?

 

proc means noprint data=sashelp.class min max mean;
	output out=want(keep=_STAT_ age height weight);
	var age height weight;
run;
sandi98
Fluorite | Level 6

Thaanks for this.

 

I haven't used it because my instructor says I must use a data step for this.

PaigeMiller
Diamond | Level 26

@sandi98 wrote:

Thaanks for this.

 

I haven't used it because my instructor says I must use a data step for this.


Well, since everyone here told you not to use a data step, but your instructor says you have to, you're probably going to have to take a try at creating the code on your own to do this in a data step. So show us your code, and we can probably (reluctantly) help.

 

It's not clear to me what the point of such a homework assignment is anyway ... student must learn inefficient and tedious method of computing simple statistics in SAS, that can be easily computed with 5 lines of code using the proper SAS PROC. You can tell your instructor that I said so.

--
Paige Miller
sandi98
Fluorite | Level 6

It's also seems weird for myself.

 

I spend the last hours by desperately getting informations on google. That's all I have for a (false) data step. The MAX and AVG are missing. Even the MIN doesn't work, because it's not finished.

 

data durchschnitt2;
set sashelp.class;
by Sex;

retain age_min;
if first.sex then do;
age=age_min;
end;
if last.sex then output;

run;

 

proc print data=durchschnitt2;
run;

 

 

Kurt_Bremser
Super User

That's a start. Let's see where you went wrong:

data durchschnitt2;
set sashelp.class;
by sex;
retain age_min;
if first.sex
then do;
  age_min = 9999; * initialize to absurdly high value;
end;
age_min = min(age_min,age); * the core of calculating the minimum;
if last.sex then output; * you got that right;
keep sex age_min; * other variables are not needed in the output;
run;

For calculating an average, you will want to retain a sum and a counter, and do the calculation in a do-end block at last.sex that also contains the output statement.

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Its quite the faff, but:

proc sort data=sashelp.class out=class;
  by sex;
run;

data durchschnitt2 (keep=sex age_min--height_mean);
  set class;
  by sex;
  array results{3,4} age_min age_max age_count age_mean 
                     weight_min weight_max weight_count weight_mean
                     height_min height_max height_count height_mean;
  array vals{3} age weight height;
  retain age_min--height_mean;
  if first.sex then call missing(of results{*});
  do i=1 to 3;
    results{i,1}=min(results{i,1},vals{i});
    results{i,2}=max(results{i,2},vals{i});
    results{i,3}=sum(results{i,3},1);
    results{i,4}=sum(results{i,4},vals{i});
  end;
  if last.sex then do;
    do i=1 to 3;
      results{i,4}=results{i,4} / results{i,3};
    end;
    output;
  end;
run;

Also note the use of the code window (its the {i} above post area)

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Is this for some homework?  The reason being to get summary stats you would normally use proc means or summary as that is what they are for:

proc sort data=sashelp.class out=class;
  by sex;
run;

proc means data=class min max mean;
  by sex;
  var age height weight;
  output out=want;
run;
sandi98
Fluorite | Level 6

You can say "homework".

 

I was thinking to use the PROC MEAN first, but my instructur said I should do it with a data step. I don't know why...

 

Thanks for your help.

PaigeMiller
Diamond | Level 26

I would highly discourage using a DATA STEP to do this, because SAS has already created the proper PROC to create these statistics. Use SAS PROCs when available. (It will save you time, and all the debugging has been built into the creation of the PROC)

 

proc summary nway data=sashelp.class;
    class sex;
	var age height weight;
	output out=stats min= max= mean= /autoname;
run;
--
Paige Miller
jklaverstijn
Rhodochrosite | Level 12

The datastep is not the tool of choice to calculate these univariate statistics. Your SQL is quite appropriate.

 

The better non-SQL way would be to use proc means. This procedure can calculate your stats (and then some) and output them to a dataset with ease. I will give you the code I do not want to withhold an opportunity to read the docs and learn.

 

Hope this helps,

-- Jan.

AhmedAl_Attar
Rhodochrosite | Level 12

@sandi98

 

Proc summary would you give you the stats as additional columns

proc summary data=sashelp.class nway;
	class sex;
	var age height weight;
	output out=durchschnitt1(drop=_:) 
		min=age_min height_min weight_min 
		max=age_max height_max weight_max
		mean=age_avg height_avg weight_avg;
run;

Hope this helps,

Ahmed

Kurt_Bremser
Super User

If the goal of your exercise is learning data step techniques, then you need to first sort the dataset by your group variable.

Then you use by-processing and the first. and last. automatic variables, and retained variables for counters and sums. At the end of each by group, calculate the average and output.

Do google searches for

sas proc sort

sas by statement

sas first. last.

sas retain statement

sas output statement

sas keep statement

sas keep dataset option

sandi98
Fluorite | Level 6

Thank you! I will search for this.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 13 replies
  • 19705 views
  • 8 likes
  • 7 in conversation