Calcite | Level 5

## Proc means

Define variables using a by statement on SAS OnDemand for academics

libname patients '/home/u63367626/PATIENTS';
data patients.patt1;
set patt2;
format dob1 date9.;

dob=compress(cat(month,'/',day,'/',year));
dob1=input(dob,mmddyy10.);

age = yrdif(dob1,'01jan2023'd)/365;
run;
output;
=2;
output;

run;

/*evaluating the statistical parameters for age*/

proc sort data=patients.patt1;
by ;
run;

proc means data=patients.patt1;
variables age;
output out= agestsatz;
by ;
run;
Diamond | Level 26

## Re: Proc means

--
Paige Miller
Calcite | Level 5

## Re: Proc means

What does the BY statement mean when analyzing a variable (age) with PROC MEANS? The course I'm taking mentioned it, and I have no idea what it means.
Diamond | Level 26

## Re: Proc means

The code where there is no variable name(s) following BY, thusly

by ;

does absolutely nothing.

If there were variable names after the BY, then it would have an impact, which you can read about in the documentation.

--
Paige Miller
Rhodochrosite | Level 12

## Re: Proc means

You can see some sample output using BY statement in this paper: https://www.lexjansen.com/nesug/nesug08/ff/ff06.pdf

PROC Star

## Re: Proc means

age = yrdif(dob1,'01jan2023'd)/365;

doesn't make sense. Since the YRDIF function calculates the number of years between dob1 and January 1, 2023, there is no reason to divide that result by 365. What the code above produces is the "number of 365-year intervals" between the dates.

It's true that you can tell YRDIF which year-calibration you want to use (to support various conventions in finance), and perhaps that's what you wanted to do (see the discussion of the basis argument in the YRDIF function documentation). But if you use this argument, it must be the third entry inside the parentheses, not outside them. And as a character argument, it has to be in quotes. In your case, the likely basis would be "age", as in

age = yrdif(dob1,'01jan2023'd,'age');

And even that is unnecessary, since 'age' is the default basis.
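
To make the difference concrete, here is a minimal sketch that compares the two calculations for a single made-up date of birth. The date `'15mar1990'd` and the variable names `age_years` and `age_wrong` are assumptions for illustration only; they are not from the original data.

```sas
/* Sketch only: the date of birth below is hypothetical. */
data _null_;
  dob1 = '15mar1990'd;

  /* Age in years as of January 1, 2023, with the basis stated explicitly */
  age_years = yrdif(dob1, '01jan2023'd, 'AGE');

  /* The original expression: a number of years divided again by 365 */
  age_wrong = yrdif(dob1, '01jan2023'd) / 365;

  /* Write both values to the log for comparison */
  put age_years= age_wrong=;
run;
```

For this date, `age_years` should land between 32 and 33, while `age_wrong` is a small fraction well below 1.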

Super User

## Re: Proc means

@Sammy_G wrote:
Define variables using a by statement on SAS OnDemand for academics

If this is supposed to be an instruction from some sort of teacher, I would find another class. The only variables "defined" with a BY statement are some automatic variables SAS creates in a data step to indicate whether a particular observation is the first or last for a given value of a variable, or combination of variables, on a BY statement. I might accept something like "Create variables to use later on a BY statement," but not as phrased.
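
As a rough illustration of those automatic variables, the sketch below uses SASHELP.CLASS. The data set names `work.class_sorted` and `work.flags` and the variables `is_first` and `is_last` are made up for this example. FIRST.sex and LAST.sex exist only while the data step runs, so they are copied into ordinary variables to make them visible in the output.

```sas
/* Sketch only: output data set and variable names are hypothetical. */
proc sort data=sashelp.class out=work.class_sorted;
  by sex;
run;

data work.flags;
  set work.class_sorted;
  by sex;                 /* makes FIRST.sex and LAST.sex available */
  is_first = first.sex;   /* 1 on the first observation of each sex group */
  is_last  = last.sex;    /* 1 on the last observation of each sex group */
run;

proc print data=work.flags;
  var name sex is_first is_last;
run;
```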

What the BY statement does is attempt to do the same operations for each level of a given by variable or combination. Typically that requires sorting the data into that order using Proc Sort.

Consider the SASHELP.CLASS data set, which you should have available, as an example. Suppose that we want to get summary statistics for the student weight and height variables by sex.

Proc sort data=sashelp.class out=work.class;
by sex;
run;

proc means data=work.class n mean min max std;
by sex;
var height weight;
run;

This creates a summary for Sex=F and another for Sex=M. This is what BY does: it processes groups of observations.
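
Since the original question also used an OUTPUT statement, here is a hedged sketch of how the BY groups carry over into an output data set. The names `work.class_stats`, `mean_height`, and `mean_weight` are assumptions chosen for this example.

```sas
/* Sketch only: output data set and statistic names are hypothetical. */
proc sort data=sashelp.class out=work.class;
  by sex;
run;

proc means data=work.class noprint;
  by sex;
  var height weight;
  /* One output row per BY group; the means are named in VAR order */
  output out=work.class_stats mean=mean_height mean_weight;
run;

proc print data=work.class_stats;
run;
```

The resulting data set should hold one row for each value of sex, along with the automatic `_TYPE_` and `_FREQ_` variables that PROC MEANS adds.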

Super User

## Re: Proc means

libname patients '/home/u63367626/PATIENTS';

data patients.patt1;
set patt2;
format dob1 date9.;

*removes spaces and adds / between values. Easier to use the MDY function;
dob=compress(cat(month,'/',day,'/',year));
dob1=input(dob,mmddyy10.);

*incorrect calculation of age, should be age = yrdif(dob1, '01jan2023'd, 'AGE');
age = yrdif(dob1,'01jan2023'd)/365;
*ends current data step;
run;

*outside of data step - does nothing/error;
output;
*outside of data step - does nothing/error;
=2;
*outside of data step - does nothing/error;
output;
*outside of data step - does nothing/error;
run;

/*evaluating the statistical parameters for age*/

proc sort data=patients.patt1;
*no variable specified to sort by;
by ;
run;

proc means data=patients.patt1;
variables age; *usually see var age not the full word variables;
output out= agestsatz;
by ; *no variable specified to break up the analysis;
run;
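
Putting the comments above together, a cleaned-up version might look like the sketch below. It rests on several assumptions: that month, day, and year are numeric (MDY requires numeric arguments), that a grouping variable exists (the name `sex` here is hypothetical, since the original never names one), and that the output data set can be called `age_stats`. The stray statements after the first `run;` are left out because the original does not show what they were meant to do.

```sas
/* Sketch only: the BY variable and output data set name are hypothetical. */
libname patients '/home/u63367626/PATIENTS';

data patients.patt1;
  set patt2;
  format dob1 date9.;

  /* Build the date directly; assumes numeric month, day, and year */
  dob1 = mdy(month, day, year);

  /* Age in years as of January 1, 2023 */
  age = yrdif(dob1, '01jan2023'd, 'AGE');
run;

/* BY-group processing needs the data sorted by the BY variable */
proc sort data=patients.patt1;
  by sex;
run;

proc means data=patients.patt1;
  by sex;
  var age;
  output out=age_stats;
run;
```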