Sammy_G
Calcite | Level 5
Define variables using a by statement on SAS OnDemand for academics

libname patients '/home/u63367626/PATIENTS';
data patients.patt1;
set patt2;
format dob1 date9.;

dob=compress(cat(month,'/',day,'/',year));
dob1=input(dob,mmddyy10.);

age = yrdif(dob1,'01jan2023'd)/365;
run;
output;
=2;
output;

run;


/*evaluating the statistical parameters for age*/

proc sort data=patients.patt1;
by ;
run;

proc means data=patients.patt1;
variables age;
output out= agestsatz;
by ;
run;
PaigeMiller
Diamond | Level 26

What is your question?

--
Paige Miller
Sammy_G
Calcite | Level 5
What does the BY statement mean when summarizing a variable (age) with PROC MEANS? The course I'm taking mentions it, and I have no idea what it means.
PaigeMiller
Diamond | Level 26

Code where no variable names follow BY, like this:

by ;

does absolutely nothing.

If there were variable names after BY, it would have an impact, which you can read about in the documentation.

--
Paige Miller
tarheel13
Rhodochrosite | Level 12

You can see some sample output using BY statement in this paper: https://www.lexjansen.com/nesug/nesug08/ff/ff06.pdf 

mkeintz
PROC Star

This is not addressed to your question, but the code

 

age = yrdif(dob1,'01jan2023'd)/365;

doesn't make sense. Since the YRDIF function calculates the number of years between dob1 and Jan 1, 2023, there is no reason to divide that result by 365. What the code above produces is the "number of 365-year intervals" between the dates.

It's true that you can tell YRDIF which year calibration to use (to support various conventions in finance), and perhaps that's what you wanted to do (see the discussion of the basis argument in YRDIF Function). But if you use this argument, it must be the third entry inside the parentheses, not outside them. And as a character argument, it has to be in quotes. In your case, the likely basis would be "age", as in

age = yrdif(dob1,'01jan2023'd,'age');

And even that is unnecessary, since 'age' is the default basis.
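
To make the difference concrete, here is a minimal sketch comparing the variants discussed above. The birth date is a made-up value for illustration, and the two-argument call assumes SAS 9.3 or later, where the basis argument is optional.

```sas
/* Sketch only: dob is a hypothetical birth date, not data from the thread */
data _null_;
   dob = '15mar1990'd;
   ref = '01jan2023'd;

   age_years   = yrdif(dob, ref, 'age');   /* age in years, with a fractional part */
   age_default = yrdif(dob, ref);          /* basis omitted (assumes SAS 9.3+) */
   age_wrong   = yrdif(dob, ref) / 365;    /* the original formula: far too small */
   age_whole   = floor(age_years);         /* completed years only */

   /* Write the values to the log for comparison */
   put age_years= age_default= age_wrong= age_whole=;
run;
```

The log should show age_years and age_default agreeing, and age_wrong as a small fraction, which is the "365-year intervals" problem described above.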

ballardw
Super User

@Sammy_G wrote:
Define variables using a by statement on SAS OnDemand for academics


If this is supposed to be an instruction from some sort of teacher, I would find another class. The only variables "defined" with a BY statement are some automatic variables SAS creates in a data step to indicate whether a particular observation is the first or last of a given variable, or of a combination of variables, on a BY statement. I might accept something like "Create variables to use later on a BY statement", but not as phrased.
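
For reference, the automatic variables in question are FIRST.variable and LAST.variable. The following is a minimal sketch of how they behave, using SASHELP.CLASS; the output data set names and the first_in_group / last_in_group variables are made up for illustration.

```sas
/* BY-group processing in a data step needs the data in BY order */
proc sort data=sashelp.class out=work.class_sorted;
   by sex;
run;

data work.class_flags;
   set work.class_sorted;
   by sex;
   /* FIRST.sex and LAST.sex exist only during the step,  */
   /* so copy them into regular variables to keep them    */
   first_in_group = first.sex;   /* 1 on the first row of each sex group */
   last_in_group  = last.sex;    /* 1 on the last row of each sex group  */
run;

proc print data=work.class_flags;
   var name sex first_in_group last_in_group;
run;
```

Each flag is 1 on exactly one row per group and 0 elsewhere, which is what makes them useful for per-group counters and totals.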

 

What the BY statement does is attempt to do the same operations for each level of a given by variable or combination. Typically that requires sorting the data into that order using Proc Sort.

Consider the SASHELP.CLASS data set, which you should have available, as an example. Suppose that we want to get summary statistics for the student weight and height variables by sex.

Proc sort data=sashelp.class out=work.class;
   by sex;
run;

proc means data=work.class n mean min max std;
   by sex;
   var height weight;
run;

This creates a summary for Sex=F and another for Sex=M. This is what BY does: it processes groups of observations.
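
The original post also used an OUTPUT OUT= statement. As a hedged sketch of how that combines with BY, the step below saves one row of statistics per BY group; the data set name work.class_stats and the variable names mean_height and mean_weight are made up for illustration.

```sas
proc sort data=sashelp.class out=work.class;
   by sex;
run;

/* NOPRINT suppresses the printed report; results go to the OUT= data set */
proc means data=work.class noprint;
   by sex;
   var height weight;
   /* One output name per VAR variable, in the same order */
   output out=work.class_stats mean=mean_height mean_weight;
run;

proc print data=work.class_stats;
run;
```

The printed data set should contain one row for each value of sex, along with the automatic _TYPE_ and _FREQ_ columns that PROC MEANS adds.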

 

Reeza
Super User
libname patients '/home/u63367626/PATIENTS';


data patients.patt1;
set patt2;
format dob1 date9.;

*removes spaces and adds / between values. Easier to use the MDY function;
dob=compress(cat(month,'/',day,'/',year));
dob1=input(dob,mmddyy10.);

*incorrect calculation of age, should be age = yrdif(dob1, '01jan2023'd, 'age');
age = yrdif(dob1,'01jan2023'd)/365;
*ends current data step;
run;

*outside of data step - does nothing/error;
output;
*outside of data step - does nothing/error;
=2;
*outside of data step - does nothing/error;
output;
*outside of data step - does nothing/error;
run;


/*evaluating the statistical parameters for age*/

proc sort data=patients.patt1;
*no variable specified to sort by;
by ;
run;

proc means data=patients.patt1;
variables age; *usually see var age not the full word variables;
output out= agestsatz;
by ; *no variable specified to break up the analysis;
run;

See comments on your code.
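
As a rough sketch of the MDY suggestion in the comments: the step below builds the date directly from its parts instead of concatenating a string and reading it back. It assumes month, day, and year are numeric variables (character values would need INPUT first); the data set name work.patt_demo and the inline sample rows are made up for illustration.

```sas
data work.patt_demo;
   input month day year;                     /* hypothetical numeric inputs */
   dob1 = mdy(month, day, year);             /* build the SAS date directly */
   age  = yrdif(dob1, '01jan2023'd, 'age');  /* age in years on 01JAN2023   */
   format dob1 date9.;
   datalines;
3 15 1990
11 2 1975
;
run;

proc print data=work.patt_demo;
run;
```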
