Hello!
I'm trying to find max glucose level per subject, but the subjects have multiple variables.
For example:
DATA homework2;
INPUT subject $ visit_date glucose;
DATALINES;
1 06/14/16 113
2 05/21/16 107
3 08/17/17 98
1 12/01/17 110
2 03/21/17 118
3 04/04/17 106
1 01/23/18 117
;
RUN;
How do I get the system to realize that the different subjects occur at different frequencies and across separate observations but I want the maximum glucose value IN A NEW VARIABLE (max_glucose) per subject?
DATA homework2;
INPUT subject $ visit_date glucose;
DATALINES;
1 06/14/16 113
2 05/21/16 107
3 08/17/17 98
1 12/01/17 110
2 03/21/17 118
3 04/04/17 106
1 01/23/18 117
;
RUN;
proc sql;
create table want as
select subject,max(glucose) as max_glucose
from homework2
group by subject;
quit;
DATA homework2;
INPUT subject $ visit_date glucose;
DATALINES;
1 06/14/16 113
2 05/21/16 107
3 08/17/17 98
1 12/01/17 110
2 03/21/17 118
3 04/04/17 106
1 01/23/18 117
;
RUN;
proc sql;
create table want as
select subject,max(glucose) as max_glucose
from homework2
group by subject;
quit;
Perfect! Thank you!
I'm pretty new to SAS (first-year grad student), so I always get excited to see a log with no error messages!
Thank you, again!
@MBlack732 Welcome to the SAS forum. I hope you enjoy your journey with us. And, all the very best for your academic en devours.
@MBlack732 wrote:
Perfect! Thank you!
I'm pretty new to SAS (first-year grad student), so I always get excited to see a log with no error messages!
Thank you, again!
And a different procedure:
proc summary data=have nway; class subject; var glucose; output out=want (drop=_:) max= /autoname ; run;
Proc summary will do a few more statistics than Proc Sql has available. Also if you have mulitiple variables to summarize you only need to add them to the VAR statement. The AUTONAME option creates variables with the statistic as a suffix to the original variable.
Another feature with class variables is the possibility of doing summaries based on different combinations of them. Without the NWAY statement you would get a summary overall across all subjects of the date. The drop statement removes a variable _type_ that would indicate which combination of the class variables a current output record would indicate.
Also with appropriate formats for the date variable you can get summaries grouped in many ways.
Example using somewhat modified data:
DATA homework2; INPUT subject $ visit_date mmddyy8. glucose; format visit_date date9.; DATALINES; 1 06/14/16 113 1 05/21/16 107 1 08/17/16 98 1 12/01/16 110 2 03/21/18 118 2 04/04/18 106 2 01/23/18 117 ; RUN; proc summary data=homework2 nway; class subject visit_date; format visit_date yyq6.; var glucose ; output out=want max= min= mean=/autoname; run;
which would provide the max, min and mean value of glucose by calendar quarter. The more variables and statistics you want Proc Summary or Means becomes much more code efficient as you don't have to explicitly create the variable names and as I said previously, there are statistics such as quartiles, percentiles, interquartile range, range that SQL does not support.
Did you try this
proc sql;
create table want as select * , max(glucose) as max_glucose
from have
group by subject;
quit;
Or with data step.
Sort by subject,glucose and take last.subject in a dataset , rename glucose to max_glucose and then merge back to original data .
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.