Hello! I'm trying to create a text file with this data set, that I have sorted by subject and then in descending order by glucose level. I will attach the data set now...
subject female glucose Visit_Date Age
1 | 0 | 142 | 14860 | 59.9754 |
1 | 0 | 139 | 14579 | 59.2060 |
1 | 0 | 136 | 14662 | 59.4333 |
1 | 0 | 134 | 14765 | 59.7153 |
4 | 1 | 101 | 14478 | 38.3025 |
4 | 1 | 100 | 14392 | 38.0671 |
8 | 1 | 124 | 14129 | 48.4271 |
8 | 1 | 115 | 14222 | 48.6817 |
11 | 1 | 94 | 14006 | 31.5127 |
11 | 1 | 89 | 14082 | 31.7207 |
14 | 1 | 135 | 14351 | 60.6899 |
14 | 1 | 128 | 14632 | 61.4593 |
14 | 1 | 127 | 14536 | 61.1964 |
14 | 1 | 126 | 14438 | 60.9281 |
15 | 1 | 99 | 14283 | 44.3997 |
15 | 1 | 95 | 14183 | 44.1259 |
15 | 1 | 91 | 14387 | 44.6845 |
16 | 1 | 70 | 13971 | 28.2409 |
20 | 0 | 161 | 14265 | 62.3107 |
20 | 0 | 150 | 14350 | 62.5435 |
My code so far is this...
LIBNAME BIST0535 "C:\Users\Paula\Desktop\BIST0535";
DATA BIST0535.hw2;
SET BIST0535.hw2_data(RENAME= (dov=Visit_Date));
Age= (Visit_Date-dob)/365.25;
IF state NE 4 then delete;
RUN;
PROC SORT;
BY subject DESCENDING glucose;
RUN;
PROC PRINT;
VAR subject female glucose visit_date age;
RUN;
Now I'm trying to output this data set into a text file, but I only want one observation to appear per subject and I want only the maximum glucose level (the first observation per subject) to appear. Essentially, the assignment is to only show the observation with the maximum glucose level per subject. How do I get it to look more like this....
OBS SUBJECT FEMALE MAX_GLUCOSE VISIT_DATE AGE
1 1 0 142 7 September 2000 59.98
2 4 1 101 22 August 1999 38.30
3 8 1 124 7 September 1998 48.43
etc...
Proper way to provide data is with data step and best is to paste into a code box opened using the forum's {I} menu icon.
data have; input subject female glucose Visit_Date Age; datalines; 1 0 142 14860 59.9754 1 0 139 14579 59.2060 1 0 136 14662 59.4333 1 0 134 14765 59.7153 4 1 101 14478 38.3025 4 1 100 14392 38.0671 8 1 124 14129 48.4271 8 1 115 14222 48.6817 11 1 94 14006 31.5127 11 1 89 14082 31.7207 14 1 135 14351 60.6899 14 1 128 14632 61.4593 14 1 127 14536 61.1964 14 1 126 14438 60.9281 15 1 99 14283 44.3997 15 1 95 14183 44.1259 15 1 91 14387 44.6845 16 1 70 13971 28.2409 20 0 161 14265 62.3107 20 0 150 14350 62.5435 ; run;
If done correctly then we have data to run code against and ensures that we know variable types. Also provide formats if needed such as for the Visit_date.
Do you really want decimal values as part of your "age" variable? And you reference variables in this code that do not exist in your example: (DOB and STATE)
DATA BIST0535.hw2; SET BIST0535.hw2_data(RENAME= (dov=Visit_Date)); Age= (Visit_Date-dob)/365.25; IF state NE 4 then delete; RUN;
Another way:
proc summary data=have nway; class subject female ; id visit_date age; var glucose; output out=want (drop=_type_ _freq_) max=max_glucose maxid(glucose(visit_date) glucose(age))= visit_date age; run;
All the parentheses with the maxid also allows creating values of visit_date and age for the minimum value of glucose if requested
using min= and minid to Id the values of visit_date and age when the value was a minimum.
PROC SORT data = BIST0535.hw2 ;
BY subject DESCENDING glucose;
RUN;
PROC SORT data = BIST0535.hw2
out = hw2 (rename = (glucose = glucose_max))
nodupkey;
BY subject;
RUN;
proc export data = hw2
dbms = CSV
outfile = 'C:\MyFolder\hw2.csv' replace;
run;
this only changes the variable name for me, it doesn't actually get rid of the other observations
but thanks!
Did you try it? Did you notice the NODUPKEY option that will keep only the first row of each subject - the row with the highest glucose.
Proper way to provide data is with data step and best is to paste into a code box opened using the forum's {I} menu icon.
data have; input subject female glucose Visit_Date Age; datalines; 1 0 142 14860 59.9754 1 0 139 14579 59.2060 1 0 136 14662 59.4333 1 0 134 14765 59.7153 4 1 101 14478 38.3025 4 1 100 14392 38.0671 8 1 124 14129 48.4271 8 1 115 14222 48.6817 11 1 94 14006 31.5127 11 1 89 14082 31.7207 14 1 135 14351 60.6899 14 1 128 14632 61.4593 14 1 127 14536 61.1964 14 1 126 14438 60.9281 15 1 99 14283 44.3997 15 1 95 14183 44.1259 15 1 91 14387 44.6845 16 1 70 13971 28.2409 20 0 161 14265 62.3107 20 0 150 14350 62.5435 ; run;
If done correctly then we have data to run code against and ensures that we know variable types. Also provide formats if needed such as for the Visit_date.
Do you really want decimal values as part of your "age" variable? And you reference variables in this code that do not exist in your example: (DOB and STATE)
DATA BIST0535.hw2; SET BIST0535.hw2_data(RENAME= (dov=Visit_Date)); Age= (Visit_Date-dob)/365.25; IF state NE 4 then delete; RUN;
Another way:
proc summary data=have nway; class subject female ; id visit_date age; var glucose; output out=want (drop=_type_ _freq_) max=max_glucose maxid(glucose(visit_date) glucose(age))= visit_date age; run;
All the parentheses with the maxid also allows creating values of visit_date and age for the minimum value of glucose if requested
using min= and minid to Id the values of visit_date and age when the value was a minimum.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.