Hi,
Here's an example of my data:
start_date levelA levelB levelC levelD
15-01-2020 0 1 0 0
16-01-2020 0 0 1 0
20-02-2020 1 0 0 0
20-02-2020 0 0 1 0
25-03-2020 0 0 1 0
25-03-2020 0 0 0 1
(etc.)
Each line is a patient and each patient can either be A, B, C or D. Multiple patients have the same start_date (numeric).
Here's what I've been trying (unsuccessfully to create):
start_date n_levelA n_levelB n_levelC n_levelD high
15-01-2020 0 1 0 0 1
16-01-2020 0 0 1 0 1
20-02-2020 1 0 1 0 2
25-03-2020 0 0 1 1 2
My end goal is to have this data in wide format in order to create a stacked histogram (x=start_date, y=number of patient, with stacks for each level A to D), which requires me to find out the highest and lowest number of observations for each date.
Thank you!
@sasthalie wrote:
Hi,
Here's an example of my data:
start_date levelA levelB levelC levelD
15-01-2020 0 1 0 0
16-01-2020 0 0 1 0
20-02-2020 1 0 0 0
20-02-2020 0 0 1 0
25-03-2020 0 0 1 0
25-03-2020 0 0 0 1
(etc.)
Each line is a patient and each patient can either be A, B, C or D. Multiple patients have the same start_date (numeric).
Here's what I've been trying (unsuccessfully to create):
start_date n_levelA n_levelB n_levelC n_levelD high
15-01-2020 0 1 0 0 1
16-01-2020 0 0 1 0 1
20-02-2020 1 0 1 0 2
25-03-2020 0 0 1 1 2
My end goal is to have this data in wide format in order to create a stacked histogram (x=start_date, y=number of patient, with stacks for each level A to D), which requires me to find out the highest and lowest number of observations for each date.
Thank you!
What are the rules for assigning values to high?
A guess:
data have; input start_date :ddmmyy10. levelA levelB levelC levelD; format start_date ddmmyy10.; datalines; 15-01-2020 0 1 0 0 16-01-2020 0 0 1 0 20-02-2020 1 0 0 0 20-02-2020 0 0 1 0 25-03-2020 0 0 1 0 25-03-2020 0 0 0 1 ; proc summary data=have nway; class start_date; var level: ; output out=want(drop=_type_ rename=(_freq_=high)) max=; run;
Note the data step to provide something to code with and code placed in a text box so it is easy to copy and paste into an editor.
Proc summary code above gets the maximum value of each level variable (the : after level is a list maker for all variable names that start with level). The procedure automatically includes two variables in the output set, _type_ indicates combinations of class variables and would all have the same value as the NWAY option only outputs the highest type. Run the code without nway and remove the drop to see what _type_ looks like and the data. The second automatic variable is _freq_ which represents number of records with the class variable combination, so renaming it to "high" seems to match your unstated requirement on how to get value for "high".
@sasthalie wrote:
Hi,
Here's an example of my data:
start_date levelA levelB levelC levelD
15-01-2020 0 1 0 0
16-01-2020 0 0 1 0
20-02-2020 1 0 0 0
20-02-2020 0 0 1 0
25-03-2020 0 0 1 0
25-03-2020 0 0 0 1
(etc.)
Each line is a patient and each patient can either be A, B, C or D. Multiple patients have the same start_date (numeric).
Here's what I've been trying (unsuccessfully to create):
start_date n_levelA n_levelB n_levelC n_levelD high
15-01-2020 0 1 0 0 1
16-01-2020 0 0 1 0 1
20-02-2020 1 0 1 0 2
25-03-2020 0 0 1 1 2
My end goal is to have this data in wide format in order to create a stacked histogram (x=start_date, y=number of patient, with stacks for each level A to D), which requires me to find out the highest and lowest number of observations for each date.
Thank you!
What are the rules for assigning values to high?
A guess:
data have; input start_date :ddmmyy10. levelA levelB levelC levelD; format start_date ddmmyy10.; datalines; 15-01-2020 0 1 0 0 16-01-2020 0 0 1 0 20-02-2020 1 0 0 0 20-02-2020 0 0 1 0 25-03-2020 0 0 1 0 25-03-2020 0 0 0 1 ; proc summary data=have nway; class start_date; var level: ; output out=want(drop=_type_ rename=(_freq_=high)) max=; run;
Note the data step to provide something to code with and code placed in a text box so it is easy to copy and paste into an editor.
Proc summary code above gets the maximum value of each level variable (the : after level is a list maker for all variable names that start with level). The procedure automatically includes two variables in the output set, _type_ indicates combinations of class variables and would all have the same value as the NWAY option only outputs the highest type. Run the code without nway and remove the drop to see what _type_ looks like and the data. The second automatic variable is _freq_ which represents number of records with the class variable combination, so renaming it to "high" seems to match your unstated requirement on how to get value for "high".
This can get you started:
data have;
input start_date :ddmmyy10. levelA levelB levelC levelD;
format start_date ddmmyy10.;
patient_id = _n_;
datalines;
15-01-2020 0 1 0 0
16-01-2020 0 0 1 0
20-02-2020 1 0 0 0
20-02-2020 0 0 1 0
25-03-2020 0 0 1 0
25-03-2020 0 0 0 1
;
proc transpose data=have out=long(where=(col1 ne 0));
by patient_id start_date;
var level:;
run;
data level_data;
set long;
level = substr(compress(reverse(_name_)), 1, 1);
run;
proc sgplot data=level_data;
vbar start_date / group=level datalabel seglabel;
run;
proc freq data=level_data;
table start_date * level / nopercent norow nocol;
run;
This one works out great, thank you!
I changed the last paragraph for this, since I'm looking to make a histogram (not bar chart). It seems to also be correct.
proc sgplot data=level_data;
histogram start_date/ group=level;
run;
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.