Solved: counting observations by date

sasthalie · Posted 09-03-2021 06:39 PM

Hi,

Here's an example of my data:

start_date levelA levelB levelC levelD

15-01-2020 0 1 0 0

16-01-2020 0 0 1 0

20-02-2020 1 0 0 0

20-02-2020 0 0 1 0

25-03-2020 0 0 1 0

25-03-2020 0 0 0 1

(etc.)

Each line is a patient and each patient can either be A, B, C or D. Multiple patients have the same start_date (numeric).

Here's what I've been trying (unsuccessfully to create):

start_date n_levelA n_levelB n_levelC n_levelD high

15-01-2020 0 1 0 0 1

16-01-2020 0 0 1 0 1

20-02-2020 1 0 1 0 2

25-03-2020 0 0 1 1 2

My end goal is to have this data in wide format in order to create a stacked histogram (x=start_date, y=number of patient, with stacks for each level A to D), which requires me to find out the highest and lowest number of observations for each date.

Thank you!

ballardw · Posted 09-03-2021 07:16 PM

@sasthalie wrote:

Hi,

Here's an example of my data:

start_date levelA levelB levelC levelD

15-01-2020 0 1 0 0

16-01-2020 0 0 1 0

20-02-2020 1 0 0 0

20-02-2020 0 0 1 0

25-03-2020 0 0 1 0

25-03-2020 0 0 0 1

(etc.)

Each line is a patient and each patient can either be A, B, C or D. Multiple patients have the same start_date (numeric).

Here's what I've been trying (unsuccessfully to create):

start_date n_levelA n_levelB n_levelC n_levelD high

15-01-2020 0 1 0 0 1

16-01-2020 0 0 1 0 1

20-02-2020 1 0 1 0 2

25-03-2020 0 0 1 1 2

My end goal is to have this data in wide format in order to create a stacked histogram (x=start_date, y=number of patient, with stacks for each level A to D), which requires me to find out the highest and lowest number of observations for each date.

Thank you!

What are the rules for assigning values to high?

A guess:

data have;
   input start_date :ddmmyy10. levelA levelB levelC levelD;
   format start_date ddmmyy10.;
datalines;
15-01-2020 0  1  0  0
16-01-2020 0  0  1  0
20-02-2020 1  0  0  0
20-02-2020 0  0  1  0
25-03-2020 0  0  1  0
25-03-2020 0  0  0  1
;

proc summary data=have nway;
   class start_date;
   var level: ;
   output out=want(drop=_type_ rename=(_freq_=high)) max=;
run;

Note the data step to provide something to code with and code placed in a text box so it is easy to copy and paste into an editor.

Proc summary code above gets the maximum value of each level variable (the : after level is a list maker for all variable names that start with level). The procedure automatically includes two variables in the output set, _type_ indicates combinations of class variables and would all have the same value as the NWAY option only outputs the highest type. Run the code without nway and remove the drop to see what _type_ looks like and the data. The second automatic variable is _freq_ which represents number of records with the class variable combination, so renaming it to "high" seems to match your unstated requirement on how to get value for "high".

View solution in original post

ballardw · Posted 09-03-2021 07:16 PM

@sasthalie wrote:

Hi,

Here's an example of my data:

start_date levelA levelB levelC levelD

15-01-2020 0 1 0 0

16-01-2020 0 0 1 0

20-02-2020 1 0 0 0

20-02-2020 0 0 1 0

25-03-2020 0 0 1 0

25-03-2020 0 0 0 1

(etc.)

Each line is a patient and each patient can either be A, B, C or D. Multiple patients have the same start_date (numeric).

Here's what I've been trying (unsuccessfully to create):

start_date n_levelA n_levelB n_levelC n_levelD high

15-01-2020 0 1 0 0 1

16-01-2020 0 0 1 0 1

20-02-2020 1 0 1 0 2

25-03-2020 0 0 1 1 2

My end goal is to have this data in wide format in order to create a stacked histogram (x=start_date, y=number of patient, with stacks for each level A to D), which requires me to find out the highest and lowest number of observations for each date.

Thank you!

What are the rules for assigning values to high?

A guess:

data have;
   input start_date :ddmmyy10. levelA levelB levelC levelD;
   format start_date ddmmyy10.;
datalines;
15-01-2020 0  1  0  0
16-01-2020 0  0  1  0
20-02-2020 1  0  0  0
20-02-2020 0  0  1  0
25-03-2020 0  0  1  0
25-03-2020 0  0  0  1
;

proc summary data=have nway;
   class start_date;
   var level: ;
   output out=want(drop=_type_ rename=(_freq_=high)) max=;
run;

Note the data step to provide something to code with and code placed in a text box so it is easy to copy and paste into an editor.

Proc summary code above gets the maximum value of each level variable (the : after level is a list maker for all variable names that start with level). The procedure automatically includes two variables in the output set, _type_ indicates combinations of class variables and would all have the same value as the NWAY option only outputs the highest type. Run the code without nway and remove the drop to see what _type_ looks like and the data. The second automatic variable is _freq_ which represents number of records with the class variable combination, so renaming it to "high" seems to match your unstated requirement on how to get value for "high".

Reeza · Posted 09-03-2021 07:41 PM

This can get you started:

data have;
   input start_date :ddmmyy10. levelA levelB levelC levelD;
   format start_date ddmmyy10.;
   patient_id = _n_;
datalines;
15-01-2020 0  1  0  0
16-01-2020 0  0  1  0
20-02-2020 1  0  0  0
20-02-2020 0  0  1  0
25-03-2020 0  0  1  0
25-03-2020 0  0  0  1
;

proc transpose data=have out=long(where=(col1 ne 0));
by patient_id start_date;
var level:;
run;

data level_data;
set long;
level = substr(compress(reverse(_name_)), 1, 1);
run;

proc sgplot data=level_data;
vbar start_date / group=level datalabel seglabel;
run;

proc freq data=level_data;
table start_date * level / nopercent norow nocol;
run;

sasthalie · Posted 09-06-2021 11:57 AM

This one works out great, thank you!

I changed the last paragraph for this, since I'm looking to make a histogram (not bar chart). It seems to also be correct.

proc sgplot data=level_data;
histogram start_date/ group=level;
run;

counting observations by date

Re: counting observations by date

Re: counting observations by date

Re: counting observations by date

Re: counting observations by date

counting observations by date

Re: counting observations by date

Re: counting observations by date

Re: counting observations by date

Re: counting observations by date

Ready to join fellow brilliant minds for the SAS Hackathon?