Statistical Procedures

edwardar126 · Posted 05-26-2020 12:05 PM

Hello,
I am working with this large dataset that looks at a number of different patients receiving services dating back from five years ago. Now each patient is given a unique id number, but there are duplicates in the dataset because an individual could receive multiple different services or come in on different dates for the same service. I want to specifically focus on two distinct services and group individuals based on the number of times they received either service (i.e. Received once, twice,...,6+). Being that I would have to group probably on only id number and service type, I would like any recommendation on how to approach this. I made a small scale example of what my data set looks like. Thank you and I appreciate any help

ID Number Date Agency Service
0012 5/1/2020 Hospital A X Ray
0012 5/1/2020 Hospital A Lab Work
0012 5/5/2020 Clinic D Lab Work
0014 5/1/2020 Hospital A MRI
0014 5/1/2020 Hospital A Physical Therapy
0020 5/3/2020 Hospital B Lab Work
0021 5/1/2020 Hospital A X Ray
0021 5/1/2020 Hospital A Lab Work
0021 5/1/2020 Community Clinic Physical Therapy
0021 5/5/2020 Hospital A Lab Work
0021 5/5/2020 Hospital A Referral
0021 5/10/2020 Clinic D Lab Work

PaigeMiller · Posted 05-26-2020 12:12 PM

proc freq data=have;
     tables id*service/list;
run;

--
Paige Miller

edwardar126 · Posted 05-26-2020 12:47 PM

Thank you for the suggestion. I actually did this, but my concern being that since my dataset is 100,000+ I get a large list of patients and I would still need to group each individuals. I'm thinking about creating another variable for this step.

PaigeMiller · Posted 05-26-2020 12:59 PM

@edwardar126 wrote:
Thank you for the suggestion. I actually did this, but my concern being that since my dataset is 100,000+ I get a large list of patients and I would still need to group each individuals. I'm thinking about creating another variable for this step.

Then, you have not yet described what you do want in enough detail to help.

--
Paige Miller

edwardar126 · Posted 05-26-2020 01:07 PM

Overall what I'm thinking about doing is creating a new variable that groups individuals by the number of times they received a certain service type. So in this case count the number of patients who received a service only once, twice, …, or 6+. The Proc Freq is definitely a good start to look at possible groupings, but I think what I will need to take a step further. Thanks again for your consideration, sorry for not being clear earlier I'm new to this sub 😕

PaigeMiller · Posted 05-26-2020 01:20 PM

Then you run PROC FREQ twice. The first time you use tables id*service/out=_a_ noprint; and save the results in a SAS data set. The second time you run PROC FREQ on data set _a_ with tables count;

--
Paige Miller

ballardw · Posted 05-26-2020 12:40 PM

It helps to show what you would expect as a result from your given example data.

I can see something like using Proc Freq to count the person/service combinations sending that to a data set and then summarizing the "count" by the service type. Not terribly difficult or complex but we do need to know what you want as a final result.

edwardar126 · Posted 05-26-2020 12:58 PM

Hey, I apologize for not further clarifying from the beginning. Ultimately, I am only interested in patients receiving two specific services. So in my example, I'm really interested in individuals who received either 'lab work' or 'x ray'. What I will do then is group individuals by the number of times they received either service, in this case how many individuals 'received a service at least once', 'at least twice', … ,or '6+', so in the example above individual 0012 would be grouped as having a frequency of 3 times (received an x ray once and received lab work twice). My first step that I've done was create an IF statement to just focus on the two specific service, but now I am also considering creating a new variable. I don't think a Proc Freq will capture what I want to accomplish being that my dataset is set more than 100,000+ observations.

PGStats · Posted 05-26-2020 12:51 PM

Proc SQL gives you lots of flexibility to do this kind of grouping:

proc sql;
create table want as
select
    count(id) as count,
    *
from have
group by ID, service;
quit;

Count 	ID 	Date 	        Agency 	        Service
2 	12 	2020-05-01 	Hospital A 	Lab Work
2 	12 	2020-05-05 	Clinic D 	Lab Work
1 	12 	2020-05-01 	Hospital A 	X Ray
1 	14 	2020-05-01 	Hospital A 	MRI
1 	14 	2020-05-01 	Hospital A 	Physical Therapy
1 	20 	2020-05-03 	Hospital B 	Lab Work
3 	21 	2020-05-10 	Clinic D 	Lab Work
3 	21 	2020-05-05 	Hospital A 	Lab Work
3 	21 	2020-05-01 	Hospital A 	Lab Work
1 	21 	2020-05-01 	Community Clinic 	Physical Therapy
1 	21 	2020-05-05 	Hospital A 	Referral
1 	21 	2020-05-01 	Hospital A 	X Ray

PG

Statistical Procedures

Grouping by frequencies with more than one variable?

Re: Grouping by frequencies with more than one variable?

Re: Grouping by frequencies with more than one variable?

Re: Grouping by frequencies with more than one variable?

Re: Grouping by frequencies with more than one variable?

Re: Grouping by frequencies with more than one variable?

Re: Grouping by frequencies with more than one variable?

Re: Grouping by frequencies with more than one variable?

Re: Grouping by frequencies with more than one variable?

Transforming the Frequency of Time Series Data

find the count of variable by group

Result by group

Wavelet Analysis using SAS/IML: Thresholding the Detail Coefficients t...

frequency count across variables

Follow Us

What is...

Statistical Procedures

Our biggest data and AI event of the year.

Follow Us

What is...