BookmarkSubscribeRSS Feed
edwardar126
Calcite | Level 5
Hello,
I am working with this large dataset that looks at a number of different patients receiving services dating back from five years ago. Now each patient is given a unique id number, but there are duplicates in the dataset because an individual could receive multiple different services or come in on different dates for the same service. I want to specifically focus on two distinct services and group individuals based on the number of times they received either service (i.e. Received once, twice,...,6+). Being that I would have to group probably on only id number and service type, I would like any recommendation on how to approach this. I made a small scale example of what my data set looks like. Thank you and I appreciate any help
 

 



ID Number Date Agency Service
0012 5/1/2020 Hospital A X Ray
0012 5/1/2020 Hospital A Lab Work
0012 5/5/2020 Clinic D Lab Work
0014 5/1/2020 Hospital A MRI
0014 5/1/2020 Hospital A Physical Therapy
0020 5/3/2020 Hospital B Lab Work
0021 5/1/2020 Hospital A X Ray
0021 5/1/2020 Hospital A Lab Work
0021 5/1/2020 Community Clinic Physical Therapy
0021 5/5/2020 Hospital A Lab Work
0021 5/5/2020 Hospital A Referral
0021 5/10/2020 Clinic D Lab Work
8 REPLIES 8
PaigeMiller
Diamond | Level 26
proc freq data=have;
     tables id*service/list;
run;
--
Paige Miller
edwardar126
Calcite | Level 5
Thank you for the suggestion. I actually did this, but my concern being that since my dataset is 100,000+ I get a large list of patients and I would still need to group each individuals. I'm thinking about creating another variable for this step.
PaigeMiller
Diamond | Level 26

@edwardar126 wrote:
Thank you for the suggestion. I actually did this, but my concern being that since my dataset is 100,000+ I get a large list of patients and I would still need to group each individuals. I'm thinking about creating another variable for this step.

Then, you have not yet described what you do want in enough detail to help.

--
Paige Miller
edwardar126
Calcite | Level 5
Overall what I'm thinking about doing is creating a new variable that groups individuals by the number of times they received a certain service type. So in this case count the number of patients who received a service only once, twice, …, or 6+. The Proc Freq is definitely a good start to look at possible groupings, but I think what I will need to take a step further. Thanks again for your consideration, sorry for not being clear earlier I'm new to this sub 😕
PaigeMiller
Diamond | Level 26

Then you run PROC FREQ twice. The first time you use tables id*service/out=_a_ noprint; and save the results in a SAS data set. The second time you run PROC FREQ on data set _a_ with tables count;

--
Paige Miller
ballardw
Super User

It helps to show what you would expect as a result from your given example data.

 

I can see something like using Proc Freq to count the person/service combinations sending that to a data set and then summarizing the "count" by the service type. Not terribly difficult or complex but we do need to know what you want as a final result.

edwardar126
Calcite | Level 5
Hey, I apologize for not further clarifying from the beginning. Ultimately, I am only interested in patients receiving two specific services. So in my example, I'm really interested in individuals who received either 'lab work' or 'x ray'. What I will do then is group individuals by the number of times they received either service, in this case how many individuals 'received a service at least once', 'at least twice', … ,or '6+', so in the example above individual 0012 would be grouped as having a frequency of 3 times (received an x ray once and received lab work twice). My first step that I've done was create an IF statement to just focus on the two specific service, but now I am also considering creating a new variable. I don't think a Proc Freq will capture what I want to accomplish being that my dataset is set more than 100,000+ observations.
PGStats
Opal | Level 21

Proc SQL gives you lots of flexibility to do this kind of grouping:

 

proc sql;
create table want as
select
    count(id) as count,
    *
from have
group by ID, service;
quit;
Count 	ID 	Date 	        Agency 	        Service
2 	12 	2020-05-01 	Hospital A 	Lab Work
2 	12 	2020-05-05 	Clinic D 	Lab Work
1 	12 	2020-05-01 	Hospital A 	X Ray
1 	14 	2020-05-01 	Hospital A 	MRI
1 	14 	2020-05-01 	Hospital A 	Physical Therapy
1 	20 	2020-05-03 	Hospital B 	Lab Work
3 	21 	2020-05-10 	Clinic D 	Lab Work
3 	21 	2020-05-05 	Hospital A 	Lab Work
3 	21 	2020-05-01 	Hospital A 	Lab Work
1 	21 	2020-05-01 	Community Clinic 	Physical Therapy
1 	21 	2020-05-05 	Hospital A 	Referral
1 	21 	2020-05-01 	Hospital A 	X Ray
PG

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

What is ANOVA?

ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 8 replies
  • 613 views
  • 2 likes
  • 4 in conversation