- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Posted 05-26-2020 12:05 PM
(1358 views)
Hello,
I am working with this large dataset that looks at a number of different patients receiving services dating back from five years ago. Now each patient is given a unique id number, but there are duplicates in the dataset because an individual could receive multiple different services or come in on different dates for the same service. I want to specifically focus on two distinct services and group individuals based on the number of times they received either service (i.e. Received once, twice,...,6+). Being that I would have to group probably on only id number and service type, I would like any recommendation on how to approach this. I made a small scale example of what my data set looks like. Thank you and I appreciate any help
ID Number Date Agency Service
0012 5/1/2020 Hospital A X Ray
0012 5/1/2020 Hospital A Lab Work
0012 5/5/2020 Clinic D Lab Work
0014 5/1/2020 Hospital A MRI
0014 5/1/2020 Hospital A Physical Therapy
0020 5/3/2020 Hospital B Lab Work
0021 5/1/2020 Hospital A X Ray
0021 5/1/2020 Hospital A Lab Work
0021 5/1/2020 Community Clinic Physical Therapy
0021 5/5/2020 Hospital A Lab Work
0021 5/5/2020 Hospital A Referral
0021 5/10/2020 Clinic D Lab Work
I am working with this large dataset that looks at a number of different patients receiving services dating back from five years ago. Now each patient is given a unique id number, but there are duplicates in the dataset because an individual could receive multiple different services or come in on different dates for the same service. I want to specifically focus on two distinct services and group individuals based on the number of times they received either service (i.e. Received once, twice,...,6+). Being that I would have to group probably on only id number and service type, I would like any recommendation on how to approach this. I made a small scale example of what my data set looks like. Thank you and I appreciate any help
ID Number Date Agency Service
0012 5/1/2020 Hospital A X Ray
0012 5/1/2020 Hospital A Lab Work
0012 5/5/2020 Clinic D Lab Work
0014 5/1/2020 Hospital A MRI
0014 5/1/2020 Hospital A Physical Therapy
0020 5/3/2020 Hospital B Lab Work
0021 5/1/2020 Hospital A X Ray
0021 5/1/2020 Hospital A Lab Work
0021 5/1/2020 Community Clinic Physical Therapy
0021 5/5/2020 Hospital A Lab Work
0021 5/5/2020 Hospital A Referral
0021 5/10/2020 Clinic D Lab Work
8 REPLIES 8
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
proc freq data=have;
tables id*service/list;
run;
--
Paige Miller
Paige Miller
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thank you for the suggestion. I actually did this, but my concern being that since my dataset is 100,000+ I get a large list of patients and I would still need to group each individuals. I'm thinking about creating another variable for this step.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@edwardar126 wrote:
Thank you for the suggestion. I actually did this, but my concern being that since my dataset is 100,000+ I get a large list of patients and I would still need to group each individuals. I'm thinking about creating another variable for this step.
Then, you have not yet described what you do want in enough detail to help.
--
Paige Miller
Paige Miller
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Overall what I'm thinking about doing is creating a new variable that groups individuals by the number of times they received a certain service type. So in this case count the number of patients who received a service only once, twice, …, or 6+. The Proc Freq is definitely a good start to look at possible groupings, but I think what I will need to take a step further. Thanks again for your consideration, sorry for not being clear earlier I'm new to this sub 😕
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Then you run PROC FREQ twice. The first time you use tables id*service/out=_a_ noprint; and save the results in a SAS data set. The second time you run PROC FREQ on data set _a_ with tables count;
--
Paige Miller
Paige Miller
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
It helps to show what you would expect as a result from your given example data.
I can see something like using Proc Freq to count the person/service combinations sending that to a data set and then summarizing the "count" by the service type. Not terribly difficult or complex but we do need to know what you want as a final result.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hey, I apologize for not further clarifying from the beginning. Ultimately, I am only interested in patients receiving two specific services. So in my example, I'm really interested in individuals who received either 'lab work' or 'x ray'. What I will do then is group individuals by the number of times they received either service, in this case how many individuals 'received a service at least once', 'at least twice', … ,or '6+', so in the example above individual 0012 would be grouped as having a frequency of 3 times (received an x ray once and received lab work twice). My first step that I've done was create an IF statement to just focus on the two specific service, but now I am also considering creating a new variable. I don't think a Proc Freq will capture what I want to accomplish being that my dataset is set more than 100,000+ observations.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Proc SQL gives you lots of flexibility to do this kind of grouping:
proc sql;
create table want as
select
count(id) as count,
*
from have
group by ID, service;
quit;
Count ID Date Agency Service 2 12 2020-05-01 Hospital A Lab Work 2 12 2020-05-05 Clinic D Lab Work 1 12 2020-05-01 Hospital A X Ray 1 14 2020-05-01 Hospital A MRI 1 14 2020-05-01 Hospital A Physical Therapy 1 20 2020-05-03 Hospital B Lab Work 3 21 2020-05-10 Clinic D Lab Work 3 21 2020-05-05 Hospital A Lab Work 3 21 2020-05-01 Hospital A Lab Work 1 21 2020-05-01 Community Clinic Physical Therapy 1 21 2020-05-05 Hospital A Referral 1 21 2020-05-01 Hospital A X Ray
PG