BookmarkSubscribeRSS Feed
CathyVI
Lapis Lazuli | Level 10

Hi

Can someone pls explain what this questions mean. I can't create a dummy variable because I don't understand the question. Any simple sas code to help understand the questions will be appreciated.

 1. /*Generate a report on the proportion of: patients on any drug, patients with any physician visit, and patients on any drug or with any physician visit*/

2. /*Report the distribution of: number of physician visits per patient, number of patients per drugcode, number of patients per diagnostic grouping, and number of prescriptions per drugcode*/

7 REPLIES 7
Astounding
PROC Star

There is no such thing as SAS code that will help you understand the questions.  Here is a set of preliminary questions to see if you might be able to approach these problems.

 

Do you have one data set with patient information, or more than one?

 

For each data set, do you know the name of the data set and the names of the variables it contains?

 

For each data set, does it contain one observation per patient or multiple observations per patient?

 

If you had the data printed on paper in front of you, how could you tell if a patient was on any drug, or a patient had a physician visit?

 

My apologies if these questions seem too easy, but I just have to make sure we are not skipping any basic ideas that you will need.

CathyVI
Lapis Lazuli | Level 10

@Astounding Your questions are meaningful and I understand all these concept but I guess where I got confused is how to interpret the question into sascode. To answer your questions:
I have 3 dataset. Each data have a unique identifier. See data information below. So how could I interpret the questions into code with these information?
Data1: Patient Demographic Variables:
patient_id [numeric]
sex {‘M’,’F’,’U’}
race {0,1,2,3,4,5}
birthdt [character (8) mmddyyyy format]
Data 2: Prescription Variables:
patient_id [numeric]
fill_dt [character (6) yymmdd format]
pharmacyid [numeric]
drugcode [numeric]
pills [numeric]
Data 3: Medical Variables:
patientid [numeric]
servicedate [character (6) yymmdd format],
providerid [numeric]
source [character (3) format ‘aaa’]
servicecode [character (5) format ‘annnn’]
diagnosiscode1 [character (3) format ‘nnn’]
diagnosiscode2 [character (3) format ‘nnn’]
diagnosiscode3 [character (3) format ‘nnn’]
• There is one personal summary record per patient.
• There are multiple prescription records per patient.
• There are multiple medical records per patient.

Astounding
PROC Star

That's a good start.  Presumably your first data set contains one observation per patient, but the others may contain many observations per patient.

 

Without writing any SAS code (yet), you might consider whether a patient was taking any medications.  One way to do that would be to examine the second data set and see if any of the fill dates are filled in.  To consider whether a patient had any doctor visits, you might examine the third data set and check whether any of the service dates are filled in.  These steps could conceivably find that a patient took many drugs, or had many doctor visits.  So the results from these checks would still need to reduce the results down to a single observation per patient with (perhaps) a Y/N result for "any drugs" and another Y/N result for "any visits".

 

Based on what you know, see if the related question (your original question #1) at least becomes approachable.

CathyVI
Lapis Lazuli | Level 10

@Astounding Since I don't have a data for this analysis to visualize. If am to practice this, will i create a data step to check if any fill dates are filled in or service date are filled in ?

Will this code works? I am not sure how to use date format in an if statement.

/*2nd data*/ 

data want;

set Prescription;

if fill_dt ne missing then pt_anydrug=1; else pt_anydrug=0;  /*how will i code fill_dt*?/

run;

/*3rd data*/

data want2;

set Medical;

if  servicedate ne missing then pt_doctorvisits =1; else pt_doctorvisits=0;

run;

 

Astounding
PROC Star

No data to practice with?  Wow.

 

With no data, I can't tell whether your variables are numeric or character.  So let's choose a syntax that would work either way:

if missing(fill_dt) then pt_anydrug=0;
else pt_anydrug=1;

Similarly,

if missing(servicedate) then pt_doctorvisits=0;
else pt_doctorvisits=1;

One result that may show up later:  it is possible that you will get all 1 values and no 0 values.  For example, it is possible that the Medications data contains only drugs that were taken and does not contain any observations for patients that received no drugs.  Difficult to tell with no data!

CathyVI
Lapis Lazuli | Level 10

@Astounding 

Thank you. From my prior message  

fill_dt  is  [character (6) yymmdd format]

servicedate is [character (6) yymmdd format]

Please any idea on how to address the second question and confused with how to code the "per patient" ?

  • Report the distribution of: # of physician visits per patient, # of patients per drugcode, # of patients per diagnostic grouping, and # of prescriptions per drugcode.
Astounding
PROC Star

You're right, you had already posted info about the characteristics of the variables.

 

Here's the idea of what needs to happen for the second set of questions.  I will use the medication data as an example, since that is required for the most difficult question (number of patients per diagnostic code).  That data set contains many observations for some patients, perhaps 0 observations for some patients.  And with 3 diagnosis codes, you may need to assign a patient to more than one diagnosis.

 

First, you need to create up to 3 observations for each existing observation (one observation per diagnosis code).  Once you have done that, you need to reduce that data set, eliminating duplicate occurrences of patient + diagnosis code.  Then you need to summarize it, getting just one observation per diagnosis code, with number of patients calculated.  As a learning experience, I would suggest you try your hand at this, and begin by looking up documentation on the OUTPUT statement.  After a while, it it becomes too difficult we can review what you came up with so far.

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 1546 views
  • 1 like
  • 2 in conversation