BookmarkSubscribeRSS Feed
cmartin1
Calcite | Level 5

Hi everyone, this is a homework question I have been working on that I just can't get. We just went over working with duplicates and first. last., etc, so I'm thinking the solution would be found using this material. Here is the question (I wrote the data in using datalines):

 

  1. Given a dataset containing patient id and visit date, write the program to select the earliest 50% of each patient’s records.  For example, if a patient has 4 records, two in 2005 and two in 2006, then the two in 2005 should be selected and output to the new dataset. A partial list of records in the dataset is given below:

PTID          VisDate                                                         

1                11/05/2005              

1                11/05/2005              

1                02/15/2005              

1                01/14/2004                                                  

5                05/05/2005                                                                    

5                08/15/2007              

2                05/15/2006              

2                07/14/2006              

2                01/05/2007              

2                02/25/2007              

3                05/05/2006

10 REPLIES 10
Reeza
Super User
What are the logical steps to solve the problem - not in code?

1. Find the number of visits for each patient and add that number to the data set.
2. Add a counter to each row for the number of the observations
3. If the counter is less than half the total for that patient you keep that record.

Can you now code that solution?
cmartin1
Calcite | Level 5
So this is what I have so far. I added the counter which outputs the number of visits for each patient, but I was unsure on how to add the total and then use that to output the correct records. I tried using "if last." to get the total for each patient but I was then unsure of how to go from there.

data qone;
informat ptidno 3. apptdate date9.;
input ptidno apptdate;
datalines;
1 05NOV2005
1 05NOV2005
1 15FEB2005
1 14JAN2004
5 05MAY2005
5 15AUG2007
2 15MAY2006
2 14JUL2006
2 05JAN2007
2 25FEB2007
3 05MAY2006

;
proc sort data = qone out = qonetoo; by ptidno apptdate; run;

data earliestfifty; set qonetoo;
by ptidno apptdate;
if first.ptidno then nvisits = 0;
nvisits + 1;
run;

proc print data = earliestfifty;
format apptdate date9.;
run;
Reeza
Super User
Have you learned merges yet?
cmartin1
Calcite | Level 5
Yes!
Reeza
Super User
Then maybe merge in the totals? That should get you there then.
cmartin1
Calcite | Level 5

cmartin1_0-1604957920099.png

 

This is the output that I get with this code: 

data earliestfifty; set qonetoo;
by ptidno apptdate;
if first.ptidno then nvisits = 0;
nvisits + 1;
if last.ptidno then totalvisits = nvisits;

run; 

 

I tried merging this data set with the originally sorted data set like this:

 

data earliest;
merge earliestfifty qonetoo;
by ptidno;

run;

 

and get the same output. Is there something else I can do in the merge or the first data step to assign that total to each observation?

 

 

Reeza
Super User
Seems like you need to split the data sets and then merge them somehow?
cmartin1
Calcite | Level 5
Anything that you would suggest?
Reeza
Super User
FYI - if you use PROC RANK you can do this in one procedure.
Ksharp
Super User
data qone;
informat ptidno 3. apptdate date9.;
input ptidno apptdate;
format apptdate date9.;
datalines;
1 05NOV2005
1 05NOV2005
1 15FEB2005
1 14JAN2004
5 05MAY2005
5 15AUG2007
2 15MAY2006
2 14JUL2006
2 05JAN2007
2 25FEB2007
3 05MAY2006

;
proc sort data = qone out = qonetoo; by ptidno apptdate; run;

data earliestfifty; set qonetoo;
by ptidno;
if first.ptidno then nvisits = 0;
nvisits + 1;
run;
proc sql;
create table want as
select *,count(*) as total
 from earliestfifty
  group by ptidno 
   having nvisits<=total/2
     order by ptidno,nvisits;
run;

hackathon24-white-horiz.png

The 2025 SAS Hackathon Kicks Off on June 11!

Watch the live Hackathon Kickoff to get all the essential information about the SAS Hackathon—including how to join, how to participate, and expert tips for success.

YouTube LinkedIn

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 10 replies
  • 1760 views
  • 2 likes
  • 3 in conversation