Help using Base SAS procedures

Creating new data set with variables from most recent visit date

Accepted Solution Solved
Reply
New User
Posts: 1
Accepted Solution

Creating new data set with variables from most recent visit date

For a project we received a data set containing biomarker measurements from a number of physician visits for each subject. Not all variables of interest are present at each visit and we were wanting to create a new data set containing the values from the most recent visit that measure was taken at. I'm not sure how or if it is possible to set this up.

 

To give a bit more detail about what I would like to do, for each subject I would like to find a way to check if a measure has a value from the most recent visit date (i.e. 25AUG2015). if there is one keep that value(BMI=30) in the new data and stop searching for that variable in that subject. If not (BMI=.) go back to the next most recent date (16AUG105) and check there and so on.

 

I'm not real sure where to start on this issue so any help would be appreciated. Please let me know if I can clarify more and thank you all in advance.


Accepted Solutions
Solution
‎09-27-2017 12:33 PM
Trusted Advisor
Posts: 1,343

Re: Creating new data set with variables from most recent visit date

[ Edited ]

Here's a better way than my prior suggestion:

 

proc sort data=have;
by sub date;
run;

data want; update have (obs=0) have (in=inright); by sub; if inright then output; run;

 

Remember, unlike MERGE, the UPDATE statement will apply only non-missing values from the right-hand dataset to the PDV.  (Merge would take missing values from the right side as well).

 

Why the "obs=0" on the left hand dataset?  That's because it's just being used as a template, but not tp contribute actual data values.

 

Usually the UPDATE statement is meant to apply a series of "transactions" to a master file, and therefore outputs only one record per BY level (one per sub below).  But you want to ouput one record per incoming record.  Hence the "if inright then output" statement.

View solution in original post


All Replies
Super User
Posts: 10,259

Re: Creating new data set with variables from most recent visit date

Sort by patient and date.

In a data step:

Retain new variables for each value. Everytime a valid value is encountered, set the respective retained variable.

At last.patient, output.

Drop the original variables.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
How to post code
Super User
Posts: 6,780

Re: Creating new data set with variables from most recent visit date

The time-saving idea would be to change your data once.  Instead of searching through previous observations each time you use the data, add the most recent value as a new variable and save it that way.  For example:

 

proc sort data=have;

by subject date;

run;

 

data want;

set have;

by subject;

if first.subject then most_recent_bmi = bmi;

retain most_recent_bmi;

if bmi > . then most_recent_bmi = bmi;

run;

 

Now you can refer to either variable:  BMI (actual measurement at that visit) or MOST_RECENT_BMI (most recent measurement, including the value on the current visit).

Trusted Advisor
Posts: 1,343

Re: Creating new data set with variables from most recent visit date

This program will do what you want for all numeric measures.

 

proc sort data=have;
  by sub date;
run;

data want (drop=_:);
  set have (keep=sub date);
  by sub;

  retain _sentinel1;
  set have;
  retain _sentinel2;

  array orig {*}   _sentinel1-numeric-_sentinel2;  /*all numerics except SUB and DATE*/
  array tmpn {100} _temporary_;  /* Assumes <=100 numeric vars */

  if first.sub then call missing(of tmpn{*});

  do I=2 to dim(orig)-1;
    if missing(orig{I}) then orig{I}=tmpn{I};
    else tmpn{I}=orig{i};
  end;

run;

 

If there are other vars that you do NOT want to do LOCF (last observation carried forward), then include them in the "Keep=" parameter in the first SET statement.  That way they will not fall between SENTINEL1 and SENTINEL2 in the program data vector and therefore not be in the ORIG array.

Solution
‎09-27-2017 12:33 PM
Trusted Advisor
Posts: 1,343

Re: Creating new data set with variables from most recent visit date

[ Edited ]

Here's a better way than my prior suggestion:

 

proc sort data=have;
by sub date;
run;

data want; update have (obs=0) have (in=inright); by sub; if inright then output; run;

 

Remember, unlike MERGE, the UPDATE statement will apply only non-missing values from the right-hand dataset to the PDV.  (Merge would take missing values from the right side as well).

 

Why the "obs=0" on the left hand dataset?  That's because it's just being used as a template, but not tp contribute actual data values.

 

Usually the UPDATE statement is meant to apply a series of "transactions" to a master file, and therefore outputs only one record per BY level (one per sub below).  But you want to ouput one record per incoming record.  Hence the "if inright then output" statement.

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 4 replies
  • 360 views
  • 2 likes
  • 4 in conversation