BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
jarnaez
Calcite | Level 5

For a project we received a data set containing biomarker measurements from a number of physician visits for each subject. Not all variables of interest are present at each visit and we were wanting to create a new data set containing the values from the most recent visit that measure was taken at. I'm not sure how or if it is possible to set this up.

 

To give a bit more detail about what I would like to do, for each subject I would like to find a way to check if a measure has a value from the most recent visit date (i.e. 25AUG2015). if there is one keep that value(BMI=30) in the new data and stop searching for that variable in that subject. If not (BMI=.) go back to the next most recent date (16AUG105) and check there and so on.

 

I'm not real sure where to start on this issue so any help would be appreciated. Please let me know if I can clarify more and thank you all in advance.

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

Here's a better way than my prior suggestion:

 

proc sort data=have;
by sub date;
run;

data want; update have (obs=0) have (in=inright); by sub; if inright then output; run;

 

Remember, unlike MERGE, the UPDATE statement will apply only non-missing values from the right-hand dataset to the PDV.  (Merge would take missing values from the right side as well).

 

Why the "obs=0" on the left hand dataset?  That's because it's just being used as a template, but not tp contribute actual data values.

 

Usually the UPDATE statement is meant to apply a series of "transactions" to a master file, and therefore outputs only one record per BY level (one per sub below).  But you want to ouput one record per incoming record.  Hence the "if inright then output" statement.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

4 REPLIES 4
Kurt_Bremser
Super User

Sort by patient and date.

In a data step:

Retain new variables for each value. Everytime a valid value is encountered, set the respective retained variable.

At last.patient, output.

Drop the original variables.

Astounding
PROC Star

The time-saving idea would be to change your data once.  Instead of searching through previous observations each time you use the data, add the most recent value as a new variable and save it that way.  For example:

 

proc sort data=have;

by subject date;

run;

 

data want;

set have;

by subject;

if first.subject then most_recent_bmi = bmi;

retain most_recent_bmi;

if bmi > . then most_recent_bmi = bmi;

run;

 

Now you can refer to either variable:  BMI (actual measurement at that visit) or MOST_RECENT_BMI (most recent measurement, including the value on the current visit).

mkeintz
PROC Star

This program will do what you want for all numeric measures.

 

proc sort data=have;
  by sub date;
run;

data want (drop=_:);
  set have (keep=sub date);
  by sub;

  retain _sentinel1;
  set have;
  retain _sentinel2;

  array orig {*}   _sentinel1-numeric-_sentinel2;  /*all numerics except SUB and DATE*/
  array tmpn {100} _temporary_;  /* Assumes <=100 numeric vars */

  if first.sub then call missing(of tmpn{*});

  do I=2 to dim(orig)-1;
    if missing(orig{I}) then orig{I}=tmpn{I};
    else tmpn{I}=orig{i};
  end;

run;

 

If there are other vars that you do NOT want to do LOCF (last observation carried forward), then include them in the "Keep=" parameter in the first SET statement.  That way they will not fall between SENTINEL1 and SENTINEL2 in the program data vector and therefore not be in the ORIG array.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
mkeintz
PROC Star

Here's a better way than my prior suggestion:

 

proc sort data=have;
by sub date;
run;

data want; update have (obs=0) have (in=inright); by sub; if inright then output; run;

 

Remember, unlike MERGE, the UPDATE statement will apply only non-missing values from the right-hand dataset to the PDV.  (Merge would take missing values from the right side as well).

 

Why the "obs=0" on the left hand dataset?  That's because it's just being used as a template, but not tp contribute actual data values.

 

Usually the UPDATE statement is meant to apply a series of "transactions" to a master file, and therefore outputs only one record per BY level (one per sub below).  But you want to ouput one record per incoming record.  Hence the "if inright then output" statement.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 2900 views
  • 2 likes
  • 4 in conversation