Solved: Re: Creating new data set with variables from most recent visit date

jarnaez · Posted 09-26-2017 02:43 PM

For a project we received a data set containing biomarker measurements from a number of physician visits for each subject. Not all variables of interest are present at each visit and we were wanting to create a new data set containing the values from the most recent visit that measure was taken at. I'm not sure how or if it is possible to set this up.

To give a bit more detail about what I would like to do, for each subject I would like to find a way to check if a measure has a value from the most recent visit date (i.e. 25AUG2015). if there is one keep that value(BMI=30) in the new data and stop searching for that variable in that subject. If not (BMI=.) go back to the next most recent date (16AUG105) and check there and so on.

I'm not real sure where to start on this issue so any help would be appreciated. Please let me know if I can clarify more and thank you all in advance.

mkeintz · Posted 09-26-2017 07:53 PM

Here's a better way than my prior suggestion:

proc sort data=have;
  by sub date;
run;

data want;
  update have (obs=0) have (in=inright);
  by sub;
  if inright then output;
run;

Remember, unlike MERGE, the UPDATE statement will apply only non-missing values from the right-hand dataset to the PDV. (Merge would take missing values from the right side as well).

Why the "obs=0" on the left hand dataset? That's because it's just being used as a template, but not tp contribute actual data values.

Usually the UPDATE statement is meant to apply a series of "transactions" to a master file, and therefore outputs only one record per BY level (one per sub below). But you want to ouput one record per incoming record. Hence the "if inright then output" statement.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

Kurt_Bremser · Posted 09-26-2017 02:51 PM

Sort by patient and date.

In a data step:

Retain new variables for each value. Everytime a valid value is encountered, set the respective retained variable.

At last.patient, output.

Drop the original variables.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Astounding · Posted 09-26-2017 03:21 PM

The time-saving idea would be to change your data once. Instead of searching through previous observations each time you use the data, add the most recent value as a new variable and save it that way. For example:

proc sort data=have;

by subject date;

run;

data want;

set have;

by subject;

if first.subject then most_recent_bmi = bmi;

retain most_recent_bmi;

if bmi > . then most_recent_bmi = bmi;

run;

Now you can refer to either variable: BMI (actual measurement at that visit) or MOST_RECENT_BMI (most recent measurement, including the value on the current visit).

mkeintz · Posted 09-26-2017 06:08 PM

This program will do what you want for all numeric measures.

proc sort data=have;
  by sub date;
run;

data want (drop=_:);
  set have (keep=sub date);
  by sub;

  retain _sentinel1;
  set have;
  retain _sentinel2;

  array orig {*}   _sentinel1-numeric-_sentinel2;  /*all numerics except SUB and DATE*/
  array tmpn {100} _temporary_;  /* Assumes <=100 numeric vars */

  if first.sub then call missing(of tmpn{*});

  do I=2 to dim(orig)-1;
    if missing(orig{I}) then orig{I}=tmpn{I};
    else tmpn{I}=orig{i};
  end;

run;

If there are other vars that you do NOT want to do LOCF (last observation carried forward), then include them in the "Keep=" parameter in the first SET statement. That way they will not fall between SENTINEL1 and SENTINEL2 in the program data vector and therefore not be in the ORIG array.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

mkeintz · Posted 09-26-2017 07:53 PM

Here's a better way than my prior suggestion:

proc sort data=have;
  by sub date;
run;

data want;
  update have (obs=0) have (in=inright);
  by sub;
  if inright then output;
run;

Remember, unlike MERGE, the UPDATE statement will apply only non-missing values from the right-hand dataset to the PDV. (Merge would take missing values from the right side as well).

Why the "obs=0" on the left hand dataset? That's because it's just being used as a template, but not tp contribute actual data values.

Usually the UPDATE statement is meant to apply a series of "transactions" to a master file, and therefore outputs only one record per BY level (one per sub below). But you want to ouput one record per incoming record. Hence the "if inright then output" statement.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Creating new data set with variables from most recent visit date

Re: Creating new data set with variables from most recent visit date

Re: Creating new data set with variables from most recent visit date

Re: Creating new data set with variables from most recent visit date

Re: Creating new data set with variables from most recent visit date

Re: Creating new data set with variables from most recent visit date

Registration is open

SAS Training: Just a Click Away