Fill list of variables with missing data with nonmissing data from row...

GS2 · Posted 03-22-2024 02:09 PM

Hello,

Using SAS 9.4, I have a data set of 7000 rows and 200 variables. I have an ID and each ID has multiple observations. The data set has the date the data last recorded. Across the 200 variables not all data was updated for each new date last recorded. By the ID, I would like to fill all variables in the last row (sorted to have the most recently updated date last) with missing data (the data could be character or numeric) with the non-missing data from the previous rows. In the code in the first 2 rows in each ID are an example of how the data could look and the 3rd row in the ID is how I would want the finished product to look. My question is if there is a reasonably efficient way to accomplish this while only filling values into cells with missing values? Thank you

data want;
	input id sex $ race $ age date_last_completed: mmddyy10.;
	format date_last_completed mmddyy10.;
	datalines;
	1 Male 	 White 	50 	.
	1 . 	 White 	. 	03/02/2024
	1 Male	 White	50	03/02/2024
	2 Female .		.	03/03/2024
	2 . 	 Black 	50 	.
	2 Female Black	50	03/03/2024
	;
run;

Tom · Posted 03-22-2024 02:17 PM

The easiest way is to use the UPDATE statement. That needs two datasets, current values and transactions. But you can use an empty version of your data as the current values and then treat all of the data as the transactions.

To avoid collapsing to a single observation per BY group just add an OUTPUT statement.

data have;
	input id sex $ race $ age date_last_completed :mmddyy10.;
	format date_last_completed mmddyy10.;
datalines;
1 Male 	 White 	50 	.
1 . 	 White 	. 	03/02/2024
1 Male	 White	50	03/02/2024
2 Female .		.	03/03/2024
2 . 	 Black 	50 	.
2 Female Black	50	03/03/2024
;

proc print;
run;

data want;
  update have(obs=0) have;
  by id;
  output;
run;

proc print;
run;

Without the explicit OUTPUT statement you would get :

ballardw · Posted 03-22-2024 03:18 PM

A caveat about a process like this where you have AGE and, I suspect in real data, different dates. Age is date dependent in common usage. So if you pull an Age from a previous record you might be off of the actual age for a different date. The longer the interval between the current record and the previous with age populated the more likely to have such occur.

It may be that you don't have much concern over age at this point but it should be considered.

data_null__ · Posted 03-22-2024 03:39 PM

@ballardw wrote:

A caveat about a process like this where you have AGE and, I suspect in real data, different dates. Age is date dependent in common usage. So if you pull an Age from a previous record you might be off of the actual age for a different date. The longer the interval between the current record and the previous with age populated the more likely to have such occur.

It may be that you don't have much concern over age at this point but it should be considered.

In the situation you describe you would exclude AGE from the UPDATE and use SET statement for variables that you don't want to carry forward.

update have(obs=0 keep=ID) have(drop=age);
by ID;
set have(keep=age);

use keep and drop combined with SAS-Variable-Lists as needed to make this more concise.

Fill list of variables with missing data with nonmissing data from rows above by ID

Re: Fill list of variables with missing data with nonmissing data from rows above by ID

Re: Fill list of variables with missing data with nonmissing data from rows above by ID

Re: Fill list of variables with missing data with nonmissing data from rows above by ID

Fill list of variables with missing data with nonmissing data from rows above by ID

Re: Fill list of variables with missing data with nonmissing data from rows above by ID

Re: Fill list of variables with missing data with nonmissing data from rows above by ID

Re: Fill list of variables with missing data with nonmissing data from rows above by ID

Registration is open

SAS Training: Just a Click Away