BookmarkSubscribeRSS Feed
GS2
Obsidian | Level 7 GS2
Obsidian | Level 7

Hello,

 

Using SAS 9.4, I have a data set of 7000 rows and 200 variables. I have an ID and each ID has multiple observations. The data set has the date the data last recorded. Across the 200 variables not all data was updated for each new date last recorded. By the ID, I would like to fill all variables in the last row (sorted to have the most recently updated date last) with missing data (the data could be character or numeric) with the non-missing data from the previous rows. In the code in the first 2 rows in each ID are an example of how the data could look and the 3rd row in the ID is how I would want the finished product to look. My question is if there is a reasonably efficient way to accomplish this while only filling values into cells with missing values? Thank you 

data want;
	input id sex $ race $ age date_last_completed: mmddyy10.;
	format date_last_completed mmddyy10.;
	datalines;
	1 Male 	 White 	50 	.
	1 . 	 White 	. 	03/02/2024
	1 Male	 White	50	03/02/2024
	2 Female .		.	03/03/2024
	2 . 	 Black 	50 	.
	2 Female Black	50	03/03/2024
	;
run;
3 REPLIES 3
Tom
Super User Tom
Super User

The easiest way is to use the UPDATE statement.  That needs two datasets, current values and transactions.  But you can use an empty version of your data as the current values and then treat all of the data as the transactions.

 

To avoid collapsing to a single observation per BY group just add an OUTPUT statement.

data have;
	input id sex $ race $ age date_last_completed :mmddyy10.;
	format date_last_completed mmddyy10.;
datalines;
1 Male 	 White 	50 	.
1 . 	 White 	. 	03/02/2024
1 Male	 White	50	03/02/2024
2 Female .		.	03/03/2024
2 . 	 Black 	50 	.
2 Female Black	50	03/03/2024
;

proc print;
run;

data want;
  update have(obs=0) have;
  by id;
  output;
run;

proc print;
run;

Tom_0-1711131414952.png

Without the explicit OUTPUT statement you would get :

Tom_0-1711131550639.png

 

 

ballardw
Super User

A caveat about a process like this where you have AGE and, I suspect in real data, different dates. Age is date dependent in common usage. So if you pull an Age from a previous record you might be off of the actual age for a different date. The longer the interval between the current record and the previous with age populated the more likely to have such occur.

 

It may be that you don't have much concern over age at this point but it should be considered.

 

 

data_null__
Jade | Level 19

@ballardw wrote:

A caveat about a process like this where you have AGE and, I suspect in real data, different dates. Age is date dependent in common usage. So if you pull an Age from a previous record you might be off of the actual age for a different date. The longer the interval between the current record and the previous with age populated the more likely to have such occur.

 

It may be that you don't have much concern over age at this point but it should be considered.

 

 


In the situation you describe you would exclude AGE from the UPDATE and use SET statement for variables that you don't want to carry forward.

 

update have(obs=0 keep=ID) have(drop=age);
by ID;
set have(keep=age);

use keep and drop combined with SAS-Variable-Lists as needed to make this more concise.  

sas-innovate-wordmark-2025-midnight.png

Register Today!

Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.


Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 661 views
  • 2 likes
  • 4 in conversation