Dear All: My data is as follows: ID VAR_A 1 MF 1 MF 1 MF 2 HUF 2 2 HUF 2 3 3 3 4 4 I 4 I 4 I 4 NRI 4 I want my data to look as follow -- where the rule is that if the last observation is missing the missing value should
be the previous non-missing observation. ID VAR_A 1 MF 1 MF 1 MF 2 HUF 2 2 HUF 2 HUF 3 3 3 4 4 I 4 I 4 I 4 NRI 4 NRI
I think if you retain the most recent non-missing Var_A for each ID, you can accomplish what you're asking for fairly easily. See sample code, below. However, note that if all of the Var_A's are missing (as in ID 3), then the last occurrence of the ID will still be missing; there's not much we can do there.
Jim
DATA Unaltered_Data;
INFILE Datalines MISSOVER;
INPUT ID $
VAR_A $
;
DATALINES;
1 MF
1 MF
1 MF
2 HUF
2
2 HUF
2
3
3
3
4
4 I
4 I
4 I
4 NRI
4
;
RUN;
DATA Non_Missing_Last;
DROP _:;
RETAIN _Prior_Not_Missing ' ';
SET Unaltered_Data;
BY ID NOTSORTED;
IF FIRST.ID THEN
CALL MISSING(_Prior_Not_Missing);
IF LAST.ID AND
MISSING(VAR_A) THEN
VAR_A = _Prior_Not_Missing;
IF NOT MISSING(VAR_A) THEN
_Prior_Not_Missing = VAR_A;
RUN;
Merge fun:
DATA have;
INFILE Datalines MISSOVER;
INPUT ID $ VAR_A $;
DATALINES;
1 MF
1 MF
1 MF
2 HUF
2
2 HUF
2
3
3
3
4
4 I
4 I
4 I
4 NRI
4
;
RUN;
data want;
merge have have(rename=(var_a=_var_a) where=(_var_a>' '));
by id;
if last.id and var_a=' ' then var_a=_var_a;
drop _:;
run;
You would use the LAG(VAR_A) under the conditions of missing VAR_A and last.id in a group of size > 1.
The logic evaluating such a condition can be encapsulated in a single functional expression.
Example:
data want; set have; by id; var_a = coalesceC (var_a, ifc (last.id and not first.id, lag(var_a), '')); run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.