Dear All: My data is as follows: ID VAR_A 1 MF 1 MF 1 MF 2 HUF 2 2 HUF 2 3 3 3 4 4 I 4 I 4 I 4 NRI 4 I want my data to look as follow -- where the rule is that if the last observation is missing the missing value should
be the previous non-missing observation. ID VAR_A 1 MF 1 MF 1 MF 2 HUF 2 2 HUF 2 HUF 3 3 3 4 4 I 4 I 4 I 4 NRI 4 NRI
I think if you retain the most recent non-missing Var_A for each ID, you can accomplish what you're asking for fairly easily. See sample code, below. However, note that if all of the Var_A's are missing (as in ID 3), then the last occurrence of the ID will still be missing; there's not much we can do there.
Jim
DATA Unaltered_Data;
INFILE Datalines MISSOVER;
INPUT ID $
VAR_A $
;
DATALINES;
1 MF
1 MF
1 MF
2 HUF
2
2 HUF
2
3
3
3
4
4 I
4 I
4 I
4 NRI
4
;
RUN;
DATA Non_Missing_Last;
DROP _:;
RETAIN _Prior_Not_Missing ' ';
SET Unaltered_Data;
BY ID NOTSORTED;
IF FIRST.ID THEN
CALL MISSING(_Prior_Not_Missing);
IF LAST.ID AND
MISSING(VAR_A) THEN
VAR_A = _Prior_Not_Missing;
IF NOT MISSING(VAR_A) THEN
_Prior_Not_Missing = VAR_A;
RUN;
Merge fun:
DATA have;
INFILE Datalines MISSOVER;
INPUT ID $ VAR_A $;
DATALINES;
1 MF
1 MF
1 MF
2 HUF
2
2 HUF
2
3
3
3
4
4 I
4 I
4 I
4 NRI
4
;
RUN;
data want;
merge have have(rename=(var_a=_var_a) where=(_var_a>' '));
by id;
if last.id and var_a=' ' then var_a=_var_a;
drop _:;
run;
You would use the LAG(VAR_A) under the conditions of missing VAR_A and last.id in a group of size > 1.
The logic evaluating such a condition can be encapsulated in a single functional expression.
Example:
data want; set have; by id; var_a = coalesceC (var_a, ifc (last.id and not first.id, lag(var_a), '')); run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.