🔒 This topic is solved and locked.
Jonate_H
Quartz | Level 8

data have;
   input id var1 var2;
   cards;
   1 3 .
   1 5 .
   1 . 4
   1 . 7
   2 4 2
   2 8 .
   2 . 8
   2 9 .
   2 . .
   3 2 3
   3 7 .
   3 . 9
   3 . 6
   4 . 2
   5 . 1
   5 9 2
   6 . 5
   6 . 3
;
run;

  1. If a missing value is not at the beginning of its ID group, replace it with the most recent (lagged) non-missing value in that group;
  2. If missing values are at the beginning of an ID group and no non-missing value follows them, replace them with 0, e.g., var1 for ID=4 and ID=6;
  3. If missing values are at the beginning of an ID group and a non-missing value follows later in the group, replace them with that next (forward) non-missing value. E.g., for ID=5 the missing var1 should be replaced by 9, rather than by 7 (the last value for ID=3) or 0 (the value for ID=4).
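Because the desired-output image is not reproduced in this thread, here is a sketch of the result that the three rules imply for the sample data. The values were worked out by hand from rules 1-3; they are not copied from the original image, and the data set name EXPECTED is an illustrative assumption.

```sas
/* Result implied by rules 1-3 for the sample data (derived by hand, not copied from pic.png).
   Rule 1: ID=1 var1 rows 3-4 take 5, the previous non-missing value in the group.
   Rule 2: var1 for ID=4 and ID=6 has no non-missing value in its group, so it becomes 0.
   Rule 3: ID=1 var2 rows 1-2 and ID=5 var1 row 1 take the next non-missing value (4 and 9). */
data expected;
   input id var1 var2;
   cards;
1 3 4
1 5 4
1 5 4
1 5 7
2 4 2
2 8 2
2 8 8
2 9 8
2 9 8
3 2 3
3 7 3
3 7 9
3 7 6
4 0 2
5 9 1
5 9 2
6 0 5
6 0 3
;
run;
```

Once a WANT data set has been produced by one of the solutions in this thread, `proc compare base=expected compare=want; run;` is one way to check it row by row; tracing both posted approaches by hand suggests they yield these values.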

I was thinking of generating two new variables, "count", a running counter within each ID, and "total", the total number of observations for each ID, and using them as conditions for the different replacement rules. However, the look-ahead replacement (rule 3) is beyond me.

I also found some answers online, but none does exactly what I want; for example, the following one replaces every missing value with the previous non-missing value.

https://communities.sas.com/t5/SAS-Procedures/Replacing-missing-values-by-previous-observation/td-p/...

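data want;
set have;
n=_n_;
/* While var1 is missing, step back one observation at a time with a second SET ... POINT=
   until a non-missing var1 is found. This ignores ID boundaries, which is why it does not
   meet the rules above. The n > 1 guard prevents POINT=0 (and an endless loop) when the
   first rows of HAVE are missing. N is the POINT= variable, so it is not written to WANT. */
do while (missing(var1) and n > 1);
    n=n-1;
    set have(keep=var1) point=n; *second SET statement;
end;
run;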

Desired output: (attached as an image, pic.png; not reproduced here)

Thanks!

Accepted Solution
PGStats
Opal | Level 21

Double DO UNTIL() to the rescue:


data have;
input id var1 var2;
cards;
1 3 .
1 5 .
1 . 4
1 . 7
2 4 2
2 8 .
2 . 8
2 9 .
2 . .
3 2 3
3 7 .
3 . 9
3 . 6
4 . 2
5 . 1
5 9 2
6 . 5
6 . 3
;

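data want;
array firstExist{2};  /* first non-missing value of var1, var2 within the current ID */
array lastExist{2};   /* latest non-missing value of var1, var2 seen so far within the current ID */
array var{2};         /* var1, var2 from HAVE */
/* Pass 1: read one whole ID group and record the first non-missing value of each variable.
   firstExist and lastExist are not retained, so they are reset to missing for every ID. */
do until(last.id);
    set have; by id;
    do i = 1 to dim(var);
        if missing(firstExist{i}) then firstExist{i} = var{i};
        end;
    end;
/* Pass 2: reread the same ID group and fill each gap with the previous non-missing value
   (rule 1), else the first non-missing value of the group (rule 3), else 0 (rule 2) */
do until(last.id);
    set have; by id;
    do i = 1 to dim(var);
        if missing(var{i}) then var{i} = coalesce(lastExist{i}, firstExist{i}, 0);
        else lastExist{i} = var{i};
        end;
    output;
    end;
keep id var:;
run;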

proc print data=want noobs; var id var1 var2; run;
PG

Ksharp
Super User

data have;
   input id var1 var2;
   n+1;   /* row number, used later to reverse and then restore the original order */
   cards;
   1 3 .
   1 5 .
   1 . 4
   1 . 7
   2 4 2
   2 8 .
   2 . 8
   2 9 .
   2 . .
   3 2 3
   3 7 .
   3 . 9
   3 . 6
   4 . 2
   5 . 1
   5 9 2
   6 . 5
   6 . 3
;
run;
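/* Forward pass: carry the last non-missing value down within each ID */
data temp;
 set have;
 by id;
 retain v1 v2;
 if first.id then do;v1=.;v2=.;end;
 if not missing(var1) then v1=var1;
 if not missing(var2) then v2=var2;
 drop var1 var2;
run;

/* Reverse the row order so the next pass runs from the bottom of each ID upward */
proc sort data=temp;by descending n;run;
/* Backward pass: carry the next non-missing value up within each ID
   (groups are contiguous but now in descending ID order, hence NOTSORTED) */
data temp;
 set temp;
 by id notsorted;
 retain var1 var2;
 if first.id then do;var1=.;var2=.;end;
 if not missing(v1) then var1=v1;
 if not missing(v2) then var2=v2;
 drop v1 v2;
run;
/* Restore the original order */
proc sort data=temp;by n;run;
/* Anything still missing had no non-missing value in its ID: replace it with 0 */
proc stdize data=temp out=want(drop=n) reponly missing=0;
 var var1 var2;
run;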


PGStats
Opal | Level 21

Hi @Ksharp, I think the better way to read a dataset backwards is with the point= option, as in:


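/* View that reads SASHELP.CLASS from the last observation to the first */
data ssalc / view=ssalc;
do i = nobs to 1 by -1;
    set sashelp.class nobs=nobs point=i;
    output;
    end;
stop; /* required with POINT=, since end of file is never reached */
run;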

proc print data=ssalc; run;

It doesn't require the creation of an order variable (your n) and should be faster than a sort.

PG
Ksharp
Super User
PG, I don't think so, especially for a big table; there I wouldn't expect POINT= to be a good choice. POINT= has its advantage only when you point to a small number of observations. When you POINT to millions of observations, it would be very slow.
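For readers who want to try PGStats' suggestion, here is a sketch of Ksharp's forward/backward approach with the two PROC SORT steps replaced by backward-reading POINT= views, so no order variable is needed. The intermediate names FWD, FWD_REV, BWD and BWD_REV are hypothetical, PROC STDIZE requires SAS/STAT as in Ksharp's version, and nothing here settles which variant is faster; per the exchange above, that depends on table size.

```sas
/* Sample data from the question; no order variable (n) is created */
data have;
   input id var1 var2;
   cards;
1 3 .
1 5 .
1 . 4
1 . 7
2 4 2
2 8 .
2 . 8
2 9 .
2 . .
3 2 3
3 7 .
3 . 9
3 . 6
4 . 2
5 . 1
5 9 2
6 . 5
6 . 3
;
run;

/* Forward pass: carry the last non-missing value down within each ID */
data fwd;
   set have;
   by id;
   retain v1 v2;
   if first.id then do; v1 = .; v2 = .; end;
   if not missing(var1) then v1 = var1;
   if not missing(var2) then v2 = var2;
   drop var1 var2;
run;

/* View reading FWD from the last observation to the first (POINT= and NOBS= variables are not output) */
data fwd_rev / view=fwd_rev;
   do p = nobs to 1 by -1;
      set fwd nobs=nobs point=p;
      output;
   end;
   stop;
run;

/* Backward pass: carry the next non-missing value up within each ID.
   ID groups are contiguous but in descending order here, hence NOTSORTED. */
data bwd;
   set fwd_rev;
   by id notsorted;
   retain var1 var2;
   if first.id then do; var1 = .; var2 = .; end;
   if not missing(v1) then var1 = v1;
   if not missing(v2) then var2 = v2;
   drop v1 v2;
run;

/* A second backward-reading view restores the original row order */
data bwd_rev / view=bwd_rev;
   do p = nobs to 1 by -1;
      set bwd nobs=nobs point=p;
      output;
   end;
   stop;
run;

/* Whatever is still missing had no non-missing value in its ID: set it to 0 */
proc stdize data=bwd_rev out=want reponly missing=0;
   var var1 var2;
run;
```

FWD and BWD are still written as real data sets; the views only replace the two sorts, and each view rereads its source backwards every time it is used.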
Jonate_H
Quartz | Level 8

Thanks a lot, PG and Ksharp! Both work great! I would like to mark both as solutions, but I can only choose one, so I chose the first response.
