filling missing values until new id

dustychair · Posted 04-10-2022 09:55 AM

Hi all,

I have a question: I have a data set as seen below. I used retain statement to create a data set as below in the second data set. However, it did not work for my case. I will appreciate for any suggestion.

Thank you,

	have
number	item1	item2	item3	item4	id
1	1				xx1
2		0
3			1
4				1
5	0				xx2
6		0
7			1
8				0
9

	want
number	item1	item2	item3	item4	id
1	1	0	1	1	xx1
2	0	0	1	0	xx2

FreelanceReinh · Posted 04-10-2022 11:31 AM

Hi @dustychair,

@dustychair wrote:

I used retain statement to create a data set as below in the second data set. However, it did not work for my case.

The issue with the RETAIN statement is that you would need to apply it to new variables, not to those read by the SET statement. Variable _id in the code below is an example.

data have;
input item1-item4 id $;
cards;
1 . . . xx1
. 0 . . .
. . 1 . .
. . . 1 .
0 . . . xx2
. 0 . . .
. . 1 . .
. . . 0 .
. . . . .
;

data _tmp(drop=id rename=(_id=id)) / view=_tmp;
set have;
if id ne ' ' then _id=id;
retain _id;
run;

data want;
update _tmp(obs=0) _tmp;
by id;
run;

dustychair · Posted 04-10-2022 12:13 PM

Thank you for your answer. It says "by variables are not properly sorted on data set WORK._TMP"

Thanks

FreelanceReinh · Posted 04-10-2022 12:44 PM

Then apply the technique shown for variable ID to the other variables as well:

data have;
input item1-item4 id $;
cards;
1 . . . xx2
. 0 . . .
. . 1 . .
. . . 1 .
0 . . . xx1
. 0 . . .
. . 1 . .
. . . 0 .
. . . . .
;

data _tmp(drop=_:) / view=_tmp;
set have(rename=(item1-item4=_i1-_i4 id=_id));
array item[4];
array _i[4];
if _id ne ' ' then id=_id;
do _j=1 to dim(_i);
  if _i[_j] ne . then item[_j]=_i[_j];
end;
retain item id;
run;

data want;
set _tmp;
by id notsorted;
if last.id;
run;

Depending on the structure of your real data, you can create dataset WANT in a single data step, but I don't want to make assumptions about your real data.

Tom · Posted 04-10-2022 09:01 PM

Then make a variable for grouping that is sorted.

data have ;
  infile cards truncover ;
  input item1-item4 id $;
cards;
1 . . . xx1
. 0 . .
. . 1 .
. . . 1
0 . . . xx2
. 0 . .
. . 1 .
. . . 0
. . . .
;

data group ;
  set have;
  group+ not missing(id);
run;

data want;
  update group(obs=0) group;
  by group;
run;

proc print;
run;

Results

Obs    item1    item2    item3    item4    id     group

 1       1        0        1        1      xx1      1
 2       0        0        1        0      xx2      2

FreelanceReinh · Posted 04-11-2022 03:43 AM

I didn't suggest a similar solution because of the implied assumptions about the real data: It would fail if some of the IDs were repeated rather than missing (but could be modified to avoid this).

mkeintz · Posted 04-10-2022 11:51 AM

For each missing ID, you want to carry forward the previous non-missing id. And you want to do the same for the ITEM variables. I.e. you want the last non-missing value for each of these variables within an ID group.

But for the NUMBER variable, you want the first non-missing value for each ID group.

Editted note: Did I misread the OP posting? I thought the second row of the desired output had NUMBER=5, (not NUMBER=2 as I see now). If so, then no special treatment is needed for NUMBER, like the SET ... POINT= statement below.

If the ID column in dataset HAVE had no holes, and if HAVE were sorted by ID, then you could use a the UPDATE statement, with a SET .... POINT= statement:

data want (drop=_:);
  update have(obs=0) have (rename=(number=_number));
  by id;
  if first.id then number=_number;
run;

You don't have that situation, but you could create a dataset NEED with all the missing ID values populated from the most recent non-missing ID value. Then you could apply the UPDATE technique to NEED:

data vneed (drop=_:) / view=vneed;
  set have (drop=number rename=(id=_test_id));
  if not missing(_test_id) then set have (keep=id number) point=_n_;
run;

data want;
  update vneed (obs=0) vneed;
  by id;
run;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

dustychair · Posted 04-10-2022 12:19 PM

Thank you for your answer. I am getting the same error. it says "by variables are not properly sorted on data set "

Thanks

mkeintz · Posted 04-10-2022 08:03 PM

@dustychair wrote:
Thank you for your answer. I am getting the same error. it says "by variables are not properly sorted on data set "

Thanks

Then sort the dataset NEED:

data vneed (drop=_:) / view=vneed;
  set have (rename=(id=_test_id));
  if not missing(_test_id) then set have (keep=id) point=_n_;
run;
proc sort data=vneed out=need_sorted;
  by id;
run;
data want;
  update need_sorted (obs=0) need_sorted;
  by id;
run;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

filling missing values until new id

Re: filling missing values until new id

Re: filling missing values until new id

Re: filling missing values until new id

Re: filling missing values until new id

Re: filling missing values until new id

Re: filling missing values until new id

Re: filling missing values until new id

Re: filling missing values until new id

SAS Innovate 2025: Save the Date

SAS Training: Just a Click Away