DATA Step, Macro, Functions and more

Filling missing values.

Reply
N/A
Posts: 0

Filling missing values.

I have:

ID VALUE
12 .
12 .
12 4
17 .
17 1
37 .
37 .
37 .
37 2

I want:

ID VALUE
12 4
12 4
12 4
17 1
17 1
37 2
37 2
37 2
37 2

The value in the last record should be in all records for each ID. This must be easy.

Thanks for any help.

Susan
N/A
Posts: 0

Re: Filling missing values.

Posted in reply to deleted_user
Hi,

please try this..

proc sql;
create table ab(drop=value1) as
select id ,value as value1,max(value) as value
from ab
group by id ;
quit;

Rgds,
skm
Trusted Advisor
Posts: 2,116

Re: Filling missing values.

Posted in reply to deleted_user
Susan,

skm's elegant solution assumes that the ONLY non-missing data for VALUE is the last one for each ID (which is what your example showed, but not quite what the question stated). If there are multiple non-missing data for VALUE within ID, then the problem is considerably messier and likely requires two DATA steps and a SORT.

Doc Muhlbaier
Duke
Super Contributor
Super Contributor
Posts: 3,174

Re: Filling missing values.

Posted in reply to deleted_user
Assign the _N_ variable to your_variable OBSNUM in a DATA step to get the OBS NUMBER, then you can sort the data with a BY list and include DESCENDING OBSNUM and use BY with FIRST. and/or LAST. to propagate values onto other associated observations -- you will need to use a RETAIN variable or work within a DO loop with UNTIL(EOF); to avoid needing the RETAIN variable.

Scott Barry
SBBWorks, Inc.
Super Contributor
Posts: 474

Re: Filling missing values.

Posted in reply to deleted_user
Being IN the input dataset (and assuming it was previously sorted by ID), you could produce OUT the output dataset with the following code:

data OUT;
set IN;
by ID;
drop _C;
if first.ID then _C=0; /* reset counter */
_C+1; /* count */
if last.ID;
do _N_=1 to _C;
output; /* multiple output with last value of the group */
end;
run;

Cheers from Portugal.

Daniel Santos @ www.cgd.t
N/A
Posts: 0

Re: Filling missing values.

Posted in reply to DanielSantos
Thank you Daniel,

Your code seems to work, but I would like to have a "then" placed like this:

if last.ID then
do _N_=1 to _C;
output; /* multiple output with last value of the group */
end;

A single text row "if last.ID;" means to me that only the last record for each ID is kept.
Is it because "if last.ID;" is directly followed by a "do", that your code works?

Susan
N/A
Posts: 0

Re: Filling missing values.

Posted in reply to deleted_user
Daniel,

My table have more variables than ID and VALUE. I want to keep the different values for the those variables in each post, and your solution doesn't manage that.

Thanks anyway.

I hope you answer my previous question about the missing(?) "then".

Susan
Super Contributor
Posts: 474

Re: Filling missing values.

Posted in reply to deleted_user
Hello Susan.

the use of, if last.ID;

is know as a subsetting if.

only rows satisfying the if condition will pass beside the if (an reach the run; and so implicit output).

since after that, there is a do loop, with explicit output, this loop will be executed only at the last ID row.

Be careful that the syntax you have posted will not work, you should change it to:

if last.ID then do;
do _N_=1 to _C;
output;
end;

and this will do exactly the same as my code, because the do loop will only be executed at the last ID row, also.

Cheers from Portugal.

Daniel Santos @ www.cgd.pt
Super Contributor
Super Contributor
Posts: 3,174

Re: Filling missing values.

Posted in reply to DanielSantos
Suggest the _N_ variable name not be used since it is used by SAS for counting the observation number (and is considered a reserved name variable).

Also, the SAS coding construct is totally valid and permissable for the purpose listed (as is the SAS coding technique from Daniel Santos), the objective would be to output iterative observations on the last occurrence of the BY variable identified in IF LAST.ID while maintaining the last VALUE , such as:

if last.a then do i=1 to _c;
drop i;
output;
end;


Scott Barry
SBBWorks, Inc.
Super Contributor
Super Contributor
Posts: 3,174

Re: Filling missing values.

Sorry - incorrect sample code posted:

if last.ID then do i=1 to _c;
drop i _c;
/* generate same obs count as input but with last VALUE occurrence */
output;
end;



Scott Barry
SBBWorks, Inc.
Super Contributor
Posts: 474

Re: Filling missing values.

I'm sorry Susan.

The syntax you posted is indeed correct, and will work as expected.

Scott, the use of the variable _N_, is a common optimization technique which I believe firstly appeared by the hand of the great Paul Dorfman.

_N_ is not used by SAS for counting rows, the row number is just passed to this automatic variable at the begin of each datastep loop. It's just meant to be used inside each loop, of course, if needed, which is not the case. So, if your not interested about its value, there's nothing wrong about changing it's value, and you get a perfectly good temporary variable, which no extra memory and that does not need to be dropped at the end of the datastep. Same goes for other automatic variable, like _ERROR_ and _IORC_.

But this would not be the subject here, so returning to Susan's question, Scott's example is right, and will do what is expected.

Cheers from Portugal.

Daniel Santos @ www.cgd.pt
N/A
Posts: 0

Re: Filling missing values.

Posted in reply to DanielSantos
I did like this:

data temporary(keep=id value);
set old;
by id;
if last.id;
run;

data new;
merge old temporary;
by id;
run;

As I said above there are other variables in my table that I want to keep in every record.

Susan Message was edited by: SusanZusan
Super Contributor
Super Contributor
Posts: 3,174

Re: Filling missing values.

Posted in reply to deleted_user
You will want to share a "complete" INPUT and OUTPUT (desired) data view for a an accurate reply, presuming you still have unsatisfactory results generated by the MERGE process you have shared? Also, you may want to investigate adding some self-diagnosis SAS commands to your second DATA step in order to clearly understand the results / benefits of your MERGE process -

PUTLOG _ALL_;

If you still have unusable output, re-post a reply with some SAS log results using the diagnosis command and all SAS-generated code for review/comment.


Scott Barry
SBBWorks, Inc.
New Contributor
Posts: 2

Re: Filling missing values.

Or like this:

proc sql;
create table test2 as
select a.Id,coalesce(b.value,0) as value, a.var2, a.var3 etc....
from test a left join test b
on a.id=b.id
and b.value ne .;
quit;

coalesce < if all values is missing then 0 as default !
Ask a Question
Discussion stats
  • 13 replies
  • 252 views
  • 0 likes
  • 5 in conversation