Solved: Re: LOCF and retain statement

A_Kh · Posted 07-30-2021 08:53 AM

Hi all,

I need your assistance in resolving the following issue:

Dataset given:

SUBJID OUTCOME

111 negative

111

222 positive

222

Expected result:

SUBJID OUTCOME

111 negative

222 positive

I've been trying a combination of RETAIN and the FIRST./LAST. variables, but could not get the desired result.

Please share your experience and provide some guidance. Your support is highly appreciated!

Thank you!

r_behata · Posted 07-30-2021 10:19 AM

data have;
/* INFILE goes before INPUT so TRUNCOVER is in effect when the short lines are read */
infile cards truncover;
input SUBJID $ OUTCOME :$20.;
cards;
111 negative
111
111
222 positive
222
;
run;


data want;
update have(obs=0) have;
by subjid;
output;
run;

jimbarbour · Posted 07-30-2021 03:58 PM

The problem with using RETAIN on Outcome itself is that when the SET statement reads the next observation, the retained value is overwritten -- even if the new value is missing (blank). Therefore an intermediate/work variable is needed that holds only non-blank values. See the example below.

@ballardw's question is important to consider though. The example below will replace the previous Outcome if a different Outcome is read in that has the same SubjID. This is the typical case. However, if you do NOT want this behavior, you would have to modify the code below, probably with FIRST./LAST. type processing. The code below assumes that the first occurrence of a given SubjID will have a non-blank Outcome.

Jim

data have;
	/* INFILE goes before INPUT so TRUNCOVER is in effect when the short lines are read */
	infile datalines truncover;
	input SUBJID $ OUTCOME : $20.;
datalines;
111 negative
111
111
222 positive
222
;
run;


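DATA	Want;
	DROP	_:;
	/* Declare the work variable as character, same length as Outcome,
	   so it is not created as numeric at its first reference */
	LENGTH	_Prior_Outcome	$20;
	RETAIN	_Prior_Outcome;
	SET	Have;
	IF	MISSING(Outcome)	THEN
		Outcome			=	_Prior_Outcome;	/* fill from the last non-blank value */
	ELSE
		_Prior_Outcome	=	Outcome;	/* remember the newest non-blank value */
RUN;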
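
As noted above, that code carries a value forward regardless of SubjID, so it relies on every subject starting with a non-blank Outcome. A minimal sketch of the FIRST. variant mentioned in the post, assuming HAVE is sorted by SUBJID and that OUTCOME is $20 as in the sample data (the names want_by and _prior_outcome are only illustrative):

```sas
/* Sketch: carry the last non-missing OUTCOME forward within each SUBJID only. */
/* Assumes HAVE is sorted by SUBJID.                                           */
data want_by;
  set have;
  by subjid;
  length _prior_outcome $20;   /* same length as OUTCOME in HAVE */
  retain _prior_outcome;
  drop _prior_outcome;
  /* Reset at the start of each subject so nothing leaks across subjects */
  if first.subjid then _prior_outcome = ' ';
  if missing(outcome) then outcome = _prior_outcome;
  else _prior_outcome = outcome;
run;
```

With this version, a subject whose first row is blank keeps a blank OUTCOME until its first non-blank value appears, rather than inheriting the previous subject's value.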

A_Kh · Posted 08-02-2021 05:17 PM

Hi @jimbarbour ,

Thank you for the explanation, that's very helpful. Your code is just awesome! I haven't seen intermediate variables like yours before (apart from the FIRST./LAST. variables, loop counters like i, and arrays). Could you please point me to references where I could learn about these variables?

As I already accepted the previous answer as solution I did not have a second chance to accept your code as the solution, but it did solve my problem. Thank you, again, for your time and guidance!

jimbarbour · Posted 08-02-2021 06:07 PM

Well, there's really not too much to learn:

1. For a numeric variable, just put it in your code. If SAS encounters a new variable and there is no definition, SAS will create the variable for you and use 8. (numeric, 8 bytes of storage) as the definition.
2. For a character variable, it's best to use a LENGTH statement; otherwise SAS will take the length from the first reference, which can easily cause truncation. However, as I recall, if you're using a function to create the value at first reference, the default is $200., which isn't too bad. Still, I would eliminate this as a source of potential truncation and just use a LENGTH statement.
3. I usually prefix my intermediate variables with an underscore and then code a "DROP _:" which will drop all variables that start with an underscore, but be careful here. Some SAS procedures (Transpose, Compare, etc.) create variables that start with an underscore. You may or may not want to drop those.
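
A small sketch of the first two points, for illustration only (the variable names _short, _long, and _count are made up for this example):

```sas
data _null_;
  length _long $20;       /* explicit LENGTH: room for 20 characters */
  _short = 'neg';         /* no LENGTH: the first assignment fixes the length at 3 */
  _short = 'negative';    /* silently truncated to 'neg' */
  _long  = 'negative';    /* fits, because of the LENGTH statement */
  _count = 1;             /* never declared: created as numeric, 8 bytes */
  put _short= _long= _count=;
run;
```

The PUT statement writes the three values to the log, which makes the truncation of _short easy to see next to _long.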

Jim

A_Kh · Posted 08-02-2021 10:41 PM

The way you used the DROP statement before the SET statement is something new that I learned today. 😃
I usually put RETAIN before the SET statement, and DROP at the very end or as a dataset option. That is where my confusion about intermediate variables started: I thought _: was a temporary (automatic) variable...

#1 and #2 are familiar to me.

jimbarbour · Posted 08-02-2021 11:24 PM

DROP is actually not executed during the run. DROP is processed at compile time. So, it really doesn't matter where you put the DROP statement in the program (as long as it is valid SAS syntax). I usually put the DROP near the top because that's typically where I put the SET, LENGTH, and DATA statements that create or bring in variables. That way I have all my variable related actions in close proximity for ease of reference.
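
To illustrate, here is a minimal sketch assuming the HAVE dataset from earlier in the thread (the names top, bottom, _tmp, and outcome_len are hypothetical). Both steps should produce the same variables, because DROP only affects what is written to the output dataset:

```sas
data top;
  drop _tmp;                  /* DROP placed before _tmp is even created */
  set have;
  _tmp = lengthn(outcome);    /* _tmp is still usable during the step */
  outcome_len = _tmp;
run;

data bottom;
  set have;
  _tmp = lengthn(outcome);
  outcome_len = _tmp;
  drop _tmp;                  /* DROP placed as the last statement */
run;
```

In both cases _tmp is available to the statements in the step and is simply left out of the output dataset.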

Jim

A_Kh · Posted 08-02-2021 04:58 PM

Hi @r_behata ,

Thank you so much, it works perfectly!

ballardw · Posted 07-30-2021 12:52 PM

Do you have any cases where you have multiple values of outcome for any given Subjid such as this?

Dataset given:

SUBJID OUTCOME

111 negative

111

111 positive

111

If so what would be the desired result?

A_Kh · Posted 08-02-2021 05:08 PM

Hi @ballardw ,

Thank you for checking in with me. There are no cases where OUTCOME differs within a SUBJID, and I see that both code examples above work perfectly for my problem.

mkeintz · Posted 08-02-2021 09:04 PM

The solution provided by @r_behata is great for your stated problem. But be aware of two issues:

1. You need an explicit OUTPUT statement. Otherwise you would get only one observation per SUBJID.
2. Missing values of all variables, not just OUTCOME, will be replaced by preceding non-missing values.

If #2 is a problem, you may want to use a MERGE statement, as in:

data wantm;
  merge have (where=(outcome^=' ')) have (drop=outcome);
  by subjid;
run;

This applies the implicit RETAIN just to the outcome variable. Of course, it assumes you have only one non-missing instance of OUTCOME per SUBJID.
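
A rough sketch of both issues, using a made-up dataset HAVE2 with an extra numeric variable SCORE (hypothetical, not part of the original question) that is also missing on the follow-up rows:

```sas
/* Hypothetical test data: SCORE is an invented second variable */
data have2;
  infile datalines truncover;
  input subjid $ outcome :$20. score;
  datalines;
111 negative 5
111
222 positive 7
222
;
run;

/* UPDATE without OUTPUT: one observation per SUBJID */
data upd_one;
  update have2(obs=0) have2;
  by subjid;
run;

/* UPDATE with OUTPUT: every row is kept, and both OUTCOME and SCORE
   are filled from the preceding non-missing values */
data upd_all;
  update have2(obs=0) have2;
  by subjid;
  output;
run;

/* MERGE: only OUTCOME is carried forward; SCORE stays missing
   on the rows where it was missing */
data mrg;
  merge have2(where=(outcome^=' ')) have2(drop=outcome);
  by subjid;
run;
```

Comparing UPD_ALL with MRG (for example with PROC PRINT) shows whether filling every variable is acceptable for your data or whether only OUTCOME should be carried forward.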

A_Kh · Posted 08-02-2021 10:24 PM

Hi @mkeintz,

I appreciate your feedback. Another good technique to solve my problem😃.

I'll keep in mind the behavior of explicit output statement while using Update statement, vs implicit output when using Merge statement.

Very helpful, thank you very much!
