BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
A_Kh
Lapis Lazuli | Level 10

Hi all, 

 

I need your assistance in resolving following issue:

 

Dataset given:

SUBJID         OUTCOME

111                 negative

111

111

222                positive

222

 

Expected result:

SUBJID         OUTCOME

111                 negative

111                 negative

111                 negative

222                positive

222                positive

 

I've been trying it with RETAIN and FIRST and LAST VARIABLES combination, but somehow could not get the desired result. 

Please share your experience and provide some guidance. Your support is highly appreciated!

Thank you!

1 ACCEPTED SOLUTION

Accepted Solutions
r_behata
Barite | Level 11
data  have;
input SUBJID $ OUTCOME :$20.;
infile cards truncover;
cards;
111 negative
111
111
222 positive
222
;
run;


data want;
update have(obs=0) have;
by subjid;
output;
run;

View solution in original post

11 REPLIES 11
r_behata
Barite | Level 11
data  have;
input SUBJID $ OUTCOME :$20.;
infile cards truncover;
cards;
111 negative
111
111
222 positive
222
;
run;


data want;
update have(obs=0) have;
by subjid;
output;
run;
jimbarbour
Meteorite | Level 14

The problem with RETAIN if used on Outcome is that when SAS reads the next Dataline, the prior value will be wiped out -- even if the new value is Missing (blank).  Therefore an intermediate/work variable needs to be used that will hold only non-blank values.  See example, below.

 

@ballardw's question is important to consider though.  The example below will replace the previous Outcome if a different Outcome is read in that has the same SubjID.  This is the typical case.  However, if you do NOT want this behavior, you would have to modify the code below, probably with FIRST./LAST. type processing.  The code below assumes that the first occurrence of a given SubjID will have a non-blank Outcome.  

 

Jim

 

data  have;
	input SUBJID $ OUTCOME : $20.;
	infile cards truncover;
datalines;
111 negative
111
111
222 positive
222
;
run;


DATA	Want;
	DROP	_:;
	RETAIN	_Prior_Outcome;
	SET	Have;
	IF	MISSING(Outcome)	THEN
		Outcome			=	_Prior_Outcome;
	ELSE
		_Prior_Outcome	=	Outcome;
RUN;

 

A_Kh
Lapis Lazuli | Level 10

Hi @jimbarbour , 

 

Thank you for explanation, that's very helpful. Your code is just awesome! I haven't seen intermediate variables like yours before (not considering first/last.variables, i and arrays..). Would you please give me references where I could learn these variables. 

As I already accepted the previous answer as solution I did not have a second chance to accept your code as the solution, but it did solve my problem. Thank you, again, for your time and guidance!

jimbarbour
Meteorite | Level 14

Well, there's really not too much to learn:

  1. For a numeric variable, just put it in your code.  If SAS encounters a new variable and there is no definition, SAS will create the variable for you and use 8. (numeric, 8 bytes of storage) as the definition.
  2. For a character variable, it's best to use a LENGTH statement otherwise SAS will use the first reference which can easily cause truncation.  However, as I recall, if you're using a function to create the value at first reference, the default is $200. which isn't too bad.  Still, I would eliminate this as a source of potential truncation and just use a LENGTH statement.
  3. I usually prefix my intermediate variables with an underscore and then code a "DROP _:" which will drop all variables that start with an underscore, but be careful here.  Some SAS procedures (Transpose, Compare, etc.) create variables that start with an underscore.  You may or may not want to drop those.

Jim

A_Kh
Lapis Lazuli | Level 10
The way you used DROP statement before set statement is something new that i learnt today. 😃
I usually use RETAIN before SET statement, DROP - at the very end or as a dataset options.. My confusion about intermediate variables started from there and I thought _: as a temporary (automatic) variable...

#1 and #2 are familiar to me.
jimbarbour
Meteorite | Level 14

DROP is actually not executed during the run.  DROP is processed at compile time.  So, it really doesn't matter where you put the DROP statement in the program (as long as it is valid SAS syntax).  I usually put the DROP near the top because that's typically where I put the SET, LENGTH, and DATA statements that create or bring in variables.  That way I have all my variable related actions in close proximity for ease of reference.

 

Jim

A_Kh
Lapis Lazuli | Level 10

Hi @r_behata , 

 

Thank you so much, it works perfectly!

ballardw
Super User

Do you have any cases where you have multiple values of outcome for any given Subjid such as this?

 

Dataset given:

SUBJID OUTCOME

111 negative

111

111  positive

111

111

 

If so what would be the desired result?

 

 

A_Kh
Lapis Lazuli | Level 10

Hi @ballardw , 

 

Thank you for checking in with me. There is not any case where OUTCOME is different by SUBJID. And I see two codes above work perfectly for solving my problem. 

mkeintz
PROC Star

The solution provide by @r_behata is great for your stated problem.  But be aware of two issues:

  1. you need an explicit output statement.  Otherwise you would get only one observation per subjid:
  2. Missing values of all variables, not just OUTCOME will be replaced by preceding non-missing values.

If #2 is a problem you want use a MERGE statement, as in:

 

data wantm;
  merge have (where=(outcome^=' ')) have (drop=outcome);
  by subjid;
run;

This applies the implicit RETAIN just to the outcome variable.  Of course, it assumes you have only one non-missing instance of OUTCOME per SUBJID.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
A_Kh
Lapis Lazuli | Level 10
Hi @mkeintz,

I appreciate your feedback. Another good technique to solve my problem😃.

I'll keep in mind the behavior of explicit output statement while using Update statement, vs implicit output when using Merge statement.

Very helpful, thank you very much!

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 11 replies
  • 4493 views
  • 13 likes
  • 5 in conversation