Hi all,
I need your assistance in resolving following issue:
Dataset given:
SUBJID OUTCOME
111 negative
111
111
222 positive
222
Expected result:
SUBJID OUTCOME
111 negative
111 negative
111 negative
222 positive
222 positive
I've been trying it with RETAIN and FIRST and LAST VARIABLES combination, but somehow could not get the desired result.
Please share your experience and provide some guidance. Your support is highly appreciated!
Thank you!
data have;
input SUBJID $ OUTCOME :$20.;
infile cards truncover;
cards;
111 negative
111
111
222 positive
222
;
run;
data want;
update have(obs=0) have;
by subjid;
output;
run;
data have;
input SUBJID $ OUTCOME :$20.;
infile cards truncover;
cards;
111 negative
111
111
222 positive
222
;
run;
data want;
update have(obs=0) have;
by subjid;
output;
run;
The problem with RETAIN if used on Outcome is that when SAS reads the next Dataline, the prior value will be wiped out -- even if the new value is Missing (blank). Therefore an intermediate/work variable needs to be used that will hold only non-blank values. See example, below.
@ballardw's question is important to consider though. The example below will replace the previous Outcome if a different Outcome is read in that has the same SubjID. This is the typical case. However, if you do NOT want this behavior, you would have to modify the code below, probably with FIRST./LAST. type processing. The code below assumes that the first occurrence of a given SubjID will have a non-blank Outcome.
Jim
data have;
input SUBJID $ OUTCOME : $20.;
infile cards truncover;
datalines;
111 negative
111
111
222 positive
222
;
run;
DATA Want;
DROP _:;
RETAIN _Prior_Outcome;
SET Have;
IF MISSING(Outcome) THEN
Outcome = _Prior_Outcome;
ELSE
_Prior_Outcome = Outcome;
RUN;
Hi @jimbarbour ,
Thank you for explanation, that's very helpful. Your code is just awesome! I haven't seen intermediate variables like yours before (not considering first/last.variables, i and arrays..). Would you please give me references where I could learn these variables.
As I already accepted the previous answer as solution I did not have a second chance to accept your code as the solution, but it did solve my problem. Thank you, again, for your time and guidance!
Well, there's really not too much to learn:
Jim
DROP is actually not executed during the run. DROP is processed at compile time. So, it really doesn't matter where you put the DROP statement in the program (as long as it is valid SAS syntax). I usually put the DROP near the top because that's typically where I put the SET, LENGTH, and DATA statements that create or bring in variables. That way I have all my variable related actions in close proximity for ease of reference.
Jim
Do you have any cases where you have multiple values of outcome for any given Subjid such as this?
Dataset given:
SUBJID OUTCOME
111 negative
111
111 positive
111
111
If so what would be the desired result?
Hi @ballardw ,
Thank you for checking in with me. There is not any case where OUTCOME is different by SUBJID. And I see two codes above work perfectly for solving my problem.
The solution provide by @r_behata is great for your stated problem. But be aware of two issues:
If #2 is a problem you want use a MERGE statement, as in:
data wantm;
merge have (where=(outcome^=' ')) have (drop=outcome);
by subjid;
run;
This applies the implicit RETAIN just to the outcome variable. Of course, it assumes you have only one non-missing instance of OUTCOME per SUBJID.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.