Hi,
I have this querry...
i have a dataset like this and i want to flag it where i have consecutive values.
input subjid lbtestcd$ lbdy flag1$;
datalines;
101 alt 1 e
101 alt 2 .
101 alt 3 e
101 alt 4 e
101 ast 1 e
101 ast 3 e
101 ast 4 .
101 ast 5 e
102 alt 1 e
102 alt 2 .
102 alt 3 e
102 alt 4 e
102 ast 1 e
102 ast 3 e
102 ast 4 .
102 ast 5 e
102 ast 6 e
;
run;
i want an output like
i want to flag consecutive 'e' values for consecutive visits by subject and test.
could you please help me out
thanks.
hmm, something like this may work:
proc sort data = bob;by subjid lbtestcd lbdy;run;
data bob_1;
set bob;
by subjid lbtestcd lbdy;
prev_e = lag(flag1);
prev_visit = lag(lbdy);
if first.subjid then do;
prev_e = .;prev_visit = .;
end;
yes_prior = (prev_e = flag1);*flag that there is a prior 'e';
consecutive = ( lbdy - prev_visit = 1);*flag that there are consecutive visits;
what_you_want = (yes_prior = 1 & consecutive = 1);*the flag you are looking for, I think;
run;
Best of luck!
Anca.
Perhaps a simpler way to go:
proc sort data=bob;
by subjid lbtestcd lbdy;
run;
data want;
set bob;
by subjid lbtestcd flag1 notsorted;
if lbtestcd='e' and (first.flag1=0 or last.flag1=0) then new_flag='*';
run;
Depending on what you need, you could remove one condition:
if (first.flag1=0 or last.flag1=0) then new_flag='*';
Good luck.
Hello Rams,
Is the outcome below ("RESULTS") what you desire? I'm not sure I have understood the question.
data bob;
input subjid lbtestcd$ lbdy flag1$;
datalines;
101 alt 1 e
101 alt 2 .
101 alt 3 e
101 alt 4 e
101 ast 1 e
101 ast 3 e
101 ast 4 .
101 ast 5 e
102 alt 1 e
102 alt 2 .
102 alt 3 e
102 alt 4 e
102 ast 1 e
102 ast 3 e
102 ast 4 .
102 ast 5 e
102 ast 6 e
;
run;
data want(keep=subjid lbtestcd lbdy flag1 flag2);
if(1=2) then set bob;
declare hash f(ordered:"a",multidata:"y");
rc=f.defineKey("subjid", "lbtestcd", "flag2");
rc=f.defineData("subjid", "lbtestcd", "lbdy","flag1","flag2");
rc=f.defineDone();
length flag2 $1;
flag2="";
do until(done);
set bob end=done;
rcf=f.check();
if (missing(flag1)+rcf)=0 then do;
flag2="Y";
f.add();
end;
else if rcf ne 0 then f.add();
else if missing(flag1) then f.remove();
output;
call missing(of _all_);
end;
stop;
run;
RESULTS:
| subjid | lbtestcd | lbdy | flag1 | flag2 |
1 | 101 | alt | 1 | e |
|
2 | 101 | alt | 2 |
|
|
3 | 101 | alt | 3 | e |
|
4 | 101 | alt | 4 | e | Y |
5 | 101 | ast | 1 | e |
|
6 | 101 | ast | 3 | e | Y |
7 | 101 | ast | 4 |
|
|
8 | 101 | ast | 5 | e |
|
9 | 102 | alt | 1 | e |
|
10 | 102 | alt | 2 |
|
|
11 | 102 | alt | 3 | e |
|
Hello All,
Building on this solution, I wonder how to flag the subject ID where to have four consecutive values='e' other than at least two.
Thanks.
Ethan
data want;
set have;
prev_subjid=lag(subjid)
prev2_subjid=lag2(subjid);
prev3_subjid=lag3(subjid);
prev_flag1=lag(flag1);
prev2_flag1=lag2(flag1);
prev3_flag1=lag3(flag1);
if subjid=prev_subjid and subjid=prev2_subjid and subjid=prev3_subjid
and flag1='e' and prev_flag1='e' and prev2_flag1='e' and prev3_flag1='e'
then flag2='Y';
run;
Kind regards
Paul D.
My desired output data set with new variable named "newflag" shown below.
Thanks.
subjid lbtestcd$ lbdy flag1 newflag
101 alt 1 e 0
101 alt 2 . 0
101 alt 3 e 4
101 alt 4 e 4
101 ast 1 e 4
101 ast 3 e 4
101 ast 4 . 0
101 ast 5 e 2
102 alt 1 e 2
102 alt 2 .
102 alt 3 e 4
102 alt 4 e 4
102 ast 1 e 4
102 ast 3 e 4
102 ast 4 .
102 ast 5 e 2
102 ast 6 e 2
@t75wez1 :
This suggested output totally defies your verbal description in the original post. To wit, it's not "by subject and test" but merely by FLAG1.
Anyway, this should work:
data have ;
input subjid lbtestcd $ lbdy flag1 $ ;
cards ;
101 alt 1 e
101 alt 2 .
101 alt 3 e
101 alt 4 e
101 ast 1 e
101 ast 3 e
101 ast 4 .
101 ast 5 e
102 alt 1 e
102 alt 2 .
102 alt 3 e
102 alt 4 e
102 ast 1 e
102 ast 3 e
102 ast 4 .
102 ast 5 e
102 ast 6 e
run ;
data want ;
do _n_ = 1 by 1 until (last.flag1) ;
set have ;
by flag1 notsorted ;
newflag = sum (newflag, 1) ;
end ;
if cmiss (flag1) or _n_ < 2 then newflag = 0 ;
do _n_ = 1 to _n_ ;
set have ;
output ;
end ;
run ;
Apropos, I see no I/O logic vis-a-vis setting the NEWFLAG to 0 or blank (do you mean a missing value by that? kind of odd since the variable is numeric). Hence, whenever it's "non-consecutive", the code above sets it to 0.
Kind regards
Paul D.
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.