Hi,
I would like flag first record if there are 5 or more consecutive records by group after sorting. I have provided the data below and the code which I am trying but it is not working. Can you please correct me where I am doing wrong?
We need to sort by subjid parcat1 visit paramn. if consecutive values should be there for consecutive paramn values. If there is any consecutive paramn is missing in between 5 consecutive aval then it shouldn't be flagged. Thanks in advance!
data have;
input parcat1 $1-6 SUBJID $7-15 aval 16-17 paramn 18-21 visit $22-26;
datalines;
DAILY 10005501 0 301 V1
DAILY 10005501 1 302 V1
DAILY 10005501 0 303 V1
DAILY 10005501 1 304 V1
DAILY 10005501 0 305 V1
DAILY 10005501 2 306 V1
DAILY 10005501 2 307 V1
DAILY 10005501 2 308 V1
DAILY 10005501 0 309 V1
DAILY 10005501 2 310 V1
DAILY 10005501 0 311 V1
DAILY 10005501 0 312 V1
DAILY 10005501 1 313 V1
DAILY 10005501 0 314 V1
DAILY 10005501 1 315 V1
DAILY 10005501 2 316 V1
DAILY 10005501 1 317 V1
DAILY 10005501 2 318 V1
DAILY 10005501 0 319 V1
DAILY 10005501 0 320 V1
DAILY 10005501 0 321 V1
DAILY 10005501 0 322 V1
DAILY 10005501 2 323 V1
DAILY 10005501 0 324 V1
DAILY 10005501 0 346 V1
DAILY 10005501 0 347 V1
DAILY 10005501 0 348 V1
DAILY 10005501 0 349 V1
DAILY 10005501 0 350 V1
SOCIAL 10007501 1 301 V11
SOCIAL 10007501 2 303 V11
SOCIAL 10007501 2 304 V11
SOCIAL 10007501 0 305 V11
SOCIAL 10007501 0 306 V11
SOCIAL 10007501 2 307 V11
SOCIAL 10007501 0 308 V11
SOCIAL 10007501 0 309 V11
SOCIAL 10007501 2 310 V11
SOCIAL 10007501 0 311 V11
SOCIAL 10007501 0 312 V11
SOCIAL 10007501 2 313 V11
SOCIAL 10007501 0 314 V11
SOCIAL 10007501 0 315 V11
SOCIAL 10007501 0 316 V11
SOCIAL 10007501 0 317 V11
SOCIAL 10007501 0 319 V11
SOCIAL 10007501 0 320 V11
SOCIAL 10007501 0 321 V11
SOCIAL 10007501 0 322 V11
SOCIAL 10007501 0 323 V11
SOCIAL 10007501 0 325 V11
SOCIAL 10007501 0 326 V11
;
run;
data want;
input parcat1 $1-6 SUBJID $7-15 aval 16-17 paramn 18-21 visit $22-26 flag $28;
datalines;
DAILY 10005501 0 301 V1
DAILY 10005501 1 302 V1
DAILY 10005501 0 303 V1
DAILY 10005501 1 304 V1
DAILY 10005501 0 305 V1
DAILY 10005501 2 306 V1
DAILY 10005501 2 307 V1
DAILY 10005501 2 308 V1
DAILY 10005501 0 309 V1
DAILY 10005501 2 310 V1
DAILY 10005501 0 311 V1
DAILY 10005501 0 312 V1
DAILY 10005501 1 313 V1
DAILY 10005501 0 314 V1
DAILY 10005501 1 315 V1
DAILY 10005501 2 316 V1
DAILY 10005501 1 317 V1
DAILY 10005501 2 318 V1
DAILY 10005501 0 319 V1
DAILY 10005501 0 320 V1
DAILY 10005501 0 321 V1
DAILY 10005501 0 322 V1
DAILY 10005501 2 323 V1
DAILY 10005501 0 324 V1
DAILY 10005501 0 346 V1 Y
DAILY 10005501 0 347 V1
DAILY 10005501 0 348 V1
DAILY 10005501 0 349 V1
DAILY 10005501 0 350 V1
SOCIAL 10007501 1 301 V11
SOCIAL 10007501 2 303 V11
SOCIAL 10007501 2 304 V11
SOCIAL 10007501 0 305 V11
SOCIAL 10007501 0 306 V11
SOCIAL 10007501 2 307 V11
SOCIAL 10007501 0 308 V11
SOCIAL 10007501 0 309 V11
SOCIAL 10007501 2 310 V11
SOCIAL 10007501 0 311 V11
SOCIAL 10007501 0 312 V11
SOCIAL 10007501 2 313 V11
SOCIAL 10007501 0 314 V11
SOCIAL 10007501 0 315 V11
SOCIAL 10007501 0 316 V11
SOCIAL 10007501 0 317 V11
SOCIAL 10007501 0 319 V11 Y
SOCIAL 10007501 0 320 V11
SOCIAL 10007501 0 321 V11
SOCIAL 10007501 0 322 V11
SOCIAL 10007501 0 323 V11
SOCIAL 10007501 0 325 V11
SOCIAL 10007501 0 326 V11
;
run;
proc sort data=have; by subjid parcat1 visit paramn aval; run;
data want;
do m=1 by 1 until(last.aval);
set have;
by visit parcat1 subjid aval notsorted;
end;
do until(last.aval);
set have;
by visit parcat1 subjid aval notsorted;
if first.aval and m>4 and aval=0 then flag='Y';
output;
end;
run;
proc sort; by subjid parcat1 visit paramn aval; run;
Note: I posted this question last week with different logic now I have different logic.
Thanks,
An
proc sort data=have; by subjid parcat1 visit paramn aval; run; data want2; do m=1 by 1 until(last.aval); set have; by subjid parcat1 visit aval notsorted; if first.aval then do; x=paramn; n=1; count=1; end; else if paramn = sum(x,n) then do; n+1; count+1; end; else do; count=1; x=paramn; n=1; end; if count>4 then do; aa=x; bb=1; end; end; do until(last.aval); set have; by subjid parcat1 visit aval paramn notsorted; if paramn = aa and bb then flag='Y'; output; call missing(flag); end; drop count m n x aa bb; run;
I tried this, and it returns what youre looking for.
I believe the issue is that variables in the PDV are not set to missing until you reach the end of the data step. As such, the value of flag is not going to be reset until you after you exit the second do until loop.
Try doing
if first.aval and m>4 and aval=0 then flag='Y';
else flag='';
proc sort data=have; by subjid parcat1 visit paramn aval; run; data want2; do m=1 by 1 until(last.aval); set have; by subjid parcat1 visit aval notsorted; if first.aval then do; x=paramn; n=1; count=1; end; else if paramn = sum(x,n) then do; n+1; count+1; end; else do; count=1; x=paramn; n=1; end; if count>4 then do; aa=x; bb=1; end; end; do until(last.aval); set have; by subjid parcat1 visit aval paramn notsorted; if paramn = aa and bb then flag='Y'; output; call missing(flag); end; drop count m n x aa bb; run;
I tried this, and it returns what youre looking for.
I don't really like my approach, but I'll share it anyway in case it helps. Also I couldn't understand the role of AVAL, so I ignored it.
The code you posted had tabs in the CARDS data, here is a a version with spaces:
data have;
input parcat1 $1-6 SUBJID $7-15 aval 16-17 paramn 18-21 visit $22-26;
datalines;
DAILY 10005501 0 301 V1
DAILY 10005501 1 302 V1
DAILY 10005501 0 303 V1
DAILY 10005501 1 304 V1
DAILY 10005501 0 305 V1
DAILY 10005501 2 306 V1
DAILY 10005501 2 307 V1
DAILY 10005501 2 308 V1
DAILY 10005501 0 309 V1
DAILY 10005501 2 310 V1
DAILY 10005501 0 311 V1
DAILY 10005501 0 312 V1
DAILY 10005501 1 313 V1
DAILY 10005501 0 314 V1
DAILY 10005501 1 315 V1
DAILY 10005501 2 316 V1
DAILY 10005501 1 317 V1
DAILY 10005501 2 318 V1
DAILY 10005501 0 319 V1
DAILY 10005501 0 320 V1
DAILY 10005501 0 321 V1
DAILY 10005501 0 322 V1
DAILY 10005501 2 323 V1
DAILY 10005501 0 324 V1
DAILY 10005501 0 346 V1
DAILY 10005501 0 347 V1
DAILY 10005501 0 348 V1
DAILY 10005501 0 349 V1
DAILY 10005501 0 350 V1
SOCIAL 10007501 1 301 V11
SOCIAL 10007501 2 303 V11
SOCIAL 10007501 2 304 V11
SOCIAL 10007501 0 305 V11
SOCIAL 10007501 0 306 V11
SOCIAL 10007501 2 307 V11
SOCIAL 10007501 0 308 V11
SOCIAL 10007501 0 309 V11
SOCIAL 10007501 2 310 V11
SOCIAL 10007501 0 311 V11
SOCIAL 10007501 0 312 V11
SOCIAL 10007501 2 313 V11
SOCIAL 10007501 0 314 V11
SOCIAL 10007501 0 315 V11
SOCIAL 10007501 0 316 V11
SOCIAL 10007501 0 317 V11
SOCIAL 10007501 0 319 V11
SOCIAL 10007501 0 320 V11
SOCIAL 10007501 0 321 V11
SOCIAL 10007501 0 322 V11
SOCIAL 10007501 0 323 V11
SOCIAL 10007501 0 325 V11
SOCIAL 10007501 0 326 V11
;
run;
First I used a DATA step to create a counter of sequential values of paramn, within subjid-parcat1-visit. If there is a gap in paramn or if there is a change in subjid parcat1 visit, then I reset the counter to 1, and increment a seqid (which identifies a series of sequential values).
proc sort data=have;
by subjid parcat1 visit paramn ;
run;
data want1 ;
set have ;
by subjid parcat1 visit paramn ;
lagparamn=lag(paramn) ;
if first.visit then do ;
seq=1 ;
seqid++1 ;
end ;
else if NOT first.visit then do ;
if paramn-lagparamn=1 then seq++1 ;
else do ;
seq=1 ;
seqid++1 ;
end ;
end ;
run ;
proc print data=want1 ;
var subjid parcat1 visit paramn seqid seq;
run ;
This returns:
SUBJID parcat1 visit paramn seqid seq 10005501 DAILY V1 301 1 1 10005501 DAILY V1 302 1 2 10005501 DAILY V1 303 1 3 10005501 DAILY V1 304 1 4 10005501 DAILY V1 305 1 5 10005501 DAILY V1 306 1 6 10005501 DAILY V1 307 1 7 10005501 DAILY V1 308 1 8 10005501 DAILY V1 309 1 9 10005501 DAILY V1 310 1 10 10005501 DAILY V1 311 1 11 10005501 DAILY V1 312 1 12 10005501 DAILY V1 313 1 13 10005501 DAILY V1 314 1 14 10005501 DAILY V1 315 1 15 10005501 DAILY V1 316 1 16 10005501 DAILY V1 317 1 17 10005501 DAILY V1 318 1 18 10005501 DAILY V1 319 1 19 10005501 DAILY V1 320 1 20 10005501 DAILY V1 321 1 21 10005501 DAILY V1 322 1 22 10005501 DAILY V1 323 1 23 10005501 DAILY V1 324 1 24 10005501 DAILY V1 346 2 1 10005501 DAILY V1 347 2 2 10005501 DAILY V1 348 2 3 10005501 DAILY V1 349 2 4 10005501 DAILY V1 350 2 5 10007501 SOCIAL V11 301 3 1 10007501 SOCIAL V11 303 4 1 10007501 SOCIAL V11 304 4 2 10007501 SOCIAL V11 305 4 3 10007501 SOCIAL V11 306 4 4 10007501 SOCIAL V11 307 4 5 10007501 SOCIAL V11 308 4 6 10007501 SOCIAL V11 309 4 7 10007501 SOCIAL V11 310 4 8 10007501 SOCIAL V11 311 4 9 10007501 SOCIAL V11 312 4 10 10007501 SOCIAL V11 313 4 11 10007501 SOCIAL V11 314 4 12 10007501 SOCIAL V11 315 4 13 10007501 SOCIAL V11 316 4 14 10007501 SOCIAL V11 317 4 15 10007501 SOCIAL V11 319 5 1 10007501 SOCIAL V11 320 5 2 10007501 SOCIAL V11 321 5 3 10007501 SOCIAL V11 322 5 4 10007501 SOCIAL V11 323 5 5 10007501 SOCIAL V11 325 6 1 10007501 SOCIAL V11 326 6 2
Then can use PROC SQL to flag the records:
proc sql ;
create table want2 as
select *, seq=1 and max(seq)>=5 as Flag
from want1
group by seqid
order by subjid, parcat1, visit, paramn
;
quit ;
returns:
SUBJID parcat1 visit paramn seqid seq Flag 10005501 DAILY V1 301 1 1 1 10005501 DAILY V1 302 1 2 0 10005501 DAILY V1 303 1 3 0 10005501 DAILY V1 304 1 4 0 10005501 DAILY V1 305 1 5 0 10005501 DAILY V1 306 1 6 0 10005501 DAILY V1 307 1 7 0 10005501 DAILY V1 308 1 8 0 10005501 DAILY V1 309 1 9 0 10005501 DAILY V1 310 1 10 0 10005501 DAILY V1 311 1 11 0 10005501 DAILY V1 312 1 12 0 10005501 DAILY V1 313 1 13 0 10005501 DAILY V1 314 1 14 0 10005501 DAILY V1 315 1 15 0 10005501 DAILY V1 316 1 16 0 10005501 DAILY V1 317 1 17 0 10005501 DAILY V1 318 1 18 0 10005501 DAILY V1 319 1 19 0 10005501 DAILY V1 320 1 20 0 10005501 DAILY V1 321 1 21 0 10005501 DAILY V1 322 1 22 0 10005501 DAILY V1 323 1 23 0 10005501 DAILY V1 324 1 24 0 10005501 DAILY V1 346 2 1 1 10005501 DAILY V1 347 2 2 0 10005501 DAILY V1 348 2 3 0 10005501 DAILY V1 349 2 4 0 10005501 DAILY V1 350 2 5 0 10007501 SOCIAL V11 301 3 1 0 10007501 SOCIAL V11 303 4 1 1 10007501 SOCIAL V11 304 4 2 0 10007501 SOCIAL V11 305 4 3 0 10007501 SOCIAL V11 306 4 4 0 10007501 SOCIAL V11 307 4 5 0 10007501 SOCIAL V11 308 4 6 0 10007501 SOCIAL V11 309 4 7 0 10007501 SOCIAL V11 310 4 8 0 10007501 SOCIAL V11 311 4 9 0 10007501 SOCIAL V11 312 4 10 0 10007501 SOCIAL V11 313 4 11 0 10007501 SOCIAL V11 314 4 12 0 10007501 SOCIAL V11 315 4 13 0 10007501 SOCIAL V11 316 4 14 0 10007501 SOCIAL V11 317 4 15 0 10007501 SOCIAL V11 319 5 1 1 10007501 SOCIAL V11 320 5 2 0 10007501 SOCIAL V11 321 5 3 0 10007501 SOCIAL V11 322 5 4 0 10007501 SOCIAL V11 323 5 5 0 10007501 SOCIAL V11 325 6 1 0 10007501 SOCIAL V11 326 6 2 0
I'll look forward to better answers from other folks.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.