Hi all,
I have a dataset of laboratory results. Labs were collected at multiple time points. I am specifically interested in knowing if a patient tested negative at sequential time points. Specifically, if they tested negative twice or more in a row. Here's an example dataset, where neg1-neg5 is a yes/no (1/0) for a negative value. I do not want to reorder the dataset, as it is important to keep the results in the order in which they were received.
This is an example of what I have:
patid | neg1 | neg2 | neg3 | neg4 | neg5 |
1 | 0 | 1 | 0 | 0 | 1 |
2 | 1 | 1 | 1 | 0 | 0 |
3 | 0 | 1 | 1 | 0 | 0 |
4 | 0 | 1 | 1 | 1 | 1 |
5 | 0 | 0 | 0 | 1 | 0 |
And I would like something like this, where #seqneg is the number of sequential negatives by patient.
patid | neg1 | neg2 | neg3 | neg4 | neg5 | #seqneg |
1 | 0 | 1 | 0 | 0 | 1 | 0 |
2 | 1 | 1 | 1 | 0 | 0 | 3 |
3 | 0 | 1 | 1 | 0 | 0 | 2 |
4 | 0 | 1 | 1 | 1 | 1 | 4 |
5 | 0 | 0 | 0 | 1 | 0 | 0 |
Thank you!!
You apparently want the length of the longest sequence of '1', as long as that length is at least 2.
You could start by converting your sequence of numbers to a string (e.g. 0 1 1 1 0 becomes '01110'). You could then iteratively search STRNG for substrings '11111', '1111', '111', '11', '1', '' until you have success. Such a program would look like this:
data have;
infile cards truncover;
input patid neg1 neg2 neg3 neg4 neg5;
cards;
1 0 1 0 0 1
2 1 1 1 0 0
3 0 1 1 0 0
4 0 1 1 1 1
5 0 0 0 1 0
;
data want (drop=strng);
set have;
length strng $5;
strng=cats(of neg:);
do seqneg=5 to 1 by -1 until(find(strng,repeat('1',seqneg-1)));
end;
if seqneg=1 then seqneg=0;
run;
The "do seqneg=5 to 1 by -1 until (.....) stops when either the UNTIL condition resolves to a non-zero, or when seqneg reaches 1. The until is
until(find(strng,repeat('1',seqneg-1)));
which says search STRNG for the character "1" with SEQNEG-1 repetitions. If the FIND is unsuccessful, the function returns a zero, but otherwise returns the position in which the substring is found.
For a set of 10 NEG variables just change the length of STRNG to $10 and start the do loop at 10 instead of 5.
If this looks a little complicated then you could change the code to:
data want2 (drop=strng substrng found);
set have;
length strng $5 substrng $5;
strng=cats(of neg:);
do seqneg=5 to 1 by -1 until(found^=0);
substrng=repeat('1',seqneg-1);
found=find(strng,trim(substrng));
end;
if seqneg=1 then seqneg=0;
run;
which utilizes more intermediate variables.
regards,
Mark
And to specifically answer the question in your scenario, I would want #seqneg to be 2.
Hi @aeb @Reeza 's intuitive reasoning is brilliant, so the following is solely for your sample. Not asking you to rely on this , but my share of fun
data have;
infile cards truncover;
input patid neg1 neg2 neg3 neg4 neg5;
cards;
1 0 1 0 0 1
2 1 1 1 0 0
3 0 1 1 0 0
4 0 1 1 1 1
5 0 0 0 1 0
;
data want;
set have;
_k=cats(of neg:);
_f=findc(_k,'1');
_l=findc(_k,'1','b');
want= (countc(_k,'1')=_l-_f+1)*countc(_k,'1')*(_l-_f>0) ;
drop _:;
run;
You apparently want the length of the longest sequence of '1', as long as that length is at least 2.
You could start by converting your sequence of numbers to a string (e.g. 0 1 1 1 0 becomes '01110'). You could then iteratively search STRNG for substrings '11111', '1111', '111', '11', '1', '' until you have success. Such a program would look like this:
data have;
infile cards truncover;
input patid neg1 neg2 neg3 neg4 neg5;
cards;
1 0 1 0 0 1
2 1 1 1 0 0
3 0 1 1 0 0
4 0 1 1 1 1
5 0 0 0 1 0
;
data want (drop=strng);
set have;
length strng $5;
strng=cats(of neg:);
do seqneg=5 to 1 by -1 until(find(strng,repeat('1',seqneg-1)));
end;
if seqneg=1 then seqneg=0;
run;
The "do seqneg=5 to 1 by -1 until (.....) stops when either the UNTIL condition resolves to a non-zero, or when seqneg reaches 1. The until is
until(find(strng,repeat('1',seqneg-1)));
which says search STRNG for the character "1" with SEQNEG-1 repetitions. If the FIND is unsuccessful, the function returns a zero, but otherwise returns the position in which the substring is found.
For a set of 10 NEG variables just change the length of STRNG to $10 and start the do loop at 10 instead of 5.
If this looks a little complicated then you could change the code to:
data want2 (drop=strng substrng found);
set have;
length strng $5 substrng $5;
strng=cats(of neg:);
do seqneg=5 to 1 by -1 until(found^=0);
substrng=repeat('1',seqneg-1);
found=find(strng,trim(substrng));
end;
if seqneg=1 then seqneg=0;
run;
which utilizes more intermediate variables.
regards,
Mark
Mark and novinosrin:
Thank you both, very much.
novinosrin - you were right, code worked in specific scenarios but not all.
Mark - worked and I appreciate the explanation.
Thanks again!
If you want to stick with numbers and auto adjust to any number of neg values:
data have;
infile cards truncover;
input patid neg1 neg2 neg3 neg4 neg5;
cards;
1 0 1 0 0 1
2 1 1 1 0 0
3 0 1 1 0 0
4 0 1 1 1 1
5 0 0 0 1 0
;
data want;
set have;
array a dum1 neg: dum2;
seq = 0;
do i = 2 to dim(a);
if a{i} and not a{i-1} then start = i;
else if a{i-1} and not a{i} then do;
if i - 1 > start then
if i - start > seq then seq = i - start;
end;
end;
drop dum1 dum2 i start;
run;
Variables dum1 and dum2 pad the array and allow us to skip testing end conditions. /p>
An approach with transposition:
data c;
input PID neg1 neg2 neg3 neg4 neg5;
cards;
1 0 1 0 0 1
2 1 1 1 0 0
3 0 1 1 0 0
4 0 1 1 1 1
5 0 0 0 1 0
6 1 1 0 1 1
;
proc transpose data=c out=ct; var Neg: ; by PID;
run;
data cc; set ct; by PID;
if first.PID=1 then count=0; else;
if first.PID NE 1 AND Col1=1 AND lag(Col1)=1 then Count+1; else;
if Col1=0 then Count=0;
run;
proc sql;
select PID, max(Count) as Count
from CC
group by PID
;
quit;
If you skip the last line `IF COL1=0` it will continue the count: e.g., PID=6 will have the count of two rather than one.
Thinking outside the box, you can coerce an input statement to read a concatenation of the flag variables transformed to appear like an csv input line. Lots of tricks mashed into this answer:
* faux data with 32 neg flag variables;
data have; do patid = 1 to 20; array neg neg1-neg32; do over neg; neg = ranuni(123) < 0.35; end; output; end; run; data want; attrib longest_run length=8; set have; infile cards dlm=',' truncover; if _n_ = 1 then input @1 @@; * activate _infile_; _infile_ = translate(cats(of neg:),',','0'); * repurpose _infile_ value; * put _infile_; array one one1-one32; input @1 (one:) (binary32.:) @@; * read the _infile_ (input buffer); _error_ = 0; longest_run = log2 (1+max(of one:)); drop one:; cards; . run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.