BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
aeb
Calcite | Level 5 aeb
Calcite | Level 5

Hi all, 

I have a dataset of laboratory results. Labs were collected at multiple time points. I am specifically interested in knowing if a patient tested negative at sequential time points. Specifically, if they tested negative twice or more in a row. Here's an example dataset, where neg1-neg5 is a yes/no (1/0) for a negative value. I do not want to reorder the dataset, as it is important to keep the results in the order in which they were received. 

 

This is an example of what I have:

 

patidneg1neg2neg3neg4neg5
101001
211100
301100
401111
500010

 

And I would like something like this, where #seqneg is the number of sequential negatives by patient. 

patidneg1neg2neg3neg4neg5#seqneg
1010010
2111003
3011002
4011114
5000100

 

Thank you!!

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

You apparently want the length of the longest sequence of '1', as long as that length is at least 2. 

 

You could start by converting your sequence of numbers to a string (e.g. 0 1 1 1 0  becomes '01110').  You could then iteratively search STRNG for substrings '11111', '1111', '111', '11', '1', '' until you have success.  Such a program would look like this:

 

data have;
infile cards truncover;
input patid	neg1	neg2	neg3	neg4	neg5;
cards;
1	0	1	0	0	1
2	1	1	1	0	0
3	0	1	1	0	0
4	0	1	1	1	1
5	0	0	0	1	0
;

data want (drop=strng);
  set have;
  length strng $5;
  strng=cats(of neg:);

  do seqneg=5 to 1 by -1 until(find(strng,repeat('1',seqneg-1)));
  end;
  if  seqneg=1 then seqneg=0;
run;


 

 

 

The "do seqneg=5 to 1 by -1 until (.....) stops when either the UNTIL condition resolves to a non-zero, or when seqneg reaches 1.  The until is

 

until(find(strng,repeat('1',seqneg-1)));

 

 

which says search STRNG for the character "1" with SEQNEG-1 repetitions.   If the FIND is unsuccessful, the function returns a zero, but otherwise returns the position in which the substring is found.

 

For a set of 10 NEG variables just change the length of STRNG to $10 and start the do loop at 10 instead of 5.

 

If this looks a little complicated then you could change the code to:

 

data want2 (drop=strng substrng found);
  set have;
  length strng $5  substrng $5;
  strng=cats(of neg:);
  do seqneg=5 to 1 by -1 until(found^=0);
    substrng=repeat('1',seqneg-1);
    found=find(strng,trim(substrng));
  end;
  if  seqneg=1 then seqneg=0;
run;

which utilizes more intermediate variables.

 

regards,

Mark

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

9 REPLIES 9
Reeza
Super User
What about this case?

patid neg1 neg2 neg3 neg4 neg5
6 1 1 0 1 1
aeb
Calcite | Level 5 aeb
Calcite | Level 5
Hi Reeza,

You are actually anticipating my next step. At this point I'm only interested in any occurrence of two or more - but I will eventually want to know if they were interrupted by a positive result.

I should also note, I have up to 10 results per patient (most patients do not have 10).

Thank you!


aeb
Calcite | Level 5 aeb
Calcite | Level 5

And to specifically answer the question in your scenario, I would want #seqneg to be 2.

novinosrin
Tourmaline | Level 20

Hi @aeb  @Reeza 's intuitive reasoning is brilliant, so the following is solely for your sample. Not asking you to rely on this , but my share of fun

 


data have;
infile cards truncover;
input patid	neg1	neg2	neg3	neg4	neg5;
cards;
1	0	1	0	0	1
2	1	1	1	0	0
3	0	1	1	0	0
4	0	1	1	1	1
5	0	0	0	1	0
;

data want;
set have;
_k=cats(of neg:);
_f=findc(_k,'1');
_l=findc(_k,'1','b');
want= (countc(_k,'1')=_l-_f+1)*countc(_k,'1')*(_l-_f>0) ;
drop _:;
run;

 

 

 

mkeintz
PROC Star

You apparently want the length of the longest sequence of '1', as long as that length is at least 2. 

 

You could start by converting your sequence of numbers to a string (e.g. 0 1 1 1 0  becomes '01110').  You could then iteratively search STRNG for substrings '11111', '1111', '111', '11', '1', '' until you have success.  Such a program would look like this:

 

data have;
infile cards truncover;
input patid	neg1	neg2	neg3	neg4	neg5;
cards;
1	0	1	0	0	1
2	1	1	1	0	0
3	0	1	1	0	0
4	0	1	1	1	1
5	0	0	0	1	0
;

data want (drop=strng);
  set have;
  length strng $5;
  strng=cats(of neg:);

  do seqneg=5 to 1 by -1 until(find(strng,repeat('1',seqneg-1)));
  end;
  if  seqneg=1 then seqneg=0;
run;


 

 

 

The "do seqneg=5 to 1 by -1 until (.....) stops when either the UNTIL condition resolves to a non-zero, or when seqneg reaches 1.  The until is

 

until(find(strng,repeat('1',seqneg-1)));

 

 

which says search STRNG for the character "1" with SEQNEG-1 repetitions.   If the FIND is unsuccessful, the function returns a zero, but otherwise returns the position in which the substring is found.

 

For a set of 10 NEG variables just change the length of STRNG to $10 and start the do loop at 10 instead of 5.

 

If this looks a little complicated then you could change the code to:

 

data want2 (drop=strng substrng found);
  set have;
  length strng $5  substrng $5;
  strng=cats(of neg:);
  do seqneg=5 to 1 by -1 until(found^=0);
    substrng=repeat('1',seqneg-1);
    found=find(strng,trim(substrng));
  end;
  if  seqneg=1 then seqneg=0;
run;

which utilizes more intermediate variables.

 

regards,

Mark

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
aeb
Calcite | Level 5 aeb
Calcite | Level 5

Mark and novinosrin:

PGStats
Opal | Level 21

If you want to stick with numbers and auto adjust to any number of neg values:

 

data have;
infile cards truncover;
input patid neg1    neg2    neg3    neg4    neg5;
cards;
1   0   1   0   0   1
2   1   1   1   0   0
3   0   1   1   0   0
4   0   1   1   1   1
5   0   0   0   1   0
;

data want;
set have;
array a dum1 neg: dum2;
seq = 0;
do i = 2 to dim(a);
	if a{i} and not a{i-1} then start = i;
	else if a{i-1} and not a{i} then do;
		if i - 1 > start then
			if i - start > seq then seq = i - start;
		end;
	end;
drop dum1 dum2 i start;
run;

Variables dum1 and dum2 pad the array and allow us to skip testing end conditions. /p>

PG
Haris
Lapis Lazuli | Level 10

An approach with transposition:

data c;
input PID neg1 neg2 neg3 neg4 neg5;
cards;
1 0 1 0 0 1
2 1 1 1 0 0
3 0 1 1 0 0
4 0 1 1 1 1
5 0 0 0 1 0
6 1 1 0 1 1
;
proc transpose data=c out=ct; var Neg: ; by PID;
run;

data cc; set ct; by PID;
	if first.PID=1									then count=0; else;
	if first.PID NE 1 AND Col1=1 AND lag(Col1)=1	then Count+1; else;
	if Col1=0										then Count=0;
run;

proc sql;
	select PID, max(Count) as Count
	from CC
	group by PID
;
quit;

If you skip the last line `IF COL1=0` it will continue the count: e.g., PID=6 will have the count of two rather than one.

RichardDeVen
Barite | Level 11

Thinking outside the box, you can coerce an input statement to read a concatenation of the flag variables transformed to appear like an csv input line.  Lots of tricks mashed into this answer:

 

  • DATA Step with SET and INFILE
  • Dummy DATALINES for INPUT to introduce _INFILE_
  • INFILE CARDS with TRUNCOVER to prevent LOST CARD
  • Held input with @@ to prevent dataline advancement
  • INPUT @1 to reposition parse start point
  • INPUT with (one:) variable list specifier to read into an array
  • INPUT with (binary32.:) informat list to read 1's as binary data upto next delimiter (,) using : list input modifier
  • MAX of one: to determine largest run of 1's
  • LOG2 of 1+MAX to get number of 1's in largest value that must be 2^p - 1
* faux data with 32 neg flag variables;

data have; do patid = 1 to 20; array neg neg1-neg32; do over neg; neg = ranuni(123) < 0.35; end; output; end; run; data want; attrib longest_run length=8; set have; infile cards dlm=',' truncover; if _n_ = 1 then input @1 @@; * activate _infile_; _infile_ = translate(cats(of neg:),',','0'); * repurpose _infile_ value; * put _infile_; array one one1-one32; input @1 (one:) (binary32.:) @@; * read the _infile_ (input buffer); _error_ = 0; longest_run = log2 (1+max(of one:)); drop one:; cards; . run;

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 9 replies
  • 2111 views
  • 8 likes
  • 7 in conversation