BookmarkSubscribeRSS Feed
ahhh
Obsidian | Level 7

I need help with this algorithm:

CALCULATING THE TIME BETWEEN THE FIRST POSITIVE TO HOW EVER LONG THEY REMAIN POSITIVE (BEFORE HAVING A NEGATIVE OR UNKNOWN/MISSING RESULT)

 

ID VISIT T0 RESULT
1 V1 07SEP2005 negative
1 v2 06oct2005 negative
1 v3 01apr2006 positive
1 v4 03sep2006 negative
1 v5 08sep2006 negative
2 v1 07sep2005 negative
2 v2 02oct2005 positive
2 v3 05nov2005 positive
2 v4 10nov2006 negative
3 v1 07SEP2005 negative
3 v2 03OCT2005 positive
3 v3 09nov2005
3 v4 10dec2005 negative
3 v5 03mar2006 positive
4 v1 07SEP2005 negative
4 v2 10SEP2005 negative
4 v3 07OCT2005 negative
4 v4 17nOV2005 negative
4 v5 27JAN2006 negative
5 v1 03mar2007 positive
5 v2 30mar2007 positive
5 v3 06jun2007 positive
5 v4 13mar2008 positive
6 v1 03feb2007 positive
6 v2 30jun2007 UNKNOWN
6 v3 06nov2007
6 v4 03mar2008 positive
7 v1 01jan2007 positive
7 v2 30jun2007 positive
7 v3 08nov2007
7 v4 01feb2008 positive
8 v1 07aug2005 negative
8 v2 07oct2005 negative
8 v3 15nov2005 positive
8 v4 01jan2006
9 v1 07jan2005 negative
9 v2 10mar2005 positive
9 v3 27OCT2005 UNKNOWN
9 v4 07dec2005 negative
9 v5 20JAN2006 negative

 

diff=t1-t2
BASICALLY WE FOLLOW FIRST POSITIVE RESULT UNTIL 1.)THERE IS A NEGATIVE RESULT OR 2.) TILL THE LAST IF THEY NEVER BECOME NEGATIVES OR 3.) THE NEXT RESULT IS MISSING OR UNKNOWN

1.) if an id has all negative results (and no positives) it has to be flagged (flag=1)(example: Id 4),
2.) if an id has all positive results (and no negatives) it has to be flagged (flag=2)(example: Id 5),
and diff= date of first positive minus date of last positive and changed_result=0
3.) if the results are like this-- 1- POSITIVE
2- POSITIVE
3- POSITIVE
4- (MISSING VALUE)
5- POSITIVE
then diff= dates of first positive and 3 rd positive need to be subtracted and changed_result=0
(similar situation when results are
1- POSITIVE
2- POSITIVE
3- (MISSING VALUE)
4- POSITIVE
then diff= dates of first positive and 2 rd positive need to be subtracted and changed_result=0
In other words, whenever resulTS are in (positive, missing) then diff=dates of first positive and the positive result right before the missing value
need to be subtracted and changed_result=0
4.) if the result are like this-- 1-negative
2-negative
3-positive
4-missing
then diff= missing value (cuz we have nothing to compare it to) and changed_result=0
In other words, if the second element while calculating diff is a missing value, then diff=missing and changed_result=0
5.)'unknown' values in result var should be treated same as missing values


Number of visits may vary for each id and sometimes 'results' might be missing/unknown (treat both the same way).
For each unique ID:
6.) When there is 'positive' test result, we need to look for next consecutive 'negative' result and- we need to calculate the difference between the dates of the first positive result and the date of the first negative result after the first positive result and a new variable 'changed_result' is set to 1

 


In the output only one row should be there for each id:


expected output:

ID diff changed_result t1 t2 FLAG
1 155 1 01apr2006 03sep2006
2 404 1 02oct2005 10nov2006
3 . 0 . . (CANT' BE CALCULATED AS THERE IS A MISSING RESULT AFTER FIRST POSITIVE)
4 . 0 . . 1 (ALL NEGATIVE RESULTS)
5 376 0 03mar2007 13mar2008 2 (ALL POSITIVE RESULTS)
6 . 0 . . (CANT' BE CALCULATED AS THERE IS A unknown RESULT AFTER FIRST POSITIVE)
7 180 0 01jan2007 30jun2007 (this is how long were definitely positive)
8 . 0 . . (CANT' BE CALCULATED AS THERE IS A MISSING RESULT AFTER FIRST POSITIVE)
9 . 0 . . (CANT' BE CALCULATED AS THERE IS A unknown RESULT AFTER FIRST POSITIVE)


I hope I can get some answers soon. I use BASE SAS V9
Thank you so much in advance for your help.

 

7 REPLIES 7
ahhh
Obsidian | Level 7

I have this code but- I need to collapse multiple rows for each ID into a single row with needed variables.

Please advise...

 

data y ;
set x;
by id;
retain phase pos_d neg_d ;
changed_result =0; flag=0; /* initial value */
if first.id then phase = 0;
if phase = 0 and result= 'Positive' then do; pos_d = t0; phase=1; end;
if phase = 1 and result in ('','unknown') then do; neg_d = t0; phase=2;diff = neg_d - pos_d; end;
if phase = 1 and result in ('Negative') then do; neg_d = t0; phase =2; diff = neg_d - pos_d; changed_result = 1;end;
else do; diff = .; changed_result = 0; flag=1; end;
if last.id and phase=0 then neg=1;
*output;
keep id pos_d neg_d diff changed_result flag neg;
run;

Reeza
Super User

Please post OUTPUT to match your sample data. 

ahhh
Obsidian | Level 7

This is the output I am expecting(plz refer the code i posted):

ID diff   changed_result          pos_d              neg_d                FLAG      neg  
1  155          1                         01apr2006 03sep2006
2  404          1                         02oct2005 10nov2006
3    .             0                             .                  . 
4    .             0                             .                  .                             1           1
5   376         0                          03mar2007 13mar2008               2 
6    .             0                             .                   . 
7   180         0                          01jan2007    30jun2007 
8    .             0                             .                   . 
9    .             0                              .                   . 

Ksharp
Super User
There are too many scenarios you need consider about.


data have;
infile cards truncover;
input ID VISIT $ T0 :date9. RESULT : $20.;
format t0 date9.;
cards;
1 V1 07SEP2005 negative
1 v2 06oct2005 negative
1 v3 01apr2006 positive
1 v4 03sep2006 negative
1 v5 08sep2006 negative
2 v1 07sep2005 negative
2 v2 02oct2005 positive
2 v3 05nov2005 positive
2 v4 10nov2006 negative
3 v1 07SEP2005 negative
3 v2 03OCT2005 positive
3 v3 09nov2005
3 v4 10dec2005 negative
3 v5 03mar2006 positive
4 v1 07SEP2005 negative
4 v2 10SEP2005 negative
4 v3 07OCT2005 negative
4 v4 17nOV2005 negative
4 v5 27JAN2006 negative
5 v1 03mar2007 positive
5 v2 30mar2007 positive
5 v3 06jun2007 positive
5 v4 13mar2008 positive
6 v1 03feb2007 positive
6 v2 30jun2007 UNKNOWN
6 v3 06nov2007
6 v4 03mar2008 positive
7 v1 01jan2007 positive
7 v2 30jun2007 positive
7 v3 08nov2007
7 v4 01feb2008 positive
8 v1 07aug2005 negative
8 v2 07oct2005 negative
8 v3 15nov2005 positive
8 v4 01jan2006
9 v1 07jan2005 negative
9 v2 10mar2005 positive
9 v3 27OCT2005 UNKNOWN
9 v4 07dec2005 negative
9 v5 20JAN2006 negative
;
run;
data x;
 set have;
 by id;
 retain t0_p found;
 flag=0;
 if first.id then call missing(t0_p,found);
 if result='positive' and not found then do;t0_p=t0;found=1;end;
 if id=lag(id) and lag(result)='positive' and result='negative' then do;
  flag=1;diff=t0-t0_p;
 end;
 
 if id ne lag(id) or result ne 'positive' then group+1;
 format t0_p date9.;
 drop found;
run;
data want;
n=0;
 do until(last.group);
  set x;
  by group;
   n+1;
  if first.group then first=t0;
  if result ne 'positive' then have=1;
 end;
 do until(last.group);
  set x;
  by group;
  if not have and n ne 1 and last.group=1 then diff=t0-first;
  output;
 end;
 drop first n;
run;

ahhh
Obsidian | Level 7

Thanks Ksharp. The output I am getting with your code has multiple rows per id and i need only one row per id.

 

This following code is working well except for id7 - the t1 and t2 values are off but the diff is calculated correctly. If possible please help fix this issue:

 

data test;
infile datalines truncover;
input id visit $ t0 date9. result $;
format t0 ddmmyy10.;
datalines;
1 V1 07SEP2005 negative
1 v2 06oct2005 negative
1 v3 01apr2006 positive
1 v4 03sep2006 negative
1 v5 08sep2006 negative
2 v1 07sep2005 negative
2 v2 02oct2005 positive
2 v3 05nov2005 positive
2 v4 10nov2006 negative
3 v1 07SEP2005 negative
3 v2 03OCT2005 positive
3 v3 09nov2005
3 v4 10dec2005 negative
3 v5 03mar2006 positive
4 v1 07SEP2005 negative
4 v2 10SEP2005 negative
4 v3 07OCT2005 negative
4 v4 17nOV2005 negative
4 v5 27JAN2006 negative
5 v1 03mar2007 positive
5 v2 30mar2007 positive
5 v3 06jun2007 positive
5 v4 13mar2008 positive
6 v1 03feb2007 positive
6 v2 30jun2007 UNKNOWN
6 v3 06nov2007
6 v4 03mar2008 positive
7 v1 01jan2007 positive
7 v2 30jun2007 positive
7 v3 08nov2007
7 v4 01feb2008 positive
8 v1 07aug2005 negative
8 v2 07oct2005 negative
8 v3 15nov2005 positive
8 v4 01jan2006
9 v1 07jan2005 negative
9 v2 10mar2005 positive
9 v3 27OCT2005 UNKNOWN
9 v4 07dec2005 negative
9 v5 20JAN2006 negative
; run;

data alg2;
format id diff changed_result t1 t2 flag;
keep id diff changed_result t1 t2 flag;
set test;
by id;
retain type ; /* 1=start negative, 2=start positive, 0=changed */
retain pos1d; /* 1st positive date */
retain flag; /* 1=all negative, 2=all positive, .=changed */
retain t2; /* date used to calc diff */
retain diff changed_result 0;
result = lowcase(result);

if first.id then do; diff=.;
if result = 'positive' then do; type=2; pos1d = t0; flag=2; end; else
if result = 'negative' then do; type=1; flag=1; end;
else do; type = 0; flag=.; end;

end; else

if result = 'positive' then do;
if type ne 2 /* 1st positive which is not 1st in ID */
then do; type=2; pos1d = t0; flag=.; end;
else if flag=2 then do;
diff=t0-pos1d; t2=t0;
end;
/* if type=2 no change need */
end; else
if result = 'negative' then do;
if type =2 and diff=. /* 1st negative following positive(s) */
then do; diff = t0 - pos1d; t2=t0; flag=.; changed_result=1; end;
/* if type is 1 or 0 no change need */
end;
else do;
if type =2 /* missing following positive(s) */
then do; pos1d=.; *t2=.; changed_result=0; flag=.; type=0; end;
end;

if last.id then do;
if flag=1 then do; diff=.; t1=.; t2=.; output; end; else
if flag=2 then do; diff=t0 - pos1d; t1=pos1d; t2=t0; output; end;
else do;
if diff > 0
then do; t1=pos1d; /* changed_result diff and t2 retained */ output; end;
else do; diff=.; t1=.; t2=.; output; end;
end;
end;
format t1 t2 date9.;
run;

ballardw
Super User

It helps to clarify requirements if show what the desired output for your example data would be.

Ksharp
Super User

That would be easy , just add one more data step.

 


data have;
infile cards truncover;
input ID VISIT $ T0 :date9. RESULT : $20.;
format t0 date9.;
cards;
1 V1 07SEP2005 negative
1 v2 06oct2005 negative
1 v3 01apr2006 positive
1 v4 03sep2006 negative
1 v5 08sep2006 negative
2 v1 07sep2005 negative
2 v2 02oct2005 positive
2 v3 05nov2005 positive
2 v4 10nov2006 negative
3 v1 07SEP2005 negative
3 v2 03OCT2005 positive
3 v3 09nov2005
3 v4 10dec2005 negative
3 v5 03mar2006 positive
4 v1 07SEP2005 negative
4 v2 10SEP2005 negative
4 v3 07OCT2005 negative
4 v4 17nOV2005 negative
4 v5 27JAN2006 negative
5 v1 03mar2007 positive
5 v2 30mar2007 positive
5 v3 06jun2007 positive
5 v4 13mar2008 positive
6 v1 03feb2007 positive
6 v2 30jun2007 UNKNOWN
6 v3 06nov2007
6 v4 03mar2008 positive
7 v1 01jan2007 positive
7 v2 30jun2007 positive
7 v3 08nov2007
7 v4 01feb2008 positive
8 v1 07aug2005 negative
8 v2 07oct2005 negative
8 v3 15nov2005 positive
8 v4 01jan2006
9 v1 07jan2005 negative
9 v2 10mar2005 positive
9 v3 27OCT2005 UNKNOWN
9 v4 07dec2005 negative
9 v5 20JAN2006 negative
;
run;
data x;
 set have;
 by id;
 retain t0_p found;
 flag=0;
 if first.id then call missing(t0_p,found);
 if result='positive' and not found then do;t0_p=t0;found=1;end;
 if id=lag(id) and lag(result)='positive' and result='negative' then do;
  flag=1;diff=t0-t0_p;
 end;
 
 if id ne lag(id) or result ne 'positive' then group+1;
 format t0_p date9.;
 drop found;
run;
data want;
n=0;
 do until(last.group);
  set x;
  by group;
   n+1;
  if first.group then first=t0;
  if result ne 'positive' then have=1;
 end;
 do until(last.group);
  set x;
  by group;
  if not have and n ne 1 and last.group=1 then diff=t0-first;
  output;
 end;
 drop first n group have;
run;

data final_want;
 do until(last.id);
  set want;
  by id;
  if diff then found=1;
 end;
 do until(last.id);
  set want;
  by id;
  if diff then output;
 end;
 if not found then do;call missing(t0,t0_p);output;end;
 drop flag found result;
run;

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 1032 views
  • 0 likes
  • 4 in conversation