Hello,
I have an example data in below, the column status (highlight with yellow) is what I want.
For each subject, once "Positive" occur in Test, then the status for this subject is "Positive";
If no Positive, there is "Negative" then the status for this subject is "Negative";
If no positive, there are "Unknown" and one other character string such as "<5" then the status is "<5";
if no positive, there are "Unknown" and many other character string, such as "<5" and "<10", then the status is "NA";
Can anyone help me with this problem?
Thanks.
Here are your rules as I understand them;
For each ID
The program below has not been tested, because your picture of the data is an image, not text. I'm not going to manually type in your data.
data want (drop=any_: n_unknown n_tests status_1);
set have (in=first_pass)
have (in=second_pass);
by id;
retain status;
if 0 then status_1=status; /* Edited addition */
retain any_pos any_neg n_unknown n_tests status_1;
if first.id then call missing(status,any_pos,any_neg,n_unknown,n_tests,status_1);
if first_pass=1 then do;
n_tests+1;
if test='Positive' then any_pos=1; else
if test='Negative' then any_net=1; else
if test='Unknown' then n_unknown+1; else
do;
if status_1=' ' then status_1=test;
else if status_1^=test then status_1='NA';
end;
if any_pos=1 then status='Positive'; else
if any_neg=1 then status='Negative'; else
if n_unknown=n_tests then status='Unknown'; else
status=status_1;
end;
if second_pass=1;
run;
This works by passing through each ID twice
The first pass reads each record once. It determines the presence of any Positive, or Negative tests, and it counts the number of 'Unknown' results, and maintains a STATUS_1 variable. STATUS_1 will get the first measured result (i.e. not Negative, Positive, or Unknown), and it will be compared to all subsequent measured results. If any of then are different then STATUS_1 will be set to 'NA'.
The desired STATUS variable is updated with each first pass record.
The second pass does nothing but inherit the value of STATUS determined in the first pass.
Edited change: I added the statement
if 0 then status_1=status;
so that STATUS_1 would be a character variable matching the length of STATUS.
The condition "if 0" is never true, so the actual value of STATUS_1 is not changed.
Here are your rules as I understand them;
For each ID
The program below has not been tested, because your picture of the data is an image, not text. I'm not going to manually type in your data.
data want (drop=any_: n_unknown n_tests status_1);
set have (in=first_pass)
have (in=second_pass);
by id;
retain status;
if 0 then status_1=status; /* Edited addition */
retain any_pos any_neg n_unknown n_tests status_1;
if first.id then call missing(status,any_pos,any_neg,n_unknown,n_tests,status_1);
if first_pass=1 then do;
n_tests+1;
if test='Positive' then any_pos=1; else
if test='Negative' then any_net=1; else
if test='Unknown' then n_unknown+1; else
do;
if status_1=' ' then status_1=test;
else if status_1^=test then status_1='NA';
end;
if any_pos=1 then status='Positive'; else
if any_neg=1 then status='Negative'; else
if n_unknown=n_tests then status='Unknown'; else
status=status_1;
end;
if second_pass=1;
run;
This works by passing through each ID twice
The first pass reads each record once. It determines the presence of any Positive, or Negative tests, and it counts the number of 'Unknown' results, and maintains a STATUS_1 variable. STATUS_1 will get the first measured result (i.e. not Negative, Positive, or Unknown), and it will be compared to all subsequent measured results. If any of then are different then STATUS_1 will be set to 'NA'.
The desired STATUS variable is updated with each first pass record.
The second pass does nothing but inherit the value of STATUS determined in the first pass.
Edited change: I added the statement
if 0 then status_1=status;
so that STATUS_1 would be a character variable matching the length of STATUS.
The condition "if 0" is never true, so the actual value of STATUS_1 is not changed.
Hello mkeintz,
Thanks very much, your solution and idea is very valuable.
If we define status_1 and status as character value first, then it solved my question! otherwise it will cause error such as "invalid numeric data, "Unknown" ".
Thanks again.
@glz: As @mkeintz has mentioned, it would be nice of you if you provided a step for creating you sample input data, thus ridding those kind enough to help you from the need to reconcoct it themselves (this what the first step below does). Having said that, try this:
data have (keep = id test) ;
array v [5 ] $8 _temporary_ ("Positive","Negative","Unknown","<5","<10") ;
array t [8,4] 8 _temporary_
(3 3 1 1 3 3 2 2 1 1 2 2 1 1 4 4 3 3 4 4 3 3 4 5 3 3 3 3 1 2 3 1) ;
do i = 1 to 8 ;
ID = "A" || put (i,z3.) ;
do j = 1 to 4 ;
Test = v [ t[i,j]] ;
output ;
end ;
end ;
run ;
data want (drop = _:) ;
do _n_ = 1 by 1 until (last.id) ;
set have ;
by id ;
array f[0:4] _f0-_f4 ;
f [find ("PNU5", char (compress (test, "<"), 1))] = 1 ;
end ;
do _n_ = 1 to _n_ ;
set have ;
if _f1 then status = "Positive" ;
else if _f2 then status = "Negative" ;
else if _f3 then do ;
if _f0 then status = "N/A" ;
else if _f4 then status = "<5" ;
else status = "Unknown" ;
end ;
output ;
end ;
run ;
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.