BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
glz
Fluorite | Level 6 glz
Fluorite | Level 6

Hello,

 

I have an example data in below, the column status (highlight with yellow) is what I want.

For each subject, once "Positive" occur in Test, then the status for this subject is "Positive";

If no Positive, there is "Negative" then the status for this subject is "Negative";

If no positive, there are "Unknown" and one other character string such as "<5" then the status is "<5";

if no positive, there are "Unknown" and many other character string, such as "<5" and "<10", then the status is "NA";

 

Can anyone help me with this problem?

 

Thanks.

 

 

Capture.PNG

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

Here are your rules as I understand them;

 

For each ID

  1. If there is any test='Positive' then status='Positive'.
  2. Otherwise if there is any test='Negative' then status='Negative'
  3. Otherwise if all of the tests='Unknown' then status='Unknown'
  4. Otherwise if there is a mix of 'Unknown' and a single test value, then status=that test value
  5. Otherwise if there is more than one test value, the status='NA'

The program below has not been tested, because your picture of the data is an image, not text.  I'm not going to manually type in your data.

 

data want (drop=any_:  n_unknown n_tests status_1);
  set have (in=first_pass)
      have (in=second_pass);
  by id;
  retain status;
if 0 then status_1=status; /* Edited addition */ retain any_pos any_neg n_unknown n_tests status_1; if first.id then call missing(status,any_pos,any_neg,n_unknown,n_tests,status_1); if first_pass=1 then do; n_tests+1; if test='Positive' then any_pos=1; else if test='Negative' then any_net=1; else if test='Unknown' then n_unknown+1; else do; if status_1=' ' then status_1=test; else if status_1^=test then status_1='NA'; end; if any_pos=1 then status='Positive'; else if any_neg=1 then status='Negative'; else if n_unknown=n_tests then status='Unknown'; else status=status_1; end; if second_pass=1; run;

This works by passing through each ID twice

 

The first pass reads each record once.  It determines the presence of any Positive, or Negative tests, and it counts the number of 'Unknown' results, and maintains a STATUS_1 variable.  STATUS_1 will get the first measured result (i.e. not Negative, Positive, or Unknown), and it will be compared to all subsequent measured results.  If any of then are different then STATUS_1 will be set to 'NA'.

 

The desired STATUS variable is updated with each first pass record.

 

The second pass does nothing but inherit the value of STATUS determined in the first pass.

 

Edited change: I added the statement

   if 0 then status_1=status;

so that STATUS_1 would be a character variable matching the length of STATUS.

The condition "if 0" is never true, so the actual value of STATUS_1 is not changed.

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

3 REPLIES 3
mkeintz
PROC Star

Here are your rules as I understand them;

 

For each ID

  1. If there is any test='Positive' then status='Positive'.
  2. Otherwise if there is any test='Negative' then status='Negative'
  3. Otherwise if all of the tests='Unknown' then status='Unknown'
  4. Otherwise if there is a mix of 'Unknown' and a single test value, then status=that test value
  5. Otherwise if there is more than one test value, the status='NA'

The program below has not been tested, because your picture of the data is an image, not text.  I'm not going to manually type in your data.

 

data want (drop=any_:  n_unknown n_tests status_1);
  set have (in=first_pass)
      have (in=second_pass);
  by id;
  retain status;
if 0 then status_1=status; /* Edited addition */ retain any_pos any_neg n_unknown n_tests status_1; if first.id then call missing(status,any_pos,any_neg,n_unknown,n_tests,status_1); if first_pass=1 then do; n_tests+1; if test='Positive' then any_pos=1; else if test='Negative' then any_net=1; else if test='Unknown' then n_unknown+1; else do; if status_1=' ' then status_1=test; else if status_1^=test then status_1='NA'; end; if any_pos=1 then status='Positive'; else if any_neg=1 then status='Negative'; else if n_unknown=n_tests then status='Unknown'; else status=status_1; end; if second_pass=1; run;

This works by passing through each ID twice

 

The first pass reads each record once.  It determines the presence of any Positive, or Negative tests, and it counts the number of 'Unknown' results, and maintains a STATUS_1 variable.  STATUS_1 will get the first measured result (i.e. not Negative, Positive, or Unknown), and it will be compared to all subsequent measured results.  If any of then are different then STATUS_1 will be set to 'NA'.

 

The desired STATUS variable is updated with each first pass record.

 

The second pass does nothing but inherit the value of STATUS determined in the first pass.

 

Edited change: I added the statement

   if 0 then status_1=status;

so that STATUS_1 would be a character variable matching the length of STATUS.

The condition "if 0" is never true, so the actual value of STATUS_1 is not changed.

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
glz
Fluorite | Level 6 glz
Fluorite | Level 6

Hello mkeintz,

 

Thanks very much, your solution and idea is very valuable.

If we define status_1 and status as character value first, then it solved my question! otherwise it will cause error such as "invalid numeric data, "Unknown" ".

 

Thanks again.

 

hashman
Ammonite | Level 13

@glz: As @mkeintz has mentioned, it would be nice of you if you provided a step for creating you sample input data, thus ridding those kind enough to help you from the need to reconcoct it themselves (this what the first step below does). Having said that, try this:

data have (keep = id test) ;                                                                                                            
  array v [5  ] $8 _temporary_ ("Positive","Negative","Unknown","<5","<10") ;                                                           
  array t [8,4]  8 _temporary_                                                                                                          
  (3 3 1 1  3 3 2 2  1 1 2 2  1 1 4 4  3 3 4 4  3 3 4 5  3 3 3 3  1 2 3 1) ;                                                            
  do i = 1 to 8 ;                                                                                                                       
    ID = "A" || put (i,z3.) ;                                                                                                           
    do j = 1 to 4 ;                                                                                                                     
      Test = v [ t[i,j]] ;                                                                                                              
      output ;                                                                                                                          
    end ;                                                                                                                               
  end ;                                                                                                                                 
run ;                                                                                                                                   
                                                                                                                                        
data want (drop = _:) ;                                                                                                                 
  do _n_ = 1 by 1 until (last.id) ;                                                                                                     
    set have ;                                                                                                                          
    by id ;                                                                                                                             
    array f[0:4] _f0-_f4 ;                                                                                                              
    f [find ("PNU5", char (compress (test, "<"), 1))] = 1 ;                                                                             
  end ;                                                                                                                                 
  do _n_ = 1 to _n_ ;                                                                                                                   
    set have ;                                                                                                                          
    if      _f1 then status = "Positive" ;                                                                                              
    else if _f2 then status = "Negative" ;                                                                                              
    else if _f3 then do ;                                                                                                               
      if      _f0 then status = "N/A" ;                                                                                                 
      else if _f4 then status = "<5" ;                                                                                                  
      else status = "Unknown" ;                                                                                                         
    end ;                                                                                                                               
    output ;                                                                                                                            
  end ;                                                                                                                                 
run ;                      

  

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 3872 views
  • 1 like
  • 3 in conversation