abaker_ca
Obsidian | Level 7

I have a simple problem that I can solve in R easily but just can't figure out how to solve in SAS.

 

I have a data set in which the diagnostic results for every target identified by the assay are concatenated into one long string per subject.

 

I simply want to identify subjects that are positive for more than one target. 

 

Here is an example data set.

 

data test;
input subject_id $ 1-8
      result $ 10-32; /* result starts in column 10 and is 23 characters wide */
datalines;
ABCD0001 A POS B POS C NEG D NEG
ABCD0002 A POS B NEG C NEG D NEG
ABCD0003 A NEG B NEG C NEG D NEG
ABCD0004 A POS B NEG C POS D NEG
;
run;

 

Ideally, I'd write code that would return this data set with a flag for subjects ABCD0001 and ABCD0004 because they are positive for more than one target in the 'result' variable. 

 

Thanks for anyone's help!

 

5 REPLIES
FreelanceReinh
Jade | Level 19

Hello @abaker_ca and welcome to the SAS Support Communities!

 

You can use the COUNT function to count the number of "POS" substrings in variable RESULT:

data want;
set test;
flag=count(result,'POS','i')>1;
run;

The optional 'i' modifier makes the search case-insensitive so that "pos", "Pos", etc. would be counted as well. The numeric variable FLAG will contain the value 1 or 0 if the inequality count(...)>1 is true or false, respectively.
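
If the number of positive targets is also of interest, the count can be kept alongside the flag. This is only a minimal sketch of the same COUNT approach; the variable name n_pos is an assumption and not part of the original question.

```sas
data want;
  set test;
  /* n_pos is a hypothetical helper variable: number of "POS" substrings */
  n_pos = count(result, 'POS', 'i');
  /* a comparison evaluates to 1 (true) or 0 (false) */
  flag = (n_pos > 1);
run;
```

With the example data, n_pos should come out as 2, 1, 0 and 2 for the four subjects, so FLAG is 1 for ABCD0001 and ABCD0004 only.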

Ksharp
Super User

To avoid false matches in a scenario like this, where "POS" is embedded in other text:

ABCD0001 APOSA POS BNEGB POS C NEG D NEG

 

You could try this one:

data test;
input subject_id $ result $23.; /* result is 23 characters wide; $20. would cut off the last call */
datalines;
ABCD0001 A POS B POS C NEG D NEG
ABCD0002 A POS B NEG C NEG D NEG
ABCD0003 A NEG B NEG C NEG D NEG
ABCD0004 A POS B NEG C POS D NEG
;
run;
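data want;
 set test;
 /* FINDW matches whole words only. The 'b' modifier searches from right to left, */
 /* so the two positions differ only when "POS" occurs as a word more than once.  */
 if findw(result,'POS',' ') ne findw(result,'POS',' ','b') then flag='Y';
run;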
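
If an actual count of whole-word matches is wanted rather than a Y/blank flag, one possible alternative is to walk through the words with COUNTW and SCAN. This is a sketch of a different technique from the FINDW comparison above; the names n_pos and i are assumptions introduced for illustration.

```sas
data want;
  set test;
  n_pos = 0;                               /* hypothetical counter, reset for each row */
  do i = 1 to countw(result, ' ');         /* number of blank-delimited words */
    /* compare each word as a whole, ignoring case */
    if upcase(scan(result, i, ' ')) = 'POS' then n_pos = n_pos + 1;
  end;
  flag = (n_pos > 1);                      /* 1 if more than one target is positive */
  drop i;
run;
```

Because each word is compared in full, text such as "APOSA" or "POST" would not be counted as a positive call.
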
Astounding
PROC Star
Could the actual target names include the letters "POS"? For example, could any of these words appear in a target name?

POST
OPPOSITE
POSE
POSTERIOR
SUPPOSE
abaker_ca
Obsidian | Level 7

It's unlikely they would. For the most part the string in this variable would be a disease target followed by the pos/neg call for the target (e.g. "Flu A POS"). 

Tom
Super User

Count the occurrences of ' POS ' (with a space on each side). You might need to append a space to the string in case a value completely fills the variable, because a final "POS" would then have no trailing space to match.

 

data want;
  set test;
  n_positive = count(result||' ',' POS ');
run;

Note that COUNT() has an optional third argument that you can use to make the string comparisons case-insensitive.

  n_positive = count(result||' ',' POS ','i');
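
To illustrate why the appended space can matter, here is a small sketch with a made-up value that exactly fills its variable. The data set name demo, the length of 11 and the variable names n_without and n_with are all assumptions chosen for the illustration.

```sas
data demo;
  length result $11;
  result = 'A POS B POS';                    /* 11 characters: no trailing blank is stored */
  n_without = count(result, ' POS ');        /* only the first POS has a space after it */
  n_with    = count(result||' ', ' POS ');   /* the appended space lets the last POS match */
  put n_without= n_with=;
run;
```

Here n_without should be 1 and n_with should be 2. When the variable is longer than the value, SAS pads it with blanks, so the extra space is harmless in that case.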