Annie_Fréchette
Obsidian | Level 7

Hi, I'm working with bacteria names and here is what I would like to do. Each animal may have up to 5 types of bacteria. I want to count all the staphylococci that are not aureus. My first step would surely be to separate the genus and species into two variables, but I'm not sure how. Or is it possible to simply count all the samples that have "staphylococcus" as the first word and NOT "aureus" as the second word?

 

My data are sensitive, but I'm going to create a little example.

 

In the end, I want to create a binary variable: if one of the 5 species of the cow was a non-aureus Staphylococcus, the variable is 1.

data r_mam;
input cow SP_1$ SP_2$ SP_3$ SP_4$ SP_5$;
cards;
1 Streptococcus dysgalactiae Klebsiella pneumoniae Staphylococcus chromogenes
2 Staphylococcus aureus Staphylococcus xylosus Escherichia coli
3 Streptococcus uberis
;
run;

 

Thank you very much for your help!!!

1 ACCEPTED SOLUTION

andreas_lds
Jade | Level 19

Like this?

data cows;
   set work.r_mam;

   length all $ 200 sna 8;

   all = catx(' ', of sp_1-sp_5);

   sna = prxmatch('/staphylococcus (?!aureus)/i', all) and not prxmatch('/staphylococcus aureus/i', all);

   drop all;
run;


17 REPLIES
PaigeMiller
Diamond | Level 26

@Annie_Fréchette wrote:

Hi, I'm working with bacteria names and here is what I would like to do. Each animal may have up to 5 types of bacteria. I want to count all the staphylococci that are not aureus. My first step would surely be to separate the genus and species in two variables... but I'm not sure how....


data r_mam;
	infile cards truncover;
	/* Each space-delimited word lands in its own variable, so six are needed */
	input cow SP_1 :$16. SP_2 :$16. SP_3 :$16. SP_4 :$16. SP_5 :$16. SP_6 :$16.;
	cards;
1 Streptococcus dysgalactiae Klebsiella pneumoniae Staphylococcus chromogenes
2 Staphylococcus aureus Staphylococcus xylosus Escherichia coli
3 Streptococcus uberis
;
run;
data want;
	set r_mam;
	array a sp_1 sp_3 sp_5;   /* odd positions hold the genus */
	array b sp_2 sp_4 sp_6;   /* even positions hold the species */
	do i=1 to dim(a);
		/* write one row per genus/species pair that is present */
		if not missing(a(i)) then do;
			genus=a(i);
			species=b(i);
			output;
		end;
	end;
	drop sp_: i;
run;
--
Paige Miller
Annie_Fréchette
Obsidian | Level 7

Hi Paige, thanks, but it is not doing the right thing. Each SP_ variable contains both pieces of information (genus and species).

I would like to have, on one line: Genus_1 species_1 Genus_2 species_2, etc.

 

Thanks!

andreas_lds
Jade | Level 19

The provided data step does not work as expected: the text is truncated.

Is this close to what you have?

data r_mam;
   length
      cow 8
      SP_1-SP_5 $ 40
   ;
   infile datalines4 delimiter=';' missover;
   input cow SP_1 SP_2 SP_3 SP_4 SP_5;
   datalines4;
1;Streptococcus dysgalactiae;Klebsiella pneumoniae;Staphylococcus chromogenes
2;Staphylococcus aureus;Staphylococcus xylosus;Escherichia coli
3;Streptococcus uberis
;;;;
run;

 

Annie_Fréchette
Obsidian | Level 7

yes this is correct!

Cynthia_sas
SAS Super FREQ
The original data example just had spaces in the raw data lines. In the example above, each genus/species pair is separated from the next by a semicolon. This tiny piece of information affects how the data needs to be read. Can you explain what your data actually looks like? Is it like your first example, or like @andreas_lds's example (with semicolons)?

Cynthia
Annie_Fréchette
Obsidian | Level 7

Hi! I just wanted to give you an example of what I'm dealing with. My data come from an Access file, and there is a space between the genus and the species (in SP_1, SP_2, etc.). I'm not sure how to explain it better, sorry!

andreas_lds
Jade | Level 19

@Annie_Fréchette wrote:

yes this is correct!


You can use the SCAN function to look at each word, or use regular expressions.

What do you expect as the result? The cows with Staphylococcus, but not aureus? Or just the count?
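
For illustration, the SCAN approach could look roughly like the sketch below. It is not code from any poster in this thread. It assumes work.r_mam holds one "Genus species" string per SP_ variable (as in the semicolon-delimited example above), and the output data set name cows_scan is made up.

```sas
/* Sketch only: assumes SP_1-SP_5 each hold a "Genus species" string */
data cows_scan;
   set work.r_mam;
   array sp {*} sp_1-sp_5;
   sna = 0;
   do i = 1 to dim(sp);
      /* first word = genus, second word = species */
      if upcase(scan(sp{i}, 1, ' ')) = 'STAPHYLOCOCCUS'
         and upcase(scan(sp{i}, 2, ' ')) ne 'AUREUS' then sna = 1;
   end;
   drop i;
run;
```

With the three example cows this should give sna = 1 for cows 1 and 2 and sna = 0 for cow 3. Note that a value with a genus but no species (just "Staphylococcus") would also be flagged, which may or may not be what is wanted.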

Annie_Fréchette
Obsidian | Level 7

What I want in the end is: if one of SP_1, SP_2, etc. was of the genus Staphylococcus but not the species aureus, then SNA (a new binary variable) = 1.

andreas_lds
Jade | Level 19

Like this?

data cows;
   set work.r_mam;

   length all $ 200 sna 8;

   all = catx(' ', of sp_1-sp_5);

   sna = prxmatch('/staphylococcus (?!aureus)/i', all) and not prxmatch('/staphylococcus aureus/i', all);

   drop all;
run;
Annie_Fréchette
Obsidian | Level 7

Thank you very much! Can I ask you one more question? I don't see in the code how SAS knows to assign "1" to SNA. Where is that written?

 

Thanks again!
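
A note on where the 1 comes from, offered as a sketch rather than an authoritative answer: in SAS, a logical expression evaluates to 1 when true and 0 when false, and PRXMATCH returns the position of the first match (0 if there is none), so any nonzero position counts as true. The small step below uses a made-up test string (an assumption for illustration only) to show the intermediate values in the log.

```sas
data _null_;
   /* hypothetical test value, not taken from the real data */
   text = 'Staphylococcus xylosus Escherichia coli';
   /* position of the first match, 0 if there is none */
   pos  = prxmatch('/staphylococcus (?!aureus)/i', text);
   /* "and" / "not" turn the positions into a 1/0 result */
   flag = pos and not prxmatch('/staphylococcus aureus/i', text);
   put pos= flag=;   /* writes pos=1 flag=1 to the log */
run;
```

So nothing assigns the literal 1; it is the value of the expression on the right-hand side of the assignment.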

Annie_Fréchette
Obsidian | Level 7

Hi @andreas_lds !

@Cynthia_sas raised a good point, which is a problem with the code at the moment. If a cow has a Staphylococcus aureus AND a Staphylococcus xylosus, I would like SNA=1. Presently this is not the case.

andreas_lds
Jade | Level 19

@Annie_Fréchette wrote:

Hi @andreas_lds !

@Cynthia_sas raised a good point, which is a problem with the code at the moment. If a cow has a Staphylococcus aureus AND a Staphylococcus xylosus, I would like SNA=1. Presently this is not the case.


Of course not, because you defined the flag variable differently. But do you still want sna = 0 if both Staph. aureus and, e.g., Staph. epidermidis (I don't know if that is possible) are found? Please be precise! Maybe defining sna = 0 is easier. If I understood all your posts, you want sna = 0 if

a) the only Staphylococcus found is aureus, or

b) no Staph. is found at all.

Right?

Annie_Fréchette
Obsidian | Level 7

Hi Andreas! Sorry if my thoughts were not clear about what I wanted. I used the code from @SASJedi and it worked as I needed!

 

Thanks again for your help!

 

Annie

Cynthia_sas
SAS Super FREQ

Hi:

  What if a cow has THIS row of data, with 2 types of Staphylococcus values, one aureus and one not:

2 Staphylococcus aureus Staphylococcus xylosus Escherichia coli

 

Then what would your binary variable look like? SNA=1 ?

 

  What about this?

9 Staphylococcus aureus Staphylococcus xylosus Staphylococcus chromogenes Escherichia coli

 

Would SNA=1 or would SNA=2?

 

Cynthia
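
The flag-versus-count distinction raised here can be made concrete with a sketch that computes both. This is only an illustration under assumptions: work.r_mam is assumed to hold one "Genus species" string per SP_ variable, and the names cows_count and n_sna are invented for the example.

```sas
/* Sketch only: per cow, count non-aureus Staphylococcus and derive a 0/1 flag */
data cows_count;
   set work.r_mam;
   array sp {*} sp_1-sp_5;
   n_sna = 0;
   do i = 1 to dim(sp);
      /* genus Staphylococcus followed by a species word other than aureus */
      if prxmatch('/^staphylococcus\s+(?!aureus\b)\S/i', strip(sp{i}))
         then n_sna = n_sna + 1;
   end;
   sna = (n_sna > 0);   /* a comparison evaluates to 1 or 0 */
   drop i;
run;
```

For a row like cow 9 above (aureus, xylosus, chromogenes, coli), n_sna would be 2 while sna would still be 1. Testing each SP_ variable separately also avoids matching across two neighbouring values, which can happen once everything is concatenated into one string.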

 

 

 

