
Plant disease incidence: Using proc logistic to model yes-no infection response

Started 08-06-2020 by
Modified 08-06-2020 by

The year 2020 ended up being the year of COVID-19 and of an invasive species newly introduced to the US, the "murder hornet." These events are unfortunate, not least because they stole headlines from what was supposed to be the International Year of Plant Health, a designation the FAO set in 2018.


Yet plant diseases are a major cause of crop loss, which ultimately means less food production. Viruses infect plants too (albeit different species from those that infect humans), and many of the tools for modeling plant virus infection apply to human infection, and vice versa. In particular, we consider a "yes-no" infection outcome with both quantitative and qualitative explanatory factors.


Using a dataset on Alfalfa mosaic virus from a 2013 publication, we model the yes-no outcome of primary (localized) virus infection with proc logistic.


The explanatory variables are host plant number, an ordinal ranking of host plant types by susceptibility (0, 1, or 2, with 2 being the most susceptible), and virus stock dilution (a higher dilution factor means less virus is applied).

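ods graphics on;
/* Binary logistic regression modeling P(Primary_Infection = 1) */
PROC LOGISTIC data=work.import plots=all;
   class Treatment DV1 DV2 Host_plant_no(param=ordinal)
       / param=ref ref=first;
   /* Host plant number (ordinal) and virus stock dilution as predictors */
   model Primary_Infection(event='1') = Host_plant_no Virus_stock_dilution
       / stb parmlabel clodds=pl orpvalue;
   /* Profile-likelihood odds ratios, all pairwise host plant comparisons */
   oddsratio Host_plant_no / diff=all cl=pl;
   oddsratio Virus_stock_dilution / diff=all cl=pl;
   title 'Host Plant and Dilution Impact on Infection';
run;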

After confirming that the model converged and that the global tests are significant, we review the output:

[Screenshot: PROC LOGISTIC output]

Of particular note, the c statistic (reported in the table of association of predicted probabilities and observed responses) is 0.851, far better than the 0.500 of a fair coin toss.
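For reference, our summary of how PROC LOGISTIC defines c: take every pair of observations with different responses (one infected, one not). A pair is concordant if the infected observation has the higher predicted probability and discordant if it has the lower one; tied pairs count half. Here t is the number of such pairs, and n_c and n_d are the concordant and discordant counts. Somers' D, printed in the same table, is a rescaling of c.

```latex
c = \frac{n_c + 0.5\,(t - n_c - n_d)}{t},
\qquad
D_{\text{Somers}} = 2c - 1

% With c = 0.851:  D = 2(0.851) - 1 = 0.702
% A fair coin toss gives c = 0.5, i.e. D = 0
```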

Second, the odds of infection rise with host plant number. Host plant 2 has 2.81 times the odds of infection of host plant 1, and 3.97 times the odds of host plant 0. The odds ratio for host plant 1 versus 0 was not significant, but tended to be greater than 1.
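The three host plant odds ratios are not independent of one another. In our own notation (not taken from the output): let η_h be the effect of host plant h, β the dilution slope, and d the dilution. Because the model has no interaction term, the odds ratio between two hosts does not depend on d, and the 1-versus-0 ratio follows from the other two. The value below is computed from the rounded figures quoted above; the exact estimate is in the PROC LOGISTIC odds ratio table.

```latex
\operatorname{logit} P(\text{infection}) = \alpha + \eta_h + \beta d
\;\Rightarrow\;
\mathrm{OR}(a \text{ vs } b) = e^{\eta_a - \eta_b}

\mathrm{OR}(1 \text{ vs } 0)
  = \frac{\mathrm{OR}(2 \text{ vs } 0)}{\mathrm{OR}(2 \text{ vs } 1)}
  \approx \frac{3.97}{2.81} \approx 1.41
```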


Next steps might include modeling stock dilution with different polynomials (e.g., to assess whether the response is quadratic or cubic) and testing for an interaction between dilution and host genome. An interaction seems likely, given that the host plants differ dramatically in their responses, as does the response across dilution rates.
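A minimal sketch of such a follow-up model, assuming the same data set and variable names as the model above. The quadratic term and the interaction are our own illustration, not the author's analysis. Once Host_plant_no interacts with dilution, the host odds ratios depend on the dilution level. The ODDSRATIO statement therefore evaluates them at a fixed dilution, which defaults to the mean.

```sas
/* Hypothetical follow-up model (a sketch, not the original analysis): */
/* quadratic dilution term plus a host-by-dilution interaction.        */
ods graphics on;
proc logistic data=work.import plots=all;
   class Host_plant_no(param=ordinal ref=first);
   model Primary_Infection(event='1') =
         Host_plant_no
         Virus_stock_dilution
         Virus_stock_dilution*Virus_stock_dilution   /* quadratic term   */
         Host_plant_no*Virus_stock_dilution;         /* interaction term */
   /* Host odds ratios now depend on dilution; by default they are  */
   /* evaluated at the mean of Virus_stock_dilution.                */
   oddsratio Host_plant_no / diff=all cl=pl;
   title 'Quadratic Dilution and Host-by-Dilution Interaction';
run;
```

To test a cubic response, add Virus_stock_dilution*Virus_stock_dilution*Virus_stock_dilution to the MODEL statement. Whether each added term is worth keeping can be judged from the Type 3 analysis of effects or by comparing fit statistics such as AIC against the original model.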


The source data come from the following Dryad dataset: Sánchez-Navarro, Jesús A.; Zwart, Mark P.; Elena, Santiago F. (2013), Data from: Effects of the number of genome segments on primary and systemic infection for a multipartite plant RNA virus, Dryad, Dataset. The primary manuscript:

Version history
Last update:
08-06-2020 01:30 PM
Updated by:


