I don't know who else will help with this (you're asking for R help on a SAS forum), but the code you posted is chock full of errors.
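First, it uses `length`, but there is no variable with that name; the column is `project_lenght_days`.
Second, there isn't a `state` value called "successful" -- it's "success".
Third, I'm not really sure how the rate is being calculated. The order of the variables in the `group_by` statement matters, I believe, because `summarize` drops only the last grouping variable and the `mutate` that follows then works within whatever groups remain.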
Fourth, you usually get a warning when you subset a data set like that when using ggplot:
ggplot(length.pct[length.pct$state=="success",], aes(project_lenght_days, pct))
Fifth, the plot doesn't look anything like that. I would also typically categorize your variables in the mutate statement instead of relying on the plot options (probably worth it in the long run).
Sixth, you should typically load your libraries at the top of the script:
library(tidyverse)
library(ggthemes) # origin of `theme_economist`
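On the fifth point, here is a rough sketch of what binning inside `mutate` could look like, using base R's `cut()`. The `toy` data and its values are made up for illustration; only the column names come from your post, and the bin edges are just my guess at sensible intervals.

```r
library(dplyr)

# Hypothetical toy data: column names follow the thread, values are invented.
toy <- tibble(
  state = c("success", "failed", "success", "failed"),
  project_lenght_days = c(5, 12, 18, 27)
)

binned <- toy %>%
  mutate(
    # right = FALSE makes the bins left-closed: [0,10), [10,20), [20,30).
    # Anything outside 0 to 30 (including exactly 30) becomes NA.
    interval = cut(
      project_lenght_days,
      breaks = c(0, 10, 20, 30),
      labels = c("0 - 10", "10 - 20", "20 - 30"),
      right = FALSE
    )
  )

binned
```

Once the category lives in the data, you can group, count, or plot by `interval` directly instead of tweaking axis breaks in ggplot.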
All this to say, I don't know if you provided enough information for us to replicate it in SAS. Here's the code I used to reproduce your example in R:
library(tidyverse)
library(ggthemes)
ksdata <-
  tibble::tibble(
    state = c("success", "failed", "canceled", "success", "success", "failed", "failed", "canceled", "success", "failed"),
    project_lenght_days = c(9,5,6,15,17,12,22,23,25,27)
  )
length.pct <- ksdata %>%
  filter(state %in% c("success", "failed"), project_lenght_days <= 61) %>%
  group_by(project_lenght_days, state) %>%
  summarize(count=n()) %>%
  mutate(pct=count/sum(count))
length.pct
ggplot(length.pct[length.pct$state=="success",], aes(project_lenght_days, pct)) +
  geom_point(colour="royalblue4", size=2.5) + ggtitle("Success Rate vs. Project Length") +
  xlab("Project Length (Days)") + ylab("Success Rate (%)") +
  scale_x_continuous(breaks=c(0,10,20,30,40,50,60)) + geom_vline(xintercept=30, colour="red") +
  theme_economist() +
  theme(plot.title=element_text(hjust=0.5), axis.title=element_text(size=12, face="bold"))
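To illustrate the point about `group_by` order, here is a small sketch of how swapping the two variables changes what `pct` means. The `toy` data is invented for the comparison, and `.groups = "drop_last"` only spells out what `summarize` already does by default, so treat this as an illustration rather than a statement about how your rate should be computed.

```r
library(dplyr)

# Hypothetical toy data: 10-day projects have 1 success and 3 failures,
# 20-day projects have 2 successes and 1 failure.
toy <- tibble(
  state = c("success", "failed", "failed", "failed", "success", "success", "failed"),
  project_lenght_days = c(10, 10, 10, 10, 20, 20, 20)
)

# Days first: summarize() drops `state`, so groups remain by day and
# pct is the share of each state within a given project length.
by_day <- toy %>%
  group_by(project_lenght_days, state) %>%
  summarize(count = n(), .groups = "drop_last") %>%
  mutate(pct = count / sum(count))

# State first: summarize() drops the day variable, so groups remain by
# state and pct is the share of each project length within a given state.
by_state <- toy %>%
  group_by(state, project_lenght_days) %>%
  summarize(count = n(), .groups = "drop_last") %>%
  mutate(pct = count / sum(count))

by_day    # success at 10 days: 1/4 of all 10-day projects
by_state  # success at 10 days: 1/3 of all successful projects
```

Only the first version is a success rate per project length, which is why I'd double-check which ordering the original analysis intended.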
And here's an equivalent query in SAS using PROC SQL, but I don't have enough information to determine whether it's correct or not.
proc sql;
  create table length_pct as
  select
    state,
    project_lenght_days,
    count(*) as count_n,
    case
      when 0 <= project_lenght_days < 10 then "0 - 10"
      when 10 <= project_lenght_days < 20 then "10 - 20"
      else "20 - 30"
    end as interval
  from
    have
  where
    state in ("success", "failed") and project_lenght_days <= 61
  group by
    state, project_lenght_days;
quit;
I'm not going to plot it, but you can look into PROC SGPLOT if we figure out all the issues.
https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/grstatproc/p1t32i8511t1gfn17sw07yxtazad.htm
Apologies if I'm missing something that you explained or if I made a typo -- I was just trying to be clear in my explanation, and reproducing an analysis across languages can be a bit difficult.