Let's assume I have data like this:
data have; input state $ project_lenght_days; cards; success 9 failed 5 canceled 6 success 15 success 17 failed 12 failed 22 canceled 23 success 25 failed 27 ; run;
I would like to establish successful rate and present it as interval on chart 1-10, 10-20, 20-30 days.
I've found the same analyze on kaggle, but it's in R, code from the analyze is:
length.pct <- ksdata %>%
filter(state %in% c("successful", "failed"), length <= 61) %>%
group_by(length, state) %>%
summarize(count=n()) %>%
mutate(pct=count/sum(count))
ggplot(length.pct[length.pct$state=="successful",], aes(length, pct)) +
geom_point(colour="royalblue4", size=2.5) + ggtitle("Success Rate vs. Project Length") +
xlab("Project Length (Days)") + ylab("Success Rate (%)") +
scale_x_continuous(breaks=c(0,10,20,30,40,50,60)) + geom_vline(xintercept=30, colour="red") +
theme_economist() +
theme(plot.title=element_text(hjust=0.5), axis.title=element_text(size=12, face="bold"))
In my data I assume that canceled and failed state are equal to failed project.
Is it possible to make something like this in SAS with the sample Data i have?
I don't know who else will help with this (you're asking for R help on a SAS forum), but the code you posted is chock full of errors.
One, it uses `length` when there is no variable named `length` (`project_lenght_days`).
Two, there isn't a value called "successful" -- it's "success".
Third, I'm not really sure how the rate is being calculated. The order, I believe, matters in the `group_by` statement.
Fourth, you usually get a warning when you subset a data set like that when using ggplot:
ggplot(length.pct[length.pct$state=="success",], aes(project_lenght_days, pct)))
Fifth, the plot doesn't look anything like that. I would also typically categorize your variables in the mutate statement instead of relying on the plot options (probably worth it in the long run):
Sixth, you typically should call your libraries at the top of the script:
library(tidyverse) library(ggthemes) # origin of `theme_economist`
All this to say, I don't know if you provided enough information for us to replicate it in SAS.Here's the code I used to reproduce your example in R:
library(tidyverse)
library(ggthemes) ksdata <- tibble::tibble( state = c("success", "failed", "canceled", "success", "success", "failed", "failed", "canceled", "success", "failed"), project_lenght_days = c(9,5,6,15,17,12,22,23,25,27) ) length.pct <- ksdata %>% filter(state %in% c("success", "failed"), project_lenght_days <= 61) %>% group_by(project_lenght_days, state) %>% summarize(count=n()) %>% mutate(pct=count/sum(count)) length.pct ggplot(length.pct[length.pct$state=="success",], aes(project_lenght_days, pct)) + geom_point(colour="royalblue4", size=2.5) + ggtitle("Success Rate vs. Project Length") + xlab("Project Length (Days)") + ylab("Success Rate (%)") + scale_x_continuous(breaks=c(0,10,20,30,40,50,60)) + geom_vline(xintercept=30, colour="red") + theme_economist() + theme(plot.title=element_text(hjust=0.5), axis.title=element_text(size=12, face="bold"))
And here's an equivalent query in SAS using PROC SQL, but I don't have enough information to determine whether it's correct or not.
proc sql;
create table length_pct as
select
state,
project_lenght_days,
count(*) as count_n,
case
when 0 <= project_lenght_days < 10 then "0 - 10"
when 10 <= project_lenght_days < 20 then "10 - 20"
else "20 - 30"
end as interval
from
have
where
state in ("success", "failed") and project_lenght_days <= 61
group by
state, project_lenght_days;
quit;
I'm not going to plot it, but you can look into PROC SGPLOT if we figure out all the issues.
https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/grstatproc/p1t32i8511t1gfn17sw07yxtazad.htm
Apologies if I'm missing something that you explained or if I made typo -- I was just trying to be clear in my explanation and attempting to reproduce across languages can be a bit difficult.
Hi, I wanted to create something what is in R, but converted to SAS. Maybe I didnt specify my quetion. Between 1-10 days theres is 1 success and 2 failed (failed + canceled = 2 failed) so the success rate between 1-10 days equals 1/3. Between 10-20 we have 2/3, between 20-30days we have 1/4. Dont know how to loop it properly to get results 1/3 or 2/3 or 1/4.
The whole data i have has more than 5000 rows which have project_length_days between 1 and 60.
Thanks.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.