Let's assume I have data like this:
data have; input state $ project_lenght_days; cards; success 9 failed 5 canceled 6 success 15 success 17 failed 12 failed 22 canceled 23 success 25 failed 27 ; run;
I would like to establish successful rate and present it as interval on chart 1-10, 10-20, 20-30 days.
I've found the same analyze on kaggle, but it's in R, code from the analyze is:
length.pct <- ksdata %>%
filter(state %in% c("successful", "failed"), length <= 61) %>%
group_by(length, state) %>%
summarize(count=n()) %>%
mutate(pct=count/sum(count))
ggplot(length.pct[length.pct$state=="successful",], aes(length, pct)) +
geom_point(colour="royalblue4", size=2.5) + ggtitle("Success Rate vs. Project Length") +
xlab("Project Length (Days)") + ylab("Success Rate (%)") +
scale_x_continuous(breaks=c(0,10,20,30,40,50,60)) + geom_vline(xintercept=30, colour="red") +
theme_economist() +
theme(plot.title=element_text(hjust=0.5), axis.title=element_text(size=12, face="bold"))
In my data I assume that canceled and failed state are equal to failed project.
Is it possible to make something like this in SAS with the sample Data i have?
I don't know who else will help with this (you're asking for R help on a SAS forum), but the code you posted is chock full of errors.
One, it uses `length` when there is no variable named `length` (`project_lenght_days`).
Two, there isn't a value called "successful" -- it's "success".
Third, I'm not really sure how the rate is being calculated. The order, I believe, matters in the `group_by` statement.
Fourth, you usually get a warning when you subset a data set like that when using ggplot:
ggplot(length.pct[length.pct$state=="success",], aes(project_lenght_days, pct)))
Fifth, the plot doesn't look anything like that. I would also typically categorize your variables in the mutate statement instead of relying on the plot options (probably worth it in the long run):
Sixth, you typically should call your libraries at the top of the script:
library(tidyverse) library(ggthemes) # origin of `theme_economist`
All this to say, I don't know if you provided enough information for us to replicate it in SAS.Here's the code I used to reproduce your example in R:
library(tidyverse)
library(ggthemes) ksdata <- tibble::tibble( state = c("success", "failed", "canceled", "success", "success", "failed", "failed", "canceled", "success", "failed"), project_lenght_days = c(9,5,6,15,17,12,22,23,25,27) ) length.pct <- ksdata %>% filter(state %in% c("success", "failed"), project_lenght_days <= 61) %>% group_by(project_lenght_days, state) %>% summarize(count=n()) %>% mutate(pct=count/sum(count)) length.pct ggplot(length.pct[length.pct$state=="success",], aes(project_lenght_days, pct)) + geom_point(colour="royalblue4", size=2.5) + ggtitle("Success Rate vs. Project Length") + xlab("Project Length (Days)") + ylab("Success Rate (%)") + scale_x_continuous(breaks=c(0,10,20,30,40,50,60)) + geom_vline(xintercept=30, colour="red") + theme_economist() + theme(plot.title=element_text(hjust=0.5), axis.title=element_text(size=12, face="bold"))
And here's an equivalent query in SAS using PROC SQL, but I don't have enough information to determine whether it's correct or not.
proc sql;
create table length_pct as
select
state,
project_lenght_days,
count(*) as count_n,
case
when 0 <= project_lenght_days < 10 then "0 - 10"
when 10 <= project_lenght_days < 20 then "10 - 20"
else "20 - 30"
end as interval
from
have
where
state in ("success", "failed") and project_lenght_days <= 61
group by
state, project_lenght_days;
quit;
I'm not going to plot it, but you can look into PROC SGPLOT if we figure out all the issues.
https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/grstatproc/p1t32i8511t1gfn17sw07yxtazad.htm
Apologies if I'm missing something that you explained or if I made typo -- I was just trying to be clear in my explanation and attempting to reproduce across languages can be a bit difficult.
Hi, I wanted to create something what is in R, but converted to SAS. Maybe I didnt specify my quetion. Between 1-10 days theres is 1 success and 2 failed (failed + canceled = 2 failed) so the success rate between 1-10 days equals 1/3. Between 10-20 we have 2/3, between 20-30days we have 1/4. Dont know how to loop it properly to get results 1/3 or 2/3 or 1/4.
The whole data i have has more than 5000 rows which have project_length_days between 1 and 60.
Thanks.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.