BookmarkSubscribeRSS Feed
moteku
Fluorite | Level 6

Let's assume I have data like this:

 

data have;
input state $ project_lenght_days;
cards;
success 9
failed 5
canceled 6
success 15
success 17
failed 12
failed 22
canceled 23
success 25
failed 27
;
run;

 

I would like to establish successful rate and present it as interval on chart 1-10, 10-20, 20-30 days. 

I've found the same analyze on kaggle, but it's in R, code from the analyze is:

 

length.pct <- ksdata %>%
  filter(state %in% c("successful", "failed"), length <= 61) %>%
  group_by(length, state) %>%
  summarize(count=n()) %>%
  mutate(pct=count/sum(count))

ggplot(length.pct[length.pct$state=="successful",], aes(length, pct)) + 
  geom_point(colour="royalblue4", size=2.5) + ggtitle("Success Rate vs. Project Length") + 
  xlab("Project Length (Days)") + ylab("Success Rate (%)") + 
  scale_x_continuous(breaks=c(0,10,20,30,40,50,60)) + geom_vline(xintercept=30, colour="red") + 
  theme_economist() + 
  theme(plot.title=element_text(hjust=0.5), axis.title=element_text(size=12, face="bold"))

In my data I assume that canceled and failed state are equal to failed project. 

Is it possible to make something like this in SAS with the sample Data i have?

przyklad.png

2 REPLIES 2
maguiremq
SAS Super FREQ

I don't know who else will help with this (you're asking for R help on a SAS forum), but the code you posted is chock full of errors.

 

One, it uses `length` when there is no variable named `length` (`project_lenght_days`).

 

Two, there isn't a value called "successful" -- it's "success".

 

Third, I'm not really sure how the rate is being calculated. The order, I believe, matters in the `group_by` statement.

 

Fourth, you usually get a warning when you subset a data set like that when using ggplot:

 

ggplot(length.pct[length.pct$state=="success",], aes(project_lenght_days, pct)))

Fifth, the plot doesn't look anything like that. I would also typically categorize your variables in the mutate statement instead of relying on the plot options (probably worth it in the long run):

maguiremq_1-1641472360969.png

 

Sixth, you typically should call your libraries at the top of the script:

 

library(tidyverse) 
library(ggthemes) # origin of `theme_economist`

All this to say, I don't know if you provided enough information for us to replicate it in SAS.Here's the code I used to reproduce your example in R:

 

library(tidyverse)
library(ggthemes) ksdata <- tibble::tibble( state = c("success", "failed", "canceled", "success", "success", "failed", "failed", "canceled", "success", "failed"), project_lenght_days = c(9,5,6,15,17,12,22,23,25,27) ) length.pct <- ksdata %>% filter(state %in% c("success", "failed"), project_lenght_days <= 61) %>% group_by(project_lenght_days, state) %>% summarize(count=n()) %>% mutate(pct=count/sum(count)) length.pct ggplot(length.pct[length.pct$state=="success",], aes(project_lenght_days, pct)) + geom_point(colour="royalblue4", size=2.5) + ggtitle("Success Rate vs. Project Length") + xlab("Project Length (Days)") + ylab("Success Rate (%)") + scale_x_continuous(breaks=c(0,10,20,30,40,50,60)) + geom_vline(xintercept=30, colour="red") + theme_economist() + theme(plot.title=element_text(hjust=0.5), axis.title=element_text(size=12, face="bold"))

And here's an equivalent query in SAS using PROC SQL, but I don't have enough information to determine whether it's correct or not.

proc sql;
	create table 	length_pct as
		select
					state,
					project_lenght_days,
					count(*) as count_n,
					case 
						when 0 <= project_lenght_days < 10 then "0 - 10"
						when 10 <= project_lenght_days < 20 then "10 - 20"
						else "20 - 30"
					end as interval
		from
					have
		where
					state in ("success", "failed") and project_lenght_days <= 61
		group by
					state, project_lenght_days;
quit;

I'm not going to plot it, but you can look into PROC SGPLOT if we figure out all the issues.

 

https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/grstatproc/p1t32i8511t1gfn17sw07yxtazad.htm

 

Apologies if I'm missing something that you explained or if I made typo --  I was just trying to be clear in my explanation and attempting to reproduce across languages can be a bit difficult.

moteku
Fluorite | Level 6

Hi, I wanted to create something what is in R, but converted to SAS. Maybe I didnt specify my quetion. Between 1-10 days theres is 1 success and 2 failed (failed + canceled = 2 failed) so the success rate between 1-10 days equals 1/3. Between 10-20 we have 2/3, between 20-30days we have 1/4. Dont know how to loop it properly to get results 1/3 or 2/3 or 1/4.

The whole data i have has more than 5000 rows which have project_length_days between 1 and 60. 
Thanks.

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 2 replies
  • 886 views
  • 0 likes
  • 2 in conversation