BookmarkSubscribeRSS Feed
moteku
Fluorite | Level 6

Let's assume I have data like this:

 

data have;
input state $ project_lenght_days;
cards;
success 9
failed 5
canceled 6
success 15
success 17
failed 12
failed 22
canceled 23
success 25
failed 27
;
run;

 

I would like to establish successful rate and present it as interval on chart 1-10, 10-20, 20-30 days. 

I've found the same analyze on kaggle, but it's in R, code from the analyze is:

 

length.pct <- ksdata %>%
  filter(state %in% c("successful", "failed"), length <= 61) %>%
  group_by(length, state) %>%
  summarize(count=n()) %>%
  mutate(pct=count/sum(count))

ggplot(length.pct[length.pct$state=="successful",], aes(length, pct)) + 
  geom_point(colour="royalblue4", size=2.5) + ggtitle("Success Rate vs. Project Length") + 
  xlab("Project Length (Days)") + ylab("Success Rate (%)") + 
  scale_x_continuous(breaks=c(0,10,20,30,40,50,60)) + geom_vline(xintercept=30, colour="red") + 
  theme_economist() + 
  theme(plot.title=element_text(hjust=0.5), axis.title=element_text(size=12, face="bold"))

In my data I assume that canceled and failed state are equal to failed project. 

Is it possible to make something like this in SAS with the sample Data i have?

przyklad.png

2 REPLIES 2
maguiremq
SAS Super FREQ

I don't know who else will help with this (you're asking for R help on a SAS forum), but the code you posted is chock full of errors.

 

One, it uses `length` when there is no variable named `length` (`project_lenght_days`).

 

Two, there isn't a value called "successful" -- it's "success".

 

Third, I'm not really sure how the rate is being calculated. The order, I believe, matters in the `group_by` statement.

 

Fourth, you usually get a warning when you subset a data set like that when using ggplot:

 

ggplot(length.pct[length.pct$state=="success",], aes(project_lenght_days, pct)))

Fifth, the plot doesn't look anything like that. I would also typically categorize your variables in the mutate statement instead of relying on the plot options (probably worth it in the long run):

maguiremq_1-1641472360969.png

 

Sixth, you typically should call your libraries at the top of the script:

 

library(tidyverse) 
library(ggthemes) # origin of `theme_economist`

All this to say, I don't know if you provided enough information for us to replicate it in SAS.Here's the code I used to reproduce your example in R:

 

library(tidyverse)
library(ggthemes) ksdata <- tibble::tibble( state = c("success", "failed", "canceled", "success", "success", "failed", "failed", "canceled", "success", "failed"), project_lenght_days = c(9,5,6,15,17,12,22,23,25,27) ) length.pct <- ksdata %>% filter(state %in% c("success", "failed"), project_lenght_days <= 61) %>% group_by(project_lenght_days, state) %>% summarize(count=n()) %>% mutate(pct=count/sum(count)) length.pct ggplot(length.pct[length.pct$state=="success",], aes(project_lenght_days, pct)) + geom_point(colour="royalblue4", size=2.5) + ggtitle("Success Rate vs. Project Length") + xlab("Project Length (Days)") + ylab("Success Rate (%)") + scale_x_continuous(breaks=c(0,10,20,30,40,50,60)) + geom_vline(xintercept=30, colour="red") + theme_economist() + theme(plot.title=element_text(hjust=0.5), axis.title=element_text(size=12, face="bold"))

And here's an equivalent query in SAS using PROC SQL, but I don't have enough information to determine whether it's correct or not.

proc sql;
	create table 	length_pct as
		select
					state,
					project_lenght_days,
					count(*) as count_n,
					case 
						when 0 <= project_lenght_days < 10 then "0 - 10"
						when 10 <= project_lenght_days < 20 then "10 - 20"
						else "20 - 30"
					end as interval
		from
					have
		where
					state in ("success", "failed") and project_lenght_days <= 61
		group by
					state, project_lenght_days;
quit;

I'm not going to plot it, but you can look into PROC SGPLOT if we figure out all the issues.

 

https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/grstatproc/p1t32i8511t1gfn17sw07yxtazad.htm

 

Apologies if I'm missing something that you explained or if I made typo --  I was just trying to be clear in my explanation and attempting to reproduce across languages can be a bit difficult.

moteku
Fluorite | Level 6

Hi, I wanted to create something what is in R, but converted to SAS. Maybe I didnt specify my quetion. Between 1-10 days theres is 1 success and 2 failed (failed + canceled = 2 failed) so the success rate between 1-10 days equals 1/3. Between 10-20 we have 2/3, between 20-30days we have 1/4. Dont know how to loop it properly to get results 1/3 or 2/3 or 1/4.

The whole data i have has more than 5000 rows which have project_length_days between 1 and 60. 
Thanks.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 2 replies
  • 309 views
  • 0 likes
  • 2 in conversation