EloarL
Obsidian | Level 7

Could someone who knows both R and SAS programming help me? I need to run the code below in SAS. It was developed in R with Spark on Databricks. The goal is to run it in SAS using PROCs, but I know little about SAS. Any suggestions would be appreciated.

 

# Repeat the whole pipeline once per distinct group
for(i in 1:nrow(distinct_cluster)){
  grupo=distinct_cluster[[i]]

  # Keep only the rows of the current group
  dados<-dados_cluster
  dados<-dados%>%filter(grupos==grupo)

  # Count 2018 rows per DESATICLI and Calculation, then pivot Calculation into columns
  variaveis_clusters <- dados %>%
    filter(DATIPRNOTFSC<'2019-01-01', DATIPRNOTFSC>='2018-01-01')%>%
    group_by(DESATICLI, Calculation)%>%
    summarize(qtde=n())%>%
    sdf_pivot(DESATICLI ~ Calculation, fun.aggregate = list(qtde = "sum"))

  # Average, minimum and maximum of every pivoted column
  stats <- variaveis_clusters %>% select(-DESATICLI) %>% summarise_all(funs(avg, min, max)) %>% collect()

  cols <- variaveis_clusters%>% select(-DESATICLI) %>% colnames()
  avgs <- stats %>% select(ends_with("avg")) %>% unlist
  mins <- stats %>% select(ends_with("min")) %>% unlist
  maxs <- stats %>% select(ends_with("max")) %>% unlist

  # Build one scaling expression per column: (x - avg) / (max - min)
  exprs <- glue("(`{cols}` - {avgs}) / ({maxs} - {mins})") %>%
    setNames(cols) %>%
    lapply(parse_quosure)

  variaveis_escaladas<-variaveis_clusters %>% mutate(!!! exprs)

  # Replace missing values with 0 and register the table in Spark
  variaveis_clusters_tratado <- na.replace(variaveis_escaladas,0)
  variaveis_clusters_tratado <- sdf_copy_to(sc, variaveis_clusters_tratado,
    name = "variaveis_clusters_tratado", overwrite = TRUE)

  set.seed(364221)

  # k-means with 6 clusters on the scaled columns
  cluster <- ml_kmeans(variaveis_clusters_tratado, DESATICLI ~ ., k = 6)

  clusters <- ml_predict(cluster, variaveis_clusters_tratado)

  # Map cluster numbers 0-5 to letters A-F
  clusters <- clusters%>%mutate(grupos = case_when(
    prediction==0 ~ "A",
    prediction==1 ~ "B",
    prediction==2 ~ "C",
    prediction==3 ~ "D",
    prediction==4 ~ "E",
    prediction==5 ~ "F"))

  # Stack the results of every iteration into saida
  if(i==1){
    saida<-clusters%>%select(DESATICLI,grupos)%>%mutate(grupos_1=grupo)
  }else{
    saida<-sdf_bind_rows(saida,clusters%>%select(DESATICLI,grupos)%>%mutate(grupos_1=grupo))
  }
}

8 REPLIES
PaigeMiller
Diamond | Level 26

Could you explain what the code does?

 

Have you looked into the R-interface in SAS which is available through IML/Studio? https://documentation.sas.com/?docsetId=imlsstat&docsetTarget=imlsstat_statr_toc.htm&docsetVersion=1...

--
Paige Miller
Reeza
Super User
What exactly do you need help with?

Are you trying to convert the code to SAS code, to use SAS to pass the code to R/Spark or something else?

The code isn't actually bad at all. You mostly need PROC MEANS, possibly a transpose and a data step, and most of it is pretty straightforward data management. Note that because the clustering uses a seed and random initialization, you can't necessarily replicate the full clustering process exactly.

PS: if Databricks can export a PMML model, certain SAS products can ingest that and do the conversion. PMML is severely underutilized IMO.
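For anyone following along, the data management part (count, scale, pivot, fill missing values with 0) might look roughly like the sketch below. It is only a sketch under several assumptions: the toy rows are made up, DATIPRNOTFSC is assumed to be a SAS date, the `c_` prefix and the intermediate table names are my own choices, and it uses PROC SQL summary functions in place of PROC MEANS to get the average, minimum and maximum. The scaling is done before the transpose, which should give the same statistics because missing cells are ignored either way.

```sas
/* Hypothetical toy rows standing in for dados_cluster; all values are made up. */
data dados_cluster;
  input grupos $ DESATICLI $ Calculation $ DATIPRNOTFSC :yymmdd10.;
  format DATIPRNOTFSC yymmdd10.;
  datalines;
A act1 calc_a 2018-03-01
A act1 calc_a 2018-05-10
A act1 calc_b 2018-07-21
A act2 calc_a 2018-02-11
A act2 calc_b 2017-12-30
A act3 calc_b 2018-09-09
A act3 calc_b 2018-10-02
;
run;

proc sql;
  /* Count 2018 rows per group, DESATICLI and Calculation */
  create table counts as
  select grupos, DESATICLI, Calculation, count(*) as qtde
  from dados_cluster
  where DATIPRNOTFSC >= '01JAN2018'd and DATIPRNOTFSC < '01JAN2019'd
  group by grupos, DESATICLI, Calculation;

  /* (x - avg) / (max - min) within each group and Calculation value;   */
  /* a zero range gives 0, matching the R code's replacement of NA by 0 */
  create table scaled_long as
  select grupos, DESATICLI, Calculation,
         case when max(qtde) > min(qtde)
              then (qtde - mean(qtde)) / (max(qtde) - min(qtde))
              else 0
         end as qtde_scaled
  from counts
  group by grupos, Calculation
  order by grupos, DESATICLI, Calculation;
quit;

/* Pivot: one row per DESATICLI, one c_ column per Calculation value */
proc transpose data=scaled_long out=wide(drop=_name_) prefix=c_;
  by grupos DESATICLI;
  id Calculation;
  var qtde_scaled;
run;

/* Combinations that never occurred come out missing; set them to 0 */
data variaveis_clusters_tratado;
  set wide;
  array scaled{*} c_:;
  do j = 1 to dim(scaled);
    if missing(scaled{j}) then scaled{j} = 0;
  end;
  drop j;
run;
```

PROC SQL writes a note about remerging summary statistics for the second query; that is expected here.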
EloarL
Obsidian | Level 7
I am trying to convert the code into SAS code for use in SAS Enterprise Guide or Enterprise Miner.
Reeza
Super User
Ok. So what specifically do you need help with? I can outline some steps but will not do the coding for you. How familiar are you with SAS? Are you familiar with PROC MEANS and FREQ?
EloarL
Obsidian | Level 7
My difficulty is with repeating the code in SAS, which is what the command at the beginning of the code (the for loop) does. That command executes the code 6 times, once for each of the 6 distinct groups. In other words, I don't know which command to use in SAS to repeat a block of code n times.
Reeza
Super User

Get it running for one set and then we can help you convert it to run for all groups. 

 

There are generally several ways, two of the most common are BY group and macros. 

 

I'd say read up a bit on BY group processing in SAS. It's like group_by() in R, but think of it as being available in nearly all your procedures, including the cluster procs.

https://documentation.sas.com/?docsetId=lrcon&docsetTarget=n138da4gme3zb7n1nifpfhqv7clq.htm&docsetVe...
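As a rough illustration of the BY-group idea, here is a minimal sketch. SASHELP.CLASS only stands in for the real table, with Sex playing the role of grupos and Height and Weight the role of the scaled columns; the output table names are my own. PROC FASTCLUS is one k-means-style procedure in SAS/STAT, and its results will not match Spark's ml_kmeans exactly.

```sas
/* BY-group processing needs the data sorted by the BY variable */
proc sort data=sashelp.class out=class_sorted;
  by Sex;
run;

/* One clustering run per value of Sex, with no explicit loop */
proc fastclus data=class_sorted maxclusters=2 out=clustered noprint;
  by Sex;
  var Height Weight;
run;

/* FASTCLUS numbers clusters from 1 (Spark predictions start at 0), */
/* so cluster 1 maps to A, cluster 2 to B, and so on                */
data saida;
  set clustered;
  length grupos $1;
  grupos = substr('ABCDEF', cluster, 1);
run;
```

With the real data, the BY variable would be the existing group column and MAXCLUSTERS= would be 6.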

 

Another is macros, which are similar to functions, except that you should assume they return nothing and just perform a specific set of tasks.

https://github.com/statgeek/SAS-Tutorials/blob/master/Turning%20a%20program%20into%20a%20macro.md
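A macro version of the same loop might look like the sketch below. The macro name, its parameters and the work table `_grp` are hypothetical, SASHELP.CLASS again stands in for the real data, and the group variable is assumed to be character. The open-ended `into :g1-` form needs SAS 9.3 or later.

```sas
%macro cluster_by_group(data=, groupvar=, vars=, k=6);
  %local i n;

  /* Start from an empty result table (NOWARN: fine if it does not exist) */
  proc datasets lib=work nolist nowarn;
    delete saida;
  quit;

  /* One macro variable per distinct group value: g1, g2, ... */
  proc sql noprint;
    select distinct &groupvar into :g1- from &data;
  quit;
  %let n = &sqlobs;

  /* Equivalent of the R for loop: one pass per group */
  %do i = 1 %to &n;
    proc fastclus data=&data(where=(&groupvar = "&&g&i"))
                  maxclusters=&k out=_grp noprint;
      var &vars;
    run;

    /* Stack this group's result, like sdf_bind_rows in the R code */
    proc append base=saida data=_grp;
    run;
  %end;
%mend cluster_by_group;

%cluster_by_group(data=sashelp.class, groupvar=Sex, vars=Height Weight, k=2)
```

When every step supports a BY statement, BY-group processing is usually shorter; a macro loop is the fallback for steps that do not.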

 


EloarL
Obsidian | Level 7
I understand ... thank you very much for the suggestions.
ballardw
Super User

It may help to provide a small data set as SAS data step code and what the desired output would be from the given R code for that example data set. The data set needs to be just big enough or complex enough to exercise all of the options.

 

Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the {i} icon or attached as text to show exactly what you have and that we can test code against.

