SAS Support Communities

SBuc · ‎05-02-2019

Dear Group, I'm trying to work with a large dataset (>1 million rows) in which the date and time are combined in a single variable, Creation_date, stored with the DATETIME16. format. I am not used to that format. I want to split Creation_date into two variables, date_created and time_created (the first would hold the date as dd/mm/yyyy and the second the time). I have found a way to do it after manually entering the data with CARDS, but this is not feasible with a large dataset. Thanks so much!
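To make the intended split concrete, here is a minimal sketch of the logic, written in Python rather than SAS (in SAS itself, the DATEPART and TIMEPART functions extract the two components from a datetime value). It assumes the value is available as text in the DATETIME16. display layout, for example `02MAY19:14:35:00`, and that month abbreviations are in English; the function name `split_creation_date` is hypothetical.

```python
from datetime import datetime

def split_creation_date(raw):
    """Split a DATETIME16.-style string such as '02MAY19:14:35:00'.

    Returns (date_created, time_created) as text:
    the date as dd/mm/yyyy and the time as hh:mm:ss.
    """
    # ddMONyy:hh:mm:ss is the layout assumed for the input text.
    stamp = datetime.strptime(raw.strip(), "%d%b%y:%H:%M:%S")
    date_created = stamp.strftime("%d/%m/%Y")
    time_created = stamp.strftime("%H:%M:%S")
    return date_created, time_created

date_created, time_created = split_creation_date("02MAY19:14:35:00")
print(date_created, time_created)
```

The same function can be applied row by row while reading a large file, so nothing has to be typed in by hand.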

SBuc · ‎04-25-2019

Dear Brain trust, is there a specific SAS procedure to obtain the distance between two Canadian zip codes? I have a large dataset in which I want to determine the distance between two distinct locations. Thanks!

SBuc · ‎02-26-2019

Dear List, I am trying to implement Bayesian analysis in my research because of its potential advantages, such as direct inference from posterior estimates and the possibility of incorporating prior knowledge when available. In my frequentist background, when dealing with multiple covariates to explain a dependent variable, we used (at least in my area, animal science) the following approach:
-> univariable analysis to screen potentially interesting variables, retaining variables of potential interest after adjusting for others (say, P below a threshold, commonly 0.2);
-> then a manual stepwise procedure with all the covariates kept at the first step, refitting the model until all remaining covariates have P < 0.05 (or any other adjusted P-value threshold).
I know this approach has many pitfalls (multiple comparisons and risk of overfitting). My question is about the corresponding approach in a Bayesian framework. Is there any advice or a SAS tutorial on how to perform multiple regression analysis with PROC MCMC, especially for dealing with variables whose estimates cross the no-effect cut-off and for final variable selection?

SBuc · ‎01-27-2019

Dear Brain trust, I want to determine the effect of a treatment on a pain score in animals. Much of the published literature has used linear mixed models with the score as a continuous dependent variable, accounting for treatment, block and repeated measures on the same animal. However, this approach assumes that the score is a continuous variable (which it is not) and also that a score of 8 is twice as bad as a score of 4, which has not been demonstrated (so the interval between two points on the scale cannot be assumed to be the same throughout the scale, which runs from 0 to a maximum of 15 points). Initially we used non-parametric tests such as rank comparisons, but I want to know whether a more elegant and powerful analysis can be performed. Thanks

SBuc · ‎11-01-2018

Dear list, I am trying to predict a dichotomous variable from several covariates (2 continuous covariates, 1 dichotomous variable) with a random effect for clustering (i.e., animals come from different farms; n = 19 farms, 280 individual records in total). I fit my model using PROC GLIMMIX with a logit link. I am used to PROC LOGISTIC diagnostics, using the area under the curve and looking at the sensitivity and specificity of the model to determine prediction accuracy. I am not aware of this type of procedure in PROC GLIMMIX, and in particular I do not know whether the same assumptions hold when a random effect is added to a logistic regression model. Is there any specific clue or guide for assessing the accuracy of prediction in a GLMM?

SBuc · ‎11-01-2018

This comment is right. I wanted to predict age based on enzyme results from animals with repeated samples at various time points. The big challenge here is that the day of sampling may differ from calf to calf, as may the interval between samples. I was also thinking of fitting some Bayesian models, since predictions are more easily obtained from the posterior densities of the models, but I am not very familiar with PROC MCMC.

SBuc · ‎10-29-2018

Dear Braintrust, I am looking to determine the rate of elimination of an enzyme present in the serum of calves. The samples were taken approximately 7 days apart during the first month of life. The big issue is that I don't have the same number of samples per subject, and the interval between 2 samples in the same subject also varies. I want to be able to predict, with the associated error, the possible range of ages based on the enzyme value. I initially wanted to use PROC MIXED after log transformation of the enzyme to improve normality, but defining the covariance matrix appears difficult or impossible because of the differing intervals between samples and the varying number of samples per subject. Maybe you could give me some advice, clues or papers referring to this specific type of problem? Thanks

Type of dataset:

Id   age_d  enzyme_IU_L
111  1      800
111  8      90
111  18     25
234  3      1000
234  8      200
333  1      700
333  8      88
333  15     44
333  22     28

SBuc · ‎10-14-2018

Thank you very much, it works perfectly well.

SBuc · ‎10-13-2018

Thanks for this answer. I don't have access to the database this weekend, which is why I directly tried the code that generates data. Unfortunately my 9.4 version gives me an error message for the "integer" argument within the rand function. When looking up this argument in the SAS documentation, I see that rand is generally followed by a distribution type?

SBuc · ‎10-12-2018

Dear SAS community, I would like to submit a problem for your thoughts. I have 2 datasets, one from a cross-sectional study and another from a prospective cohort. I am more used to data analysis than to internal sampling of my data…

The first study contains data obtained from calves between 1 and 21 days old. Each calf has only 1 data line (cross-sectional study, 1 visit). The dataset has the form:

DATASET1
calfID  FarmID  Date_birth  Date_visit  Age  Gender          X1  X2  X3
111     FARM1                                Male or female

where calfID is the ear tag number of the calf (unique for a specific calf), FarmID is the farm identification code, Date_birth is the calf's date of birth, and Date_visit is the day we measured X1, X2 and X3, which are continuous numeric data. Age is the difference between the 2 dates (which represents the calf's age). We also collected gender information.

The second dataset comes from different farms and animals. The same data are collected, but the same calf can appear 2 or 3 times (an extra column, Visit, indicates the visit number for a specific calf), as below. The interval between visits is always the same: 1 week. Calves have the same age range in the 2 datasets (from 1 day to 21 days).

DATASET2 (calves are measured at 2 to 3 visits, one week apart)
calfID  FarmID  Visit  Date_birth  Date_visit  Age  Gender  X1  X2  X3
222     FARM2   1
222     FARM2   2
222     FARM2   3
223     FARM2   1
223     FARM2   2
333     FARM3   1

My objective is to fit a logistic regression predicting the probability that a calf is younger than X days (different age cut-offs would be used) based on the covariates Gender, X1, X2 and X3, using farm as a random effect. I want to use information from both datasets. I want to sample DATASET2 so that I have only 1 record per calf, while also being able to select calves based on the age distribution I want to have. For example, suppose I have 115 calves from dataset 1 and 200 calves from dataset 2. I want to select calves (1 visit per calf only) conditional on age characteristics (e.g., so that the median age of the sampled calves is a value I can specify). I therefore want to know if you have any clues on how to sample DATASET2 to achieve my goals. I hope that this problem is clearly defined and can be solved with your expertise.

If possible, in a second step I would be interested in internally validating my models using bootstrap samples of my 2 datasets (respecting 1 record per calf). But I want to start with a simpler approach.
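As an illustration of the "one record per calf" step only, the following is a minimal Python sketch (not SAS, and not a method proposed in the thread). It keeps the visits that fall inside an age window and then draws one visit at random per calf; it does not by itself enforce a target median age. The function name `one_visit_per_calf`, the window parameters, the seed and the ages in the example rows are all hypothetical.

```python
import random
from collections import defaultdict

def one_visit_per_calf(rows, min_age=1, max_age=21, seed=2018):
    """Return one randomly chosen visit per calf among visits in the age window."""
    rng = random.Random(seed)           # fixed seed so the draw is reproducible
    by_calf = defaultdict(list)
    for row in rows:
        if min_age <= row["Age"] <= max_age:
            by_calf[row["calfID"]].append(row)
    # Calves with no visit inside the window are dropped.
    return [rng.choice(visits) for _, visits in sorted(by_calf.items())]

# Hypothetical rows laid out like DATASET2 (ages invented for illustration).
dataset2 = [
    {"calfID": 222, "FarmID": "FARM2", "Visit": 1, "Age": 3},
    {"calfID": 222, "FarmID": "FARM2", "Visit": 2, "Age": 10},
    {"calfID": 222, "FarmID": "FARM2", "Visit": 3, "Age": 17},
    {"calfID": 223, "FarmID": "FARM2", "Visit": 1, "Age": 5},
    {"calfID": 223, "FarmID": "FARM2", "Visit": 2, "Age": 12},
    {"calfID": 333, "FarmID": "FARM3", "Visit": 1, "Age": 8},
]

sample = one_visit_per_calf(dataset2)
for row in sample:
    print(row["calfID"], row["Visit"], row["Age"])
```

Narrowing `min_age` and `max_age` is one crude way to shift the age distribution of the sample; repeating the draw with different seeds gives different one-record-per-calf subsets.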

SBuc · ‎05-09-2018

Dear Listers, I want to analyse the following dataset. We have farms where we measured the success of a procedure (x successes out of n events). For every farm we have covariates (X1, ..., X9), each defined as the proportion of calves with a specific event on that farm (e.g., the proportion of calves drinking their first meal less than 6 hours after birth, so a proportion of y events out of m calves assessed). However, m differs between farms. I want to incorporate in my analysis the uncertainty around my covariates' proportions, i.e., how to account for the fact that X1 can be 50% on one farm with 10 events out of 20 trials and on another with 4 out of 8. I don't know whether it is possible to account for this.

SBuc · ‎02-18-2018

Thank you so much for your answer. It works with your example, but I checked this approach against a seminal paper in which the authors mention pct50 = 8.9 and pct90 = 4.4 and found a gamma(5.9, 0.7). When I substitute these values (and pct90 instead of pct95) into your solution, I get this warning: "The solution failed because 2 equations are missing or have extreme values for observation 1 at NEWTON iteration". I don't know how to fix this properly.

SBuc · ‎02-16-2018

Thanks for this answer; however, I think my previous question was not clear enough. Let's say that I want to model a gamma distribution. I only know that the 50th percentile of this gamma(alpha, beta) is pct50 and the 90th percentile is pct90. How can I find alpha and beta knowing pct50 and pct90 of the distribution? Thanks
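One way to see that two percentiles are enough: for a gamma distribution the ratio pct90 / pct50 depends only on the shape alpha, so alpha can be found by a one-dimensional search and the scale then follows from pct50. The block below is a Python sketch of that idea using only the standard library; it is not the SAS solution given in the thread. It assumes beta is a scale parameter (for a rate parameterization, take 1 / scale), and all function names are hypothetical.

```python
import math

def gamma_cdf(x, shape):
    """CDF of a gamma(shape, scale=1) at x, from the incomplete-gamma power series."""
    if x <= 0.0:
        return 0.0
    term = 1.0 / shape
    total = term
    n = 0
    while term > total * 1e-16 and n < 100000:
        n += 1
        term *= x / (shape + n)
        total += term
    return total * math.exp(shape * math.log(x) - x - math.lgamma(shape))

def gamma_quantile(p, shape):
    """Quantile of a gamma(shape, scale=1), found by bisection on the CDF."""
    lo, hi = 0.0, max(1.0, shape)
    while gamma_cdf(hi, shape) < p:     # widen the bracket until it contains p
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if gamma_cdf(mid, shape) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def gamma_from_percentiles(pct50, pct90):
    """Return (shape, scale) matching a given median and 90th percentile."""
    if not 0.0 < pct50 < pct90:
        raise ValueError("need 0 < pct50 < pct90")
    target = pct90 / pct50
    lo, hi = 0.05, 200.0                # assumed search range for the shape
    for _ in range(60):
        mid = math.sqrt(lo * hi)        # bisection on the log scale
        ratio = gamma_quantile(0.9, mid) / gamma_quantile(0.5, mid)
        if ratio > target:
            lo = mid                    # ratio too large: the shape must increase
        else:
            hi = mid
    shape = math.sqrt(lo * hi)
    scale = pct50 / gamma_quantile(0.5, shape)
    return shape, scale

# Check on an exponential (shape 1) with scale 2: median 2*ln(2), 90th pct 2*ln(10).
shape, scale = gamma_from_percentiles(2 * math.log(2), 2 * math.log(10))
print(shape, scale)
```

If the requested ratio lies outside what shapes between 0.05 and 200 can produce, the search simply stops at the nearest boundary, so the result should be checked by recomputing the two percentiles from the returned parameters.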

SBuc · ‎02-16-2018

Dear Braintrust, we are working on determining a prior in the form of a gamma distribution. We have the 50th and 90th percentiles of the gamma distribution and want to find the parameters that define the gamma distribution fitting our data. Is there a specific procedure we can follow in SAS? Thanks so much

SBuc · ‎02-09-2018

Thanks for this reply. In fact we have 55 farms with 12 or more calves each (up to 20), and an extra 20 farms with 5 to 10 calves each. Since our inference will be at the farm level, we want, if possible, to stay at the farm level for all our analyses, with the idea that what is good at the farm level is not necessarily exactly the same at the calf level.

Online Status	Offline
Date Last Visited	‎05-05-2019 10:22 AM

How do I transform one variable with DATETIME16. format in 2 distinct...

How to obtain the distance between 2 zip code (canada) ?

Multiple regression analysis with Proc MCMC

Testing for an effect of a treatment on an ordinal score beyond Kruska...

How to assess quality of prediction in a logistic regression with rand...

Re: repeated measures

repeated measures

Re: Resampling one data per subject where multiple observation are ava...

Re: Resampling one data per subject where multiple observation are ava...

Resampling one data per subject where multiple observation are availab...

covariates measured with various imprecision

Re: Finding a gamma distribution from percentiles

Re: Finding a gamma distribution from percentiles

Finding a gamma distribution from percentiles

Re: how accounting for covariate uncertainty (derived from a proportio...
