About Buzzy_Bee

Buzzy_Bee · ‎10-13-2021

That's perfect. It correctly splits the results into scores for just Patient 1 and then shows results for everyone else (excluding Patient 1's scores). Yes - you are correct about the smallish sample size. The data would only look at around 4 visits in total, and there would be around 10 patients. I was under the impression that t-tests do not have a minimum size exactly, as they're often used with small sample sizes? It's not for a publication of any sort, more just something to show managers in meetings when they want a comparison. The data is fairly normally distributed (even though my example over-exaggerates Patient 1's scores).
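A minimal sketch of the split described above, assuming the `have` data set from the original question. The `grouped` data set and the `Group` variable are illustrative names, not taken from the thread. PROC TTEST does not report a median, so a PROC MEANS step is added for that. With repeated visits per patient the observations are not independent, so the p-value is probably best read as descriptive only.

```sas
/* Flag Patient 1 versus everyone else (names here are illustrative) */
data grouped;
  set have;
  length Group $ 13;
  if PatientID = '1' then Group = 'Patient 1';
  else Group = 'Everyone else';
run;

/* Two-sample t-test: CLASS needs a variable with exactly two levels */
title 'T-Test: Patient 1 vs everyone else';
proc ttest data=grouped;
  class Group;
  var Score;
run;

/* N, mean and median for each group */
proc means data=grouped n mean median;
  class Group;
  var Score;
run;
title;
```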

Buzzy_Bee · ‎10-13-2021

Hello,

I have some basic data below showing PatientID, Visit and the Score the patient has for some metric such as heart rate or pulse (I just made this up as a simple example of what I'm doing). If PatientID 1 has a higher score than everyone else, it would be helpful to compare him with all the other patients.

So I was hoping to create something like a TTEST that would show summary stats for Patient 1 including N, Mean, Median etc and then compare these results with the summary stats for everyone (so the overall mean, median etc) to see if his results are different to the overall group. So it would look something like:

    Variable     N     Mean   Median
    Patient1     10    99     98
    Everyone     100   66     55
    Diff(1-2)          33     43

Here is the simple data, and I'm not sure how to compare only Patient1 with everyone else. Thanks for your help or any ideas about other ways I can do this.

    data have;
      input PatientID $ Visit Score;
      datalines;
    1 1 22
    1 2 44
    2 1 63
    2 2 20
    3 1 48
    3 2 61
    ;
    run;

    title 'T-Test';
    proc ttest data=have;
      class ?;
      var Score;
    run;

Buzzy_Bee · ‎09-09-2021

I'm not sure what you mean by this. Industry convention requires statistics to be shown in tables where each statistic has a vertical column that displays the results and the treatments are listed horizontally in the table. If I displayed it vertically, no one would be able to read it or use in their reports.

Buzzy_Bee · ‎09-09-2021

That only shows me the number of observations in the data set though, which Proc Mixed already does by default. What I mean is, I need the N statistic. So in the LSM table where it shows Treatment Drug A and the Estimate, DF etc, I need an N column in there that would tell me there were say 100 for Drug A, and 120 for Drug B. The Proc Means procedure shows N, so currently I'm creating a Proc Mixed followed by Proc Means to generate all of the statistics I need. But I wanted to create a nice, tidy table that shows them all together that I can easily copy and paste onto an MS Word document.

Buzzy_Bee · ‎09-08-2021

On this instructional document about PROC MIXED:
https://www.pharmasug.org/proceedings/2021/SA/PharmaSUG-2021-SA-062.pdf

They demonstrate the example below to create an output that shows Least Square Means and columns for Estimate, Standard Error, DF, Lower and Upper. I need all of those, but I also want the N statistic. Does anyone know if there is an option that can be added in so I can get N on this output please? Thanks.

    ods output lsmeans=pb_lsmean diffs=pb_lsdiff;
    proc mixed data=qlqc2 method=reml covtest empirical;
      by param;
      class subjid trt visit;
      model chg=base trt visit trt*visit;
      random intercept / subject=subjid;
      repeated visit / subject=subjid type=ar(1);
      lsmeans trt*visit / cl pdiff;
    run;
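One possible way to get N next to the LS-means is to count records per cell and join the counts onto the ODS output table. This is only a sketch. It assumes the `pb_lsmean` table produced above keeps `param`, `trt` and `visit` with the same types as in `qlqc2`, and it assumes that records with non-missing `chg` and `base` are the ones the model used. The table names `n_counts` and `lsmean_with_n` are made up for illustration.

```sas
proc sql;
  /* Count usable records per param / treatment / visit cell */
  create table n_counts as
  select param, trt, visit, count(chg) as N
  from qlqc2
  where base is not missing
  group by param, trt, visit;

  /* Attach N to the LS-means table captured by ODS OUTPUT */
  create table lsmean_with_n as
  select l.*, c.N
  from pb_lsmean as l
  left join n_counts as c
    on  l.param = c.param
    and l.trt   = c.trt
    and l.visit = c.visit;
quit;

/* Print the combined table for copying into a report */
proc print data=lsmean_with_n noobs;
run;
```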

Buzzy_Bee · ‎09-01-2021

Thank you, Balladw. That is indeed how I would have written that line of code (I use the new CAT() functions, such as the CATX() function that you suggested). Originally at university I was taught the double pipe symbol (||) that you mentioned, prior to the introduction of the CAT() functions in SAS, but the double exclamation mark is certainly the rarer symbol that I don't often see. I read once that it is considered to be an outdated SAS programming style to still use symbols instead of the CAT() functions. Does anyone have an opinion on that? From what I've seen, I think SAS courses these days only teach the CAT() functions.

Buzzy_Bee · ‎09-01-2021

Thanks again, Kurt. I knew I'd seen them, but I couldn't think where. These days I always use CAT(), CATS() etc, so I'd forgotten about !! that was used before the introduction of the new concatenation functions.

Buzzy_Bee · ‎09-01-2021

Could someone please remind me what !! means in SAS programming? I was viewing this page below where someone had offered an example that uses strip() followed by !!

I remember learning this a few years ago, but I can't recall now what the double exclamation does when used after the strip function. Thanks!!

Solved: SAS strip function doesn't work - SAS Support Communities

    data test;
      char=strip('ALTEPLASE INJECTION 2 MG ') !! 'Extra text';
    run;
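For reference, `!!` is an alternative spelling of the `||` concatenation operator. The short sketch below puts both operators next to the CATS() and CATX() functions mentioned in this thread. The variable names `raw`, `a`, `b`, `c` and `d` are illustrative only.

```sas
data test;
  length a b c d $ 60;
  raw = 'ALTEPLASE INJECTION 2 MG   ';

  a = strip(raw) !! 'Extra text';     /* ALTEPLASE INJECTION 2 MGExtra text  */
  b = strip(raw) || 'Extra text';     /* same result as a                    */
  c = cats(raw, 'Extra text');        /* strips each argument, then joins    */
  d = catx(' ', raw, 'Extra text');   /* ALTEPLASE INJECTION 2 MG Extra text */
run;

proc print data=test noobs;
run;
```

The operators leave any trimming to the programmer, which is why the original example wraps the first string in strip(). CATS() and CATX() strip leading and trailing blanks themselves.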

Buzzy_Bee · ‎01-18-2021

That's a good idea too. I hadn't thought of that.

Buzzy_Bee · ‎01-13-2021

Thanks for your suggestion. Their theory certainly aligns more with how a statistician would think. Leaving the distribution alone without specifying a minimum bound shows the Proc MI procedure has still ensured that measures of central tendency remain the same as the non-imputed variable in my data set. If I set the lower bound to 0, the mean is now a bit above the original mean, so I can see why Johnson and Young suggest bounds are not the best idea.

Buzzy_Bee · ‎01-13-2021

Thanks very much - I was expecting it to be called Lower Bound or something like that. I tried the Minimum=0 option that you suggested and that works well. It has produced a sensible looking normal distribution. It looks like there is a Maximum option also, but my Proc MI has automatically imputed up to the largest value so I didn't need to specify an upper bound.
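A minimal sketch of the call described here, assuming a data set named `patients` with the variables `weight`, `height` and `age`. The data set names, the seed and the number of imputations are placeholders, not values from the thread.

```sas
proc mi data=patients nimpute=5 seed=20210113
        minimum=0                 /* one value applies to every imputed variable */
        out=patients_imputed;
  fcs reg(weight height age);     /* FCS regression for the continuous variables */
  var weight height age;
run;

/* Check the imputed distributions, one row per imputation */
proc means data=patients_imputed n mean median min max;
  class _Imputation_;
  var weight height age;
run;
```

Comparing the means from the last step with those of the original data is one way to see how much a bound shifts the distribution.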

Buzzy_Bee · ‎01-12-2021

I was using PROC MI (multiple imputation) to impute missing values in continuous variables such as weight, height and age. I'm using FCS Regression with my Proc MI. However, I noticed that while most values looked really good and created a nice distribution, a few of the values displayed as negative.

After searching online, I found some posts on R and Python blogs where people talked about using a lower bound to set 0 as the lower bound for variables like age, weight and height, that can't plausibly take negative values. I was planning to add a lower bound, but I couldn't find any option in the PROC MI procedure that allows lower bounds to be set.

Some people using other programming languages also tried tactics like:

- Set negative values to 0 using: if age<0 then age=0
- Reverse the value using absolute so that -5 becomes 5: age = abs(age)

Personally, I would have thought both might skew the distribution a bit?

While I was searching for answers on the internet, I found this comment in a statistics textbook, which suggests you just leave the implausible values alone:

"Intuitively speaking, it makes sense to round values or incorporate bounds to give plausible values. However, these methods has been shown to decrease efficiency and increase bias by altering the correlation or covariances between variables estimated during the imputation process. Additionally, these changes will often result in an underestimation of the uncertainly around imputed values. Remember imputed values are NOT equivalent to observed values and serve only to help estimate the covariances between variables needed for inference (Johnson and Young 2011)."

Does anyone else have any thoughts about lower bounds, and if you've used these before in such situations, did it work out well? I couldn't find any similar posts on this website, but I'm sure it will be useful for others using imputation methods to read this and learn about how other people handled this. Thanks for your thoughts.

Buzzy_Bee · ‎01-08-2021

Perfect - that's exactly what I was looking for. The example I was following on the documentation.sas.com website showed their option as "stat=percent" so I had tried changing it to stat=freq, but kept getting error messages and I couldn't find better examples on the documentation.sas.com website. Your solution using scale=count is the correct option to use to get it to display how I wanted it. Thank you.
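For anyone following along, this is a sketch of how the original SGPANEL step presumably looks with that option added. The HISTOGRAM statement defaults to SCALE=PERCENT, so the bar heights were not raw counts, which is likely why the chart disagreed with the PROC FREQ output.

```sas
proc sgpanel data=sashelp.cars;
  where type in ('SUV', 'Sedan', 'Sports');
  panelby origin;
  /* scale=count plots the number of cars in each bin instead of a percentage */
  histogram msrp / group=type nbins=5 scale=count;
run;
```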

Buzzy_Bee · ‎01-08-2021

I was trying to create histograms for the sashelp.cars data set, and I want price (MSRP) binned into 5 groups and I also want to display Origin and Type on the plot. However, my resulting display just isn't right and I'm not sure what I'm doing wrong.

A quick proc freq shows that Sedans are by far the most common Type of car in the three car manufacturing Origins, but the stacked plot shows USA mostly has Sports cars. Sports is the least common in all regions so it seems my charts are somehow backwards.

I don't have to use SGPanel if there is something better that you recommend, but I do want to be able to bin my continuous variable (price in this case). Thanks for your help 🙂

    proc freq data=sashelp.cars;
      tables type*origin;
    run;

    proc sgpanel data=sashelp.cars;
      where type in('SUV','Sedan','Sports');
      panelby origin;
      histogram msrp / group=type nbins=5;
    run;

Buzzy_Bee · ‎01-07-2021

Thank you - using the two-way Proc Freq that you suggested does produce the same result (the bottom line shows 74.2% survival for females and 18.89% for males).

    Survived    Sex
                0        1        Total
    0           81       468      549
                25.8     81.11
    1           233      109      342
                74.2     18.89

I read over the Python code the people had used on Kaggle and realised that they didn't even use a correlation technique; they've just used a pivot table and a cross tab and then created a title above it labelling it as "Survival Correlation by Sex." I was really confused because I know statistical packages can't create a real Pearson correlation coefficient unless the variables are continuous. But all the Kaggle people are actually doing is creating a table of percentages, which isn't a correlation at all 🙂

Thanks for your help.
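A sketch of the kind of two-way PROC FREQ step that produces a table of counts and column percentages like the one above. The data set name `titanic` is an assumption, as is the coding of `Survived` and `Sex` as 0/1. The CHISQ option is an optional extra that was not part of the thread. It adds association measures for the 2x2 table, including the phi coefficient.

```sas
proc freq data=titanic;
  /* Keep counts and column percentages; drop row and overall percentages */
  tables Survived*Sex / norow nopercent chisq;
run;
```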

Date Last Visited	‎10-13-2021 04:24 PM

Re: Using Proc TTEST to compare summary statistics for one value again...

Using Proc TTEST to compare summary statistics for one value against a...

Re: Is there an option to add the N statistic onto PROC MIXED?

Re: Is there an option to add the N statistic onto PROC MIXED?

Is there an option to add the N statistic onto PROC MIXED?

Re: Meaning of !! after the strip() function

Re: Meaning of !! after the strip() function

Meaning of !! after the strip() function

Re: Using a lower bound for values imputed by PROC MI?

Re: Using a lower bound for values imputed by PROC MI?

Re: Using a lower bound for values imputed by PROC MI?

Using a lower bound for values imputed by PROC MI?

Re: Proc SGPanel not displaying expected results (using sashelp.cars)

Proc SGPanel not displaying expected results (using sashelp.cars)

Re: PROC CORR producing the wrong result - suggestions please?