About AvocadoRivalry

AvocadoRivalry · ‎03-07-2012

This is more of a conceptual question than a procedural one, so I apologize if this is not the appropriate place. My understanding of regression is somewhat limited, so I'm hoping that some discussion here can give me a clearer understanding of the PROC LOGISTIC output and ultimately help me pick the ideal model combination.

I am running logistic regression on a series of combinations to best predict an event (event = 1, no event = 0). Each combination can consider between 3 and 6 predictors, each of which has been binned based on previous analysis and expert judgment, resulting in a regression model that considers several polytomous categorical variables.

My understanding is that in the Maximum Likelihood Estimates section of the output, the estimate is essentially the coefficient, which reflects how the variable (or in this case, each bin) is related to the null hypothesis. I was under the impression that this null hypothesis was essentially the likelihood of an event occurring across all observations, ignoring any variables/bins (e.g., if I have 1,000 observations, 100 of which are events, the null hypothesis would state that the probability of an event occurring is 10%). The regression would then base the coefficients for each variable and bin on this percentage.

It appears that SAS does things differently, though. Instead of using the entire population to define the control bin, SAS defaults to using the first bin I have defined (which is apparent in the output, where this bin does not show up and the Odds Ratio Estimates show the comparison of each bin vs. this control bin). In many cases, this bin is made up of the missing values, and its size and share of the overall population vary greatly depending on which variable is being analyzed.

My question is two-fold:

First, why is the first bin defined used as the control/null hypothesis? Intuitively, it would make more sense to me to have this defined as the entire population rather than comparing against one of the bins that is ultimately used in the model anyway. Maybe my understanding of the null hypothesis and the value on which coefficients are based is incorrect, so any clarification there would help.

Second, assuming that the control bin does have to be assigned and cannot be the entire population, is there a better approach to account for the variation in size and proportion of the missing-value buckets? Instead of using this as the control bin, what is a better grouping to use?

Thanks in advance!
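
For reference, the comparison bin discussed above can be chosen explicitly on the CLASS statement rather than left to the default. The following is only a minimal sketch under stated assumptions: the dataset work.model_data, the variables event, bin_a and bin_b, and the level label 'Low' are hypothetical and do not come from the original post.

```sas
/* Hypothetical names: work.model_data, event, bin_a, bin_b, level 'Low'. */

/* Reference coding: each estimate compares a bin with the bin named in REF=. */
/* The REF= value must match the formatted value of an existing level.        */
proc logistic data=work.model_data;
  class bin_a (ref='Low') bin_b (ref='Low') / param=ref;
  model event(event='1') = bin_a bin_b;
run;

/* Effect coding: each estimate is a deviation from the average effect */
/* across the levels of that variable, on the logit scale.             */
proc logistic data=work.model_data;
  class bin_a bin_b / param=effect;
  model event(event='1') = bin_a bin_b;
run;
```

Under reference coding, picking a large, stable bin as the reference (instead of a small missing-value bin) usually makes the odds ratios easier to read. Effect coding is closer to the "compare against the overall level" idea, although the average is taken over bins on the logit scale and is not the raw population event rate.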

AvocadoRivalry · ‎02-03-2012

Easy enough. Using the PRELOADFMT / COMPLETETYPES options in PROC SUMMARY and then running that dataset through PROC FREQ gets me exactly what I need. Thanks for the link!
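
A rough sketch of that approach follows. The dataset work.scored, the variables decile and bad, and the format name decilef are assumptions made for illustration; the sample rows are invented so the steps can run on their own.

```sas
/* Hypothetical input: one row per account with a decile (1-10) and a 0/1 bad flag. */
data work.scored;
  input decile bad @@;
  datalines;
1 1 1 0 2 1 3 0 10 1
;
run;

/* The format lists every bucket that must appear in the output. */
proc format;
  value decilef
    1  = '0 - 10%'    2  = '10% - 20%'  3  = '20% - 30%'
    4  = '30% - 40%'  5  = '40% - 50%'  6  = '50% - 60%'
    7  = '60% - 70%'  8  = '70% - 80%'  9  = '80% - 90%'
    10 = '90% - 100%';
run;

/* COMPLETETYPES plus PRELOADFMT emits a row for every formatted level, */
/* including buckets that have no observations at all.                  */
proc summary data=work.scored nway completetypes;
  class decile / preloadfmt;
  format decile decilef.;
  var bad;
  output out=work.bads_by_decile (drop=_type_ _freq_) sum=bads;
run;

/* Empty buckets come back with a missing sum, so set them to zero. */
data work.bads_by_decile;
  set work.bads_by_decile;
  if missing(bads) then bads = 0;
run;

/* ZEROS keeps zero-weight buckets in the table; OUTCUM adds CUM_FREQ and CUM_PCT. */
proc freq data=work.bads_by_decile;
  weight bads / zeros;
  tables decile / outcum out=work.cum_bads;
run;
```

In work.cum_bads, a bucket with zero bads should carry the same cumulative percentage as the bucket above it, which is the behavior asked for in the original question.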

AvocadoRivalry · ‎02-03-2012

I am trying to build a dataset that shows the cumulative % of "bads" across 10 different buckets using PROC FREQ. I'm running into a problem for the buckets where there are zero "bads." SAS outputs what I've labeled below as "Table 1," whereas what I need is "Table 2." Basically, I need SAS to output the 80% - 90% bucket as well, carrying forward the Cumulative Bad % from the bucket above it. Are there any options in PROC FREQ to do this? If not, what is the best alternative approach? Thanks!

TABLE 1:

Percentile    Bads   Cum. Bad %
0 - 10%       5      20.8%
10% - 20%     6      45.8%
20% - 30%     2      54.2%
30% - 40%     1      58.3%
40% - 50%     3      70.8%
50% - 60%     1      75.0%
60% - 70%     2      83.3%
70% - 80%     2      91.7%
90% - 100%    2      100.0%

TABLE 2:

Percentile    Bads   Cum. Bad %
0 - 10%       5      20.8%
10% - 20%     6      45.8%
20% - 30%     2      54.2%
30% - 40%     1      58.3%
40% - 50%     3      70.8%
50% - 60%     1      75.0%
60% - 70%     2      83.3%
70% - 80%     2      91.7%
80% - 90%     0      91.7%
90% - 100%    2      100.0%

AvocadoRivalry · ‎10-04-2011

It seems like doing it in Excel is probably the most efficient approach, as this will only have to be done a few times. The transpose feature in Excel basically does exactly what I need, so no reason to over-complicate things.

AvocadoRivalry · ‎10-03-2011

That's the way I want the output to show up, but I want to read in the file instead of creating a dataset in SAS. Using an INFILE statement or PROC IMPORT will suffice. Will reading the data in before transposing it cause problems, given that the columns will then contain mixed variable formats?

AvocadoRivalry · ‎10-03-2011

I have a raw data file that contains data in row form that I need as columns. The file has 23 columns and 61 rows as-is. The first two columns are identifiers, which I need to keep as they are. The third column contains the column names I need post-transposition, and the remaining columns contain the values that correspond to the post-transposition variable named in the third column of each row. I need to transpose this entire file so that instead of having 23 columns and 61 rows, it has 61 columns and 20 rows (the two identifiers and the variable name column would not be observations), but I don't think SAS will let me read the data and do it with PROC TRANSPOSE because the formats within each column as-is are inconsistent. I've attached a sample of the data with a few rows as reference.
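
One way the mixed-format concern could be handled in SAS is to read every value column as character, transpose, and convert individual columns afterwards. This is only a sketch: the inline rows, the names id1, id2, varname and c1-c3, and the three-column width are made up to stand in for the attached file, which has 20 value columns.

```sas
/* Hypothetical stand-in for the raw file: 2 identifiers, a name column, 3 value columns. */
/* For a real file, the DATALINES reference would be replaced by the file path.           */
data work.raw;
  infile datalines dsd truncover;
  length id1 id2 $ 8 varname $ 32 c1-c3 $ 20;
  input id1 id2 varname c1-c3;
  datalines;
A1,B1,revenue,100,250,75
A2,B2,region,East,West,North
A3,B3,open_date,2011-01-15,2011-02-01,2011-03-20
;
run;

/* Each value of varname becomes a column; each of c1-c3 becomes a row. */
proc transpose data=work.raw out=work.wide (drop=_name_);
  id varname;
  var c1-c3;
run;

/* Everything is character after the transpose; convert numeric columns as needed. */
data work.wide_typed;
  set work.wide;
  revenue_num = input(revenue, 32.);
  drop revenue;
run;
```

Reading everything as character sidesteps the type conflict, because all transposed variables share one type. The cost is the explicit conversion step at the end.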

AvocadoRivalry · ‎10-03-2011

That could have been it. I did use PROC SQL to generate the bins_needed variable.

AvocadoRivalry · ‎10-03-2011

Thanks Shinnen1 - that was exactly what I was looking for. I wanted to change the original code as little as possible, as it was not written by me, so the %eval function is exactly what I needed. The problem I was having was creating a flexible value to define the last element in the array each time the macro runs and that syntax seems to work exactly as I need it to.
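
The reply itself is not reproduced here, so the following is only a sketch of how %EVAL can supply a clean array bound. The value 12 and the names last_bin, upper and work.demo are hypothetical.

```sas
/* Hypothetical value; in the original program bins_needed came from PROC SQL, */
/* which can leave leading blanks in the macro variable.                       */
%let bins_needed = 12;

/* %EVAL returns the integer result as text without surrounding blanks. */
%let last_bin = %eval(&bins_needed.);
%let upper    = %eval(&bins_needed. - 1);
%put NOTE: last_bin=&last_bin. upper=&upper.;

data work.demo;
  /* Resolves to Ts1-Ts12 with the value assumed above. */
  array T Ts1-Ts&last_bin.;
  do i = 1 to &upper.;
    T[i] = i;
  end;
  drop i;
run;
```

Keeping the arithmetic in %EVAL means the rest of the DATA step only sees a plain integer, so the original code needs very few changes.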

AvocadoRivalry · ‎09-30-2011

I have the following arrays I need defined, which worked fine with the original code:

    array T Ts1-Ts50;
    array Q Qs1-Qs50;

Due to some code changes, there will not always be 50 values, and the number of values is now defined as a macro variable earlier in the program. I'm not sure if this is something specific to arrays or I'm just doing something silly when trying to call the macro variable in the name for the arrays, but hopefully this is a quick fix. Here is the full set of code that tries to reference the macro variable to replace the Ts50 or Qs50 with the number needed:

    data Default;
      set Default;
      if _N_ eq 1 then do;
        set Ts;
        set Quants;
      end;
      array T Ts1-Ts&bins_needed.;
      array Q Qs1-Qs&bins_needed.;
      if quick_ratio lt Qs1 then T_QUICK = Ts1;
      else if quick_ratio ge Qs&bins_needed. then T_QUICK = Ts&bins_needed.;
      else do;
        do i = 1 to (&bins_needed. - 1);
          if quick_ratio ge Q[i] and quick_ratio lt Q[i+1] then T_QUICK = T[i];
        end;
      end;
      %end; /* presumably closes a %do in the enclosing macro, which is not shown */
      drop Ts1-Ts&bins_needed. Qs1-Qs&bins_needed. i;
    run;

AvocadoRivalry · ‎09-01-2011

I'm currently installing SAS from the Deployment Wizard on a new computer and the install seems to be hung up on Enterprise Guide. After asking me to switch CDs and starting this stage, it's been nearly three hours. Is this normal? Is there any way to see some sort of progress bar?

AvocadoRivalry · ‎08-31-2011

Hi all, I have a field called CLIENT_NAME that has several different entries that can change depending on the dataset I'm feeding in. Because there is room for variation, using a macro to define the CLIENT_NAME values I want to use won't work each time. Is there a way to use a DO statement to create a loop that will go through every single unique value for a given variable? For instance, if I have 26 Client Names in a given dataset, the DO statement would run through the associated block of code 26 times, ultimately producing 26 datasets. Thanks!
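
A common pattern for this kind of loop is to load the distinct values into numbered macro variables and iterate with %DO. The sketch below is an illustration only: the macro name split_by_client, the input dataset work.clients and the output names work.client_1, work.client_2, ... are assumptions, and client names containing double quotes, & or % would need extra macro quoting.

```sas
%macro split_by_client(dsn=work.clients, var=CLIENT_NAME);
  %local n i;

  /* One macro variable per distinct value: client1, client2, ... */
  proc sql noprint;
    select distinct &var.
      into :client1 - :client9999
      from &dsn.;
  quit;
  %let n = &sqlobs.;   /* number of distinct values found */

  /* Run the same block once per value, writing one dataset each time. */
  %do i = 1 %to &n.;
    data work.client_&i.;
      set &dsn.;
      where &var. = "&&client&i..";
    run;
  %end;
%mend split_by_client;

%split_by_client()
```

If the block of code can run on all clients at once, BY-group processing on CLIENT_NAME is often simpler than producing one dataset per value.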

AvocadoRivalry · ‎07-13-2011

Thanks Ksharp - that was exactly the answer I needed! Nice and simple.

AvocadoRivalry · ‎07-11-2011

Hi all, I have a macro that runs each ID in a group through a series of tests. One of the result tables of the test is a 5-row, 2-column table that shows the result of the test and the number of observations in this particular ID that meet each possible result. The table looks like this:

Result   Count
A        15
B        2
C        33
D        45
E        11

All works fine when I try this on the bigger IDs, which have all five groups, but when I run it on the smaller IDs, which may only have some of the results, I'm having trouble consistently outputting the same table. Ultimately, what I'd like to do is have the above table populated all the time with all five rows, with a zero value in the case that any given result has no observations.

What I want:

Result   Count
A        15
B        0
C        33
D        45
E        11

What I'm getting:

Result   Count
A        15
C        33
D        45
E        11

This is a common problem that I run into and haven't quite figured out how to handle consistently. I'm sure there are some pretty standard techniques to build a table template like what I'm looking for and then use some sort of conditional logic (or another approach) to fill it in even when all five results aren't present. Thanks!
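
One standard way to build such a template is to create a five-row dataset of zeros and merge the actual counts over it. This is a sketch under assumptions, not necessarily the answer given in the thread; the dataset names work.template, work.results and work.full are hypothetical, and work.results stands in for the test output of one small ID.

```sas
/* Template with all five possible results and a zero count. */
data work.template;
  length Result $ 1;
  do Result = 'A', 'B', 'C', 'D', 'E';
    Count = 0;
    output;
  end;
run;

/* Hypothetical test output for a small ID that has no 'B' results. */
data work.results;
  length Result $ 1;
  input Result $ Count;
  datalines;
A 15
C 33
D 45
E 11
;
run;

/* Both inputs are in Result order. Counts from work.results overwrite   */
/* the zeros; results that are absent keep the template's zero.          */
data work.full;
  merge work.template work.results;
  by Result;
run;
```

Because the template is listed first in the MERGE, any Result missing from the test output still appears in work.full with Count = 0.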

Online Status	Offline
Date Last Visited	‎09-01-2015 07:11 AM

Interpreting PROC Logistic Output - Understanding ML and Coefficients

Obtaining cum. freqs for all possible outcomes in TABLES variable for ...

Obtaining cum. freqs for all possible outcomes in TABLES variable for ...

Reading in raw data that needs transposition (rows in raw need to be c...

Reading in raw data that needs transposition (rows in raw need to be c...

Reading in raw data that needs transposition (rows in raw need to be c...

Calling Macro Variable to be used as part of Array Statement

Calling Macro Variable to be used as part of Array Statement

Calling Macro Variable to be used as part of Array Statement

How long should SAS Enterprise Guide take to install?

Using a loop to perform a set of steps on each unique value for a give...

Creating table with 5 rows even if all 5 aren't filled out

Creating table with 5 rows even if all 5 aren't filled out