About Ujjawal

Ujjawal · 06-25-2015

What I really want to accomplish: look at the macro below. It selects variables one by one and tests each as the independent variable in a simple linear regression model. Instead, I want different combinations of variables to be used as the independent variables in the regression model.

%macro regcomb(input=, depvar=, vars=, output=);
  %let n = %sysfunc(countw(&vars));
  ods select none;
  %do i = 1 %to &n;
    %let val = %scan(&vars, &i);
    ods output ParameterEstimates=Estimate&i;
    proc reg data=&input;
      model &depvar. = &val.;
    run;
    quit;
  %end;
  ods select all;
  data &output;
    set Estimate1 - Estimate&n;
  run;
%mend;
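One possible way to extend the macro to every non-empty combination is to count a bit mask from 1 to 2**n - 1 and include the j-th variable whenever bit j is set. The sketch below is only an illustration under assumptions: the macro name regcomb_all, the work data sets _est1, _est2, ... and the added subset column are hypothetical names, not part of the original macro. The number of models grows as 2**n - 1, so this is only practical for a modest number of variables. PROC REG also has a SELECTION=RSQUARE option on the MODEL statement that evaluates all subsets, which may be worth checking before rolling a loop by hand.

```sas
%macro regcomb_all(input=, depvar=, vars=, output=);
  %local n nmodels mask j subset;
  %let n = %sysfunc(countw(&vars, %str( )));
  %let nmodels = %eval(2**&n - 1);        /* number of non-empty subsets */

  ods exclude all;                        /* suppress printed output, keep ODS OUTPUT */
  %do mask = 1 %to &nmodels;
    /* Build the subset of variables whose bits are set in &mask */
    %let subset =;
    %do j = 1 %to &n;
      %if %sysfunc(band(&mask, %eval(2**(&j - 1)))) %then
        %let subset = &subset %scan(&vars, &j, %str( ));
    %end;

    ods output ParameterEstimates=_est&mask;
    proc reg data=&input;
      model &depvar = &subset;
    run;
    quit;

    /* Tag each estimate table with the combination it came from */
    data _est&mask;
      length subset $200;
      set _est&mask;
      subset = "&subset";
    run;
  %end;
  ods exclude none;

  /* Stack the estimates from all models */
  data &output;
    set _est1 - _est&nmodels;
  run;
%mend regcomb_all;

/* Example call on a SAS sample data set */
%regcomb_all(input=sashelp.class, depvar=weight, vars=age height, output=all_estimates);
```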

Ujjawal · 06-25-2015

Hi Team, I have dates stored in character format. They look like "15Jul2014". I need to convert them to 2014-Jul-15, i.e. YYYY-MMM-DD. Is there an easy way to do this? Thanks in anticipation!
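A minimal sketch of one way to do this: read the text with the DATE9. informat to get a real SAS date, then assemble the year, the three-letter month name, and the zero-padded day with CATX. The data set name want and the variable names date_char, sas_date and date_new are assumptions made for illustration.

```sas
data want;
  length date_char $9 date_new $11;
  date_char = "15Jul2014";

  /* Character text to a numeric SAS date */
  sas_date = input(date_char, date9.);

  /* Rebuild as YYYY-MMM-DD; MONNAME3. writes the first three letters of the month name */
  date_new = catx('-',
                  put(year(sas_date), 4.),
                  put(sas_date, monname3.),
                  put(day(sas_date), z2.));

  format sas_date date9.;
  put date_new=;    /* writes the converted value to the log */
run;
```

If a numeric month is acceptable, applying the YYMMDD10. format to sas_date gives 2014-07-15 without any string assembly.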

Ujjawal · 06-24-2015

Is there any way to use variables in different combinations in a macro? For example, there are 3 variables: Var1, Var2 and Var3. The combinations I would like to make are:

Var1
Var2
Var3
Var1 Var2
Var1 Var3
Var2 Var3
Var1 Var2 Var3

Order does not matter (Var1 Var2 Var3 and Var2 Var1 Var3 are the same).
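As a rough sketch, the combinations can be listed with a DATA step that counts a bit mask from 1 to 2**n - 1 and keeps the j-th name whenever bit j of the mask is set. Because each subset is generated exactly once, order never matters. The macro variable vars and the data set and column names (combos, combo) are assumptions for illustration.

```sas
%let vars = Var1 Var2 Var3;

data combos;
  length combo $200;
  n = countw("&vars", ' ');
  /* Each mask value from 1 to 2**n - 1 identifies one non-empty subset */
  do mask = 1 to 2**n - 1;
    combo = '';
    do j = 1 to n;
      /* BAND is nonzero when bit j of the mask is set */
      if band(mask, 2**(j - 1)) then
        combo = catx(' ', combo, scan("&vars", j, ' '));
    end;
    output;
  end;
  keep combo;
run;

proc print data=combos;
run;
```

For three variables this produces seven rows, one per combination; each row can then be read into a macro variable, for example with CALL SYMPUTX, and passed to a modelling macro.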

Ujjawal · 06-24-2015

Thanks Xia. The Gini that PROC UNIVARIATE produces is a measure of statistical dispersion. Correct me if I am wrong. A low Gini coefficient indicates a more equal distribution, with 0 corresponding to complete equality. How can it be used to check linearity? How can it be used in the modeling process to select important linear variables?

Ujjawal · 06-23-2015

There is a whitepaper on selecting important variables in a linear regression model: http://support.sas.com/resources/papers/proceedings15/3242-2015.pdf . It explains that the Gini coefficient can be used to check linearity in the model, and that variables can be ranked by their Gini coefficient. A higher Gini coefficient suggests a higher potential for the variable to be useful in a linear regression. If a numeric variable is high on IV rank but low on Gini coefficient, it usually suggests a lack of linearity.

My question: is this the Gini coefficient derived from a decision tree, or is it related to the area under the curve (Gini = 2*AUC - 1)? What is the exact calculation of this Gini coefficient, and how can it be used to check linearity? I googled a lot; all I found is that it is used in economics to measure inequality. Any help would be highly appreciated. Thanks!

Ujjawal · 06-22-2015

Thanks for your response. First I need to identify, in an automated manner, the categories that need to be combined. In the example, I already know which categories need to be collapsed, but I want this to be done by code. Starting from a raw data file, I first need to calculate the column percentage of a categorical (character) variable, then decide which categories to collapse, and then replace them with the combined category.

Ujjawal · 06-22-2015

Hi Team, is there a way to collapse the levels of categorical variables in an automated manner? One way I know is to calculate the percentage of events falling in each category and run a cluster analysis on it. Can this be automated for multiple variables considering only the event rate, without cluster analysis?

For example, there is a variable called "Char A". I have calculated the event rate for this variable, i.e. the percentage of 1s appearing in the dependent variable (let's say VarY). It is simply the mean of Y.

Char A   Event Rate
A        49%
B        67%
C        2%
D        87%
E        4%
F        3%

The next step is to combine categories C, E and F, as they have similar event rates (let's say, within 5% of each other).

Method 2: if the above methodology is complicated to automate, can we simplify it by using only the number of cases falling in each level of Char A (not considering the dependent variable)? For example, if a categorical level contains less than 5% of the observations, combine it with the other levels that are also below 5%.

Char A   ColPct
C        2%
F        3%
E        4%
A        49%
B        67%
D        87%

Combine categories C, E and F, as they have a column percentage of less than 5%.

After using either of the two methodologies, we need to replace the levels with the combined category in the raw data file. The raw data file looks like this:

Char A   Y
C        1
C        1
F        1
E        0
E        0
A        1
B        1
D        1
E        0
A        0
B        1
D        1
C        1

Any help would be highly appreciated! Thanks in anticipation!
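Method 2 could be sketched roughly as follows: PROC FREQ writes the percentage of observations per level to a data set, and a join back to the raw data replaces every level below the cutoff with a combined label. The data set name raw, the variable name CharA, the output names level_pct, raw_collapsed and CharA_new, the label 'Other', and the 5 percent cutoff are all assumptions for illustration.

```sas
%let cutoff = 5;   /* percent threshold, an assumed value */

/* Column percentage of each level; OUT= holds COUNT and PERCENT */
proc freq data=raw noprint;
  tables CharA / out=level_pct;
run;

/* Replace rare levels with a combined category */
proc sql;
  create table raw_collapsed as
  select r.*,
         case
           when f.percent is not missing and f.percent < &cutoff then 'Other'
           else r.CharA
         end as CharA_new length=8
  from raw as r
       left join level_pct as f
       on r.CharA = f.CharA;
quit;
```

To cover several variables, the two steps could be wrapped in a macro that loops over a list of variable names. For Method 1, the same pattern applies with PROC MEANS (mean of Y by level) in place of PROC FREQ, but the rule for deciding which event rates are "close" still has to be chosen.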

Ujjawal · 06-15-2015

Thanks for your reply. Sometimes the code works with quotes, and I don't understand why. Could you please provide the link? I searched the SAS support site but didn't find the exact page.

Ujjawal · 06-15-2015

I am using %IF %THEN, and it only sometimes works as expected. The following code does not work when I put "mean" in quotes:

options minoperator mlogic;
%macro missing (method=);
  %if %lowcase(&method.) = "mean" or %lowcase(&method.) = "median" %then %do;
    %put type = "mean or median";
  %end;
  %else %do;
    %put type = "zero";
  %end;
%mend;
%missing (method=mean);

When I change the third line of code to

%if %lowcase(&method.) = mean or %lowcase(&method.) = median %then %do;

the code works.
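The behaviour follows from the macro language treating everything as text: quotation marks are part of the value, so the four characters mean never equal the six characters "mean". A comparison works when both sides are unquoted or both sides are quoted. The sketch below uses the quoted form on both sides, which has the side benefit that an empty &method still leaves a valid operand; it is an illustration, not the only way to write the test.

```sas
%macro missing(method=);
  /* Quotes appear on BOTH sides, so they cancel out in the comparison */
  %if "%lowcase(&method)" = "mean" or "%lowcase(&method)" = "median" %then
    %put type = mean or median;
  %else
    %put type = zero;
%mend missing;

%missing(method=mean);
%missing(method=Median);
%missing(method=);
```

The first two calls take the %THEN branch, and the call with an empty value falls through to %ELSE.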

Ujjawal · 06-04-2015

I am trying to create a macro that calculates the number of levels (distinct categories) for all the character variables in a dataset. Currently, I am using PROC SQL to count distinct categories. If I have 100 variables, I have to run PROC SQL 100 times in a loop. Is there a better way to do it? Is an option like "stackods" available in PROC SQL so that I can reshape my dataset easily?

Sample code:

proc sql;
  create table abc as
  select count(distinct(sex)) as sex_levels
  from sashelp.class;
quit;

Sample output:

Variable   N_Levels
Sex        2
Age        6

This is just the sample code and output for 2 variables. I need to do it for all the character variables in a dataset. Note: all my categorical variables are stored as character variables (strings) in my dataset. Thanks in anticipation!
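One way to avoid the loop, sketched here, is the NLEVELS option of PROC FREQ, which reports the number of distinct levels for every variable named in the TABLES statement and can be captured with ODS OUTPUT. The output data set name level_counts is an assumption for illustration.

```sas
/* Capture the "Number of Variable Levels" table as a data set */
ods output NLevels=level_counts;

proc freq data=sashelp.class nlevels;
  /* _CHARACTER_ selects all character variables;
     NOPRINT here suppresses only the frequency tables */
  tables _character_ / noprint;
run;

proc print data=level_counts;
run;
```

The result has one row per variable, with the variable name in TableVar and the count in NLevels, so no reshaping is needed. Note that NOPRINT belongs on the TABLES statement; putting it on the PROC FREQ statement would suppress the NLevels table as well.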

Ujjawal · 05-28-2015

Thanks a ton. Very Helpful!

Ujjawal · 05-28-2015

Thanks! No, I don't want any interaction between variables. It's a dummy variable with K-1 coding: one value is set as the reference category, and then the significance of each category of the variable is evaluated.

Ujjawal · 05-28-2015

I want PROC REG to run only if a condition is satisfied. For example, I have a table that stores variable names and their variance (VIF). If any variable has a variance value greater than 5, I want PROC REG to be run on a different dataset (the input data set); otherwise, exit the macro.

Variable   Variance
A          3.75
B          4.5
C          10
D          11
E          13
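A minimal sketch of this pattern: count the rows above the cutoff into a macro variable with PROC SQL, then use %IF inside a macro to decide whether to run PROC REG. The macro name run_if_high and its parameters (viftable, input, depvar, vars, cutoff) are hypothetical, and the table is assumed to have a numeric column named variance as in the example.

```sas
%macro run_if_high(viftable=, input=, depvar=, vars=, cutoff=5);
  %local nhigh;

  /* How many variables exceed the cutoff? */
  proc sql noprint;
    select count(*) into :nhigh trimmed
    from &viftable
    where variance > &cutoff;
  quit;

  %if &nhigh > 0 %then %do;
    proc reg data=&input;
      model &depvar = &vars;
    run;
    quit;
  %end;
  %else %do;
    %put NOTE: No variable has a value above &cutoff.. PROC REG was skipped.;
  %end;
%mend run_if_high;
```

A call might look like %run_if_high(viftable=vif_table, input=model_data, depvar=y, vars=a b c d e), where all data set and variable names are placeholders. The TRIMMED keyword on INTO requires SAS 9.3 or later.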

Ujjawal · 05-28-2015

It's a marketing (churn) model. Most of the significant variables are continuous; only two character variables appear, and they make sense in terms of both business logic and statistical significance. To check their significance, I put them in the CLASS statement with the PARAM=REF option and ran stepwise selection. Some levels come out insignificant at the 5% level, and even at the 10% level, so I thought it better to ignore these categories (levels). But SAS does not check individual levels while selecting variables via STEPWISE or any other selection technique. I guess it's better to ignore these levels and make the model more parsimonious, with fewer degrees of freedom.

Ujjawal · 05-28-2015

Do you recommend backward or forward selection? I don't want to remove a variable; I want to remove that level from a variable. It may overestimate or underestimate my predicted probability.


Distance / Similarity with Events

Moving Average Forecasting

Loading code

Re: code to match merge

Re: code to match merge

Re: Binary Flags

Binary Flags

Use Different variable on clearing filter

Re: Length in CREATE DATA

Re: Length in CREATE DATA

Re: Binary Flags

Re: Binary Flags

Re: Binary Flags

Re: Length in CREATE DATA

Re: Conditional Retain on Multiple Variables

Maximum number of characters in a macro variable

Re: Column Percentage in bar

Re: Pick variables in different combinations

Change date format

Pick variables in different combinations

Re: Gini Coefficient - Variable Importance Measure

Gini Coefficient - Variable Importance Measure

Re: Collapse levels of a categorical variable

Collapse levels of a categorical variable

Re: %IF %THEN not working

%IF %THEN not working

Number of levels in character variables

Re: If Then Run PROC

Re: Proc Logistic for categorical variables

If Then Run PROC

Re: Proc Logistic for categorical variables

Re: Proc Logistic for categorical variables