12-01-2014 09:29 AM
The only correlations I would compute for the character variables would be for Boolean (Y/N, dichotomous) variables.
One solution I thought of was simply to add a 0 to every variable in my dataset. But surely there must be a better way?
Thank you in advance.
12-01-2014 10:19 AM
I feel that the brevity of your problem statement is leaving out some key pieces of information.
Why would you need to add zero to every variable in your dataset? How would that possibly help?
What is the point of computing a (Pearson) correlation between a numeric and character variable? That seems relatively meaningless.
12-01-2014 01:21 PM
Imagine a data set with 100 variables. Many of them would be character (string) variables containing numeric codes. Assume they all had values of zero or 1.
The correlations would then be point-biserial correlations against a continuous variable, but only after each variable has been converted from character to numeric. That conversion is the step that adding zero to each value takes care of automatically.
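As a sketch of the point-biserial idea: it is simply the Pearson correlation computed with a 0/1 variable, so once the codes are numeric, PROC CORR reports it directly. This assumes SASHELP.CLASS as the data; the data set CLASS01 and the 0/1 variable MALE are illustrative names, not anything from the thread.

```sas
data class01;
   set sashelp.class;
   /* a comparison evaluates to 1 when true and 0 when false */
   male = (sex = 'M');
run;

/* Pearson r between a 0/1 variable and a continuous one is the point-biserial r */
proc corr data=class01 pearson;
   var male;
   with height weight;
run;
```

The VAR/WITH pairing gives one correlation per continuous variable rather than the full matrix.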
Thank you very much.
12-01-2014 02:34 PM
Are you saying that the values of the character variables are always '0' and '1'? I think some of us were confused by your statement to "add zero," since that technique will not work in general. For example, the SASHELP.Class data set has a binary character variable named SEX. If I try to add zero, I get an error:
b = sex + 0;
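A small self-contained check of the difference, with made-up data and variable names: adding zero converts a character '1' or '0' to a number (SAS writes a conversion NOTE to the log), but a code such as 'F' cannot be read as a number, so the result is missing and the log reports invalid numeric data.

```sas
data demo;
   input code $ sex $;
   /* '1' and '0' convert cleanly; the log notes the character-to-numeric conversion */
   num_code = code + 0;
   /* 'F' and 'M' are not numbers: the result is missing and the log flags invalid data */
   num_sex = sex + 0;
   datalines;
1 F
0 M
;
run;
```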
12-01-2014 04:14 PM
Yes, adding 0 would be an easy way to convert all character variables (whose values are always numbers) to numeric. Another way uses the INPUT function as illustrated in this sample program:
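One possible INPUT-based sketch for converting many variables at once, assuming the character variables share a numbered prefix (Q1-Q3 here are hypothetical): rename the character versions, read each one with INPUT into a numeric variable of the original name, and drop the character copies. The ?? modifier suppresses invalid-data notes, so a non-numeric code quietly becomes missing instead of cluttering the log.

```sas
/* hypothetical character variables Q1-Q3 holding '0'/'1' codes */
data have;
   input (q1-q3) ($1.);
   datalines;
101
011
;
run;

data want(drop=i c_:);
   /* keep the character versions under temporary names */
   set have(rename=(q1=c_q1 q2=c_q2 q3=c_q3));
   array c[*] $ c_q1-c_q3;   /* original character values */
   array n[*] q1-q3;         /* new numeric variables with the original names */
   do i = 1 to dim(c);
      n[i] = input(c[i], ?? best12.);
   end;
run;
```

With 100 variables, only the RENAME= list and the two ARRAY statements need to grow.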
12-01-2014 04:29 PM
"Assume they all had values of zero or 1."
It sounds to me like you are implying the values are not zero or 1, especially since you originally spoke of Y/N or dichotomous (a very general term), but conceptually, let's solve the problem when everything is zero or one. That's fine if that's what you want to do, but the solution for the case where everything is 0 or 1 is not the solution for general dichotomous variables.
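For general dichotomous codes such as Y/N, adding zero cannot work, so an explicit recode is needed instead. A minimal sketch, assuming hypothetical Y/N variables SMOKER and DRINKER and keeping blank codes as missing rather than forcing them to 0:

```sas
data have;
   input id smoker $ drinker $;
   datalines;
1 Y N
2 N N
3 Y Y
;
run;

data want;
   set have;
   array yn[*] $ smoker drinker;       /* hypothetical Y/N character variables */
   array b[*] smoker_01 drinker_01;    /* new 0/1 numeric versions */
   do i = 1 to dim(yn);
      if missing(yn[i]) then b[i] = .;
      else b[i] = (upcase(yn[i]) = 'Y');   /* 1 for Y, 0 for anything else */
   end;
   drop i;
run;
```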
Or maybe I still don't understand.
12-01-2014 04:48 PM
I can't test this, but I think PROC REG will work with character variables that contain only digits. I'd suggest trying the following test:
/* create some test data */
data have (drop=_:);
  set sashelp.class (rename=(height=_height weight=_weight age=_age));
  age=ifn(_age lt 14,'1','0');
  height=ifn(_height le 60,'1','0');
  weight=ifn(_weight lt 100,'1','0');
run;

proc reg data=have;
  eq1: model weight=height;
  eq2: model weight=height age;
run;
quit;
12-01-2014 05:32 PM
DN: Good point! I was surprised when it appeared to work, and obviously it didn't: IFN returns numeric values, so my test data never actually contained character variables.
When I created the test file correctly, using the IFC function, PROC REG failed completely.
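A sketch of the corrected test, assuming the same cutoffs as the test above: IFC really does create character '1'/'0' variables, which PROC REG will not accept in a MODEL statement, so they are converted with INPUT first. The data set names and the _N suffixes are only illustrative.

```sas
data have_c(drop=_:);
   set sashelp.class(rename=(height=_height weight=_weight age=_age));
   /* IFC returns character values, so these really are '1'/'0' strings */
   age    = ifc(_age lt 14, '1', '0');
   height = ifc(_height le 60, '1', '0');
   weight = ifc(_weight lt 100, '1', '0');
run;

data have_n;
   set have_c;
   /* read the first column of each string as a number */
   age_n    = input(age, 1.);
   height_n = input(height, 1.);
   weight_n = input(weight, 1.);
run;

proc reg data=have_n;
   eq1: model weight_n = height_n;
   eq2: model weight_n = height_n age_n;
run;
quit;
```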