Zachary
Obsidian | Level 7

Any correlations I would run on the character variables would be for Boolean (Y/N, dichotomous) types of variables.

One solution I thought of was simply to add a 0 to every variable in my dataset. But surely there must be a better way?

Thank you in advance.

8 REPLIES
PaigeMiller
Diamond | Level 26

I feel that the brevity of your problem statement is leaving out some key pieces of information.

Why would you need to add zero to every variable in your dataset? How would that possibly help?

What is the point of computing a (Pearson) correlation between a numeric and character variable? That seems relatively meaningless.

--
Paige Miller
Zachary
Obsidian | Level 7

Imagine a data set with 100 variables. Many of them would be character (string) variables with numeric codes in them. Assume they all had values of zero or 1.

The correlations would then be point-biserial correlations versus a continuous variable. But this is after each of the variables is converted from character to numeric. That conversion is the step that adding zero to each value takes care of automatically.

Thank you very much.

Rick_SAS
SAS Super FREQ

Are you saying that the values of the character variables are always '0' and '1'? I think some of us were confused by your statement to "add zero," since that technique will not work in general. For example, the SASHELP.Class data set has a binary character variable named SEX. If I try to add zero, I get an error:

data A;
set sashelp.class;
b = sex + 0;
run;
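
For a general dichotomous character variable such as SEX, one option is to build the 0/1 indicator explicitly instead of relying on automatic conversion. The following is only a sketch: the variable name MALE and the choice of 'M' as the level coded 1 are assumptions made for illustration.

```sas
data a;
  set sashelp.class;
  /* A comparison evaluates to 1 (true) or 0 (false) in SAS. */
  /* Keep missing values missing instead of coding them as 0. */
  if missing(sex) then male = .;
  else male = (sex = 'M');
run;

/* Pearson correlation of a 0/1 indicator with a continuous variable
   is the point-biserial correlation */
proc corr data=a;
  var male;
  with height;
run;
```

Which level is coded 1 only changes the sign of the correlation, not its magnitude.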

StatDave
SAS Super FREQ

Yes, adding 0 would be an easy way to convert all character variables (whose values are always numbers) to numeric.  Another way uses the INPUT function as illustrated in this sample program:

40700 - How to convert all character variables to numeric and use the same variable names in the out...
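
As a rough illustration of the INPUT approach (this is not the linked sample program), the sketch below converts a short, hard-coded list of character variables and keeps the original names. The data set HAVE and the variables FLAG1, FLAG2 and SCORE are hypothetical; with 100 variables the list would normally be generated rather than typed.

```sas
/* Hypothetical test data: two character flags holding '0'/'1'
   and one continuous variable */
data have;
  input id flag1 $ flag2 $ score;
  datalines;
1 0 1 10.5
2 1 1 12.0
3 0 0 9.8
4 1 0 14.1
5 1 1 13.2
;
run;

data want;
  set have;
  array _c {*} flag1 flag2;        /* existing character variables */
  array _n {*} n_flag1 n_flag2;    /* new numeric counterparts     */
  do _i = 1 to dim(_c);
    /* ?? suppresses log messages for values that are not numbers; */
    /* such values become missing                                  */
    _n{_i} = input(_c{_i}, ?? 12.);
  end;
  drop _i flag1 flag2;
  /* DROP is applied before RENAME, so the original names are free */
  rename n_flag1=flag1 n_flag2=flag2;
run;

proc corr data=want;
  var flag1 flag2;
  with score;
run;
```

Unlike adding 0, INPUT performs the conversion explicitly, so the log carries no automatic-conversion notes.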

PaigeMiller
Diamond | Level 26

"Assume they all had values of zero or 1."

Sounds to me like you are implying the values are not zero or 1, especially since you originally spoke of Y/N or dichotomous (a very general term) variables, but conceptually let's solve the problem when everything is zero or one. That is fine if that's what you want to do, but the solution for the case where everything is 0 or 1 is not the solution when you have general dichotomous variables.

Or maybe I still don't understand.

--
Paige Miller
art297
Opal | Level 21

I can't test this, but I think that proc reg will work with character variables that only contain numbers.  E.g., I'd suggest trying the following test:

/*create some test data*/
data have (drop=_:);
  set sashelp.class (rename=(height=_height weight=_weight age=_age));
  age=ifn(_age lt 14,'1','0');
  height=ifn(_height le 60,'1','0');
  weight=ifn(_weight lt 100,'1','0');
run;

proc reg data=have;
      eq1: model  weight=height;
      eq2: model  weight=height age;
run;

data_null__
Jade | Level 19

The IFN function does not create character variables.
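
A quick way to check this is to compare the types of the results of IFN and IFC with the VTYPE function. This is a minimal sketch; the variable names are made up for the check.

```sas
data _null_;
  /* IFN returns a numeric value; the character arguments '1' and '0' */
  /* are converted to numbers automatically (the log notes this)      */
  a = ifn(1 lt 2, '1', '0');
  /* IFC returns a character value */
  b = ifc(1 lt 2, '1', '0');
  type_a = vtype(a);   /* 'N' for numeric, 'C' for character */
  type_b = vtype(b);
  put type_a= type_b=;
run;
```

The log should report A as numeric and B as character, which is why the test data built with IFN ran through PROC REG without complaint.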

art297
Opal | Level 21

DN: Good point! I was surprised when it appeared to have worked and, obviously, it didn't work.

When I created the test file correctly using the ifc function, it totally failed.
