02-22-2013 06:46 AM
I was wondering if there is a procedure in Base SAS to identify the type of each variable in a dataset. I don't mean whether it is numeric or character; I am looking for a deeper approach, such as whether it is nominal, ordinal, or continuous. Any ideas?
Thanks in advance.
02-22-2013 07:53 AM
As far as I know, there is no PROC in SAS for this. What you are asking about is the measurement scale (nominal, ordinal, etc.), which is not the same thing as the variable type. By default, SAS treats every variable as either character or numeric.
02-22-2013 08:59 AM
Enterprise Miner takes a guess by looking at the data:
"By default, it takes a random sample of 2,000 observations from the
data set of interest, and uses this information to assign a model role and a
measurement level to each variable."
That approach relies on having lots of data and is not generalizable to parts of SAS that also must work with small samples. It can also be wrong in EM, particularly in determining ordinal scaling, so you still need to know your data.
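To make the sampling idea concrete, here is a minimal sketch in Python (not SAS, and not how Enterprise Miner is actually implemented). It assumes the data is a list of dicts, one per observation, and the function name `sample_level_counts` is made up for illustration. It draws a random sample of up to 2,000 observations and counts the distinct non-missing values of each variable, which is the raw input for any guess at a measurement level.

```python
import random

def sample_level_counts(rows, n=2000, seed=0):
    """Count distinct non-missing values per variable in a random sample.

    `rows` is assumed to be a list of dicts (one dict per observation).
    The default n=2000 mirrors the sample size quoted above; everything
    else here is an illustrative assumption.
    """
    rng = random.Random(seed)
    # Use the whole dataset when it is smaller than the requested sample.
    sample = rows if len(rows) <= n else rng.sample(rows, n)
    levels = {}
    for row in sample:
        for name, value in row.items():
            if value is not None:  # treat None as missing
                levels.setdefault(name, set()).add(value)
    return {name: len(values) for name, values in levels.items()}
```

Note that an ID-like variable shows as many levels as there are sampled rows, and a rare level can be missed entirely by the sample, which is one reason the guess still has to be checked against what you know about the data.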
02-22-2013 09:02 AM
If you only have Base SAS, then PROC UNIVARIATE is your best bet, and I suppose it can cover most of your needs. If you have deeper pockets and also license SAS/QC, SAS/ETS, or SAS/INSIGHT, you can look into PROC CAPABILITY, PROC SEVERITY, and PROC RELIABILITY as well.
This is not like the Char or Num type information that you can obtain from the metadata; you will have to analyze the variables you are interested in.
02-22-2013 09:06 AM
All SAS procs and functions that I'm aware of simply distinguish between character and numeric. And even with EM, the automatic assignments are based only on the number of values, not on what the data actually represent.
02-22-2013 10:28 AM
There is no automatic way. It takes defining your criteria and checking them. Even then, the best you can do (usually) is categorical vs. continuous (integer) vs. continuous (noninteger). Sometimes you might want to distinguish binary from the other possibilities as well, but you may have to check a larger sample to be confident of a variable being binary.
If a variable takes on 20 different values, must it be continuous (not categorical)? What is the limit?
If a variable takes on noninteger values, must it be continuous?
Any rules you come up with will always have exceptions. For example, you may get a set of integers that represent percentiles, and have 100 possible values. It would be good practice to keep lists of variables that you know about: variables that are always categorical (no matter how many values they take on), and variables that are always continuous (no matter how few values they take on).
The checking is usually done on a sample of observations, but I have typically used thousands rather than hundreds of observations.
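As a rough illustration of "defining your criteria and checking them", here is a minimal Python sketch (not SAS). The function name `guess_level`, the cutoff of 20 distinct values, and the two override lists are all assumptions for illustration; the cutoff in particular is arbitrary and is exactly the kind of rule that will have exceptions.

```python
import math

def guess_level(name, values, max_levels=20,
                always_categorical=(), always_continuous=()):
    """Guess a coarse measurement level for one variable from sample values.

    The rules and the max_levels cutoff are illustrative assumptions.
    """
    # Variables you already know about override every rule.
    if name in always_categorical:
        return "categorical"
    if name in always_continuous:
        return "continuous"
    # Drop missing values (None or NaN) before counting levels.
    distinct = {v for v in values
                if v is not None
                and not (isinstance(v, float) and math.isnan(v))}
    if not distinct:
        return "unknown"
    if len(distinct) == 2:
        return "binary"
    # Character values, or only a few distinct values: call it categorical.
    if any(isinstance(v, str) for v in distinct) or len(distinct) <= max_levels:
        return "categorical"
    if all(float(v).is_integer() for v in distinct):
        return "continuous (integer)"
    return "continuous (noninteger)"
```

With these rules, a percentile variable holding the integers 0 to 99 comes back as "continuous (integer)" unless its name is in `always_categorical`, which is where the lists of known variables earn their keep.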