I have 101 variables: v1, v2, v3, v4, ..., v100 (all continuous) and yes_no.
I want to split the values of EACH VARIABLE into 10 groups: for example, 10 groups for "age" and 10 groups for "income".
One observation can be in group 1 for v1 and group 6 for v2.
I know we can use PROC RANK to create the groups, but I need help from there on.
So how do I do this for the 100 variables? Any macro suggestions?
What I mean is something like the following. (Please ignore any syntax errors; this is just to express my thoughts.)
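%macro xx(order);
  /* keep one variable plus the yes_no flag */
  data data_new;
    set data_old(keep=v&order yes_no);
  run;
  /* split that variable into 10 groups (0-9) */
  proc rank data=data_new groups=10 out=ranknew;
    var v&order;
    ranks g&order;
  run;
  /* per group: count(yes_no=1) / count(yes_no=0) */
  proc sql;
    create table iv&order as
    select g&order, sum(yes_no=1) / sum(yes_no=0) as iv&order
    from ranknew
    group by g&order;
  quit;
%mend xx;
%macro loop;
  %do order=1 %to 100;
    %xx(&order)
  %end;
%mend loop;
%loop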
Hopefully I will then have the 100 new datasets, which I can combine together.
Please help. Thank you all very much in advance.
Update: by the way, an extra question. How can I write a program that quickly detects which variables are continuous and which are discrete, and labels them? Thanks.
Are your variable names actually an enumerated list, or are they something else?
Either way maybe this will get you started.
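As a minimal sketch of one way to start, assuming the variables really are named v1-v100 and the data set is called data_old as in the question: PROC RANK accepts a variable list, so a single call can bin all 100 variables without a macro loop. The names ranked, long, ratios, r1-r100, n_yes, n_no and ratio are illustrative, and the ratio of yes_no=1 to yes_no=0 counts per group is only my reading of what the question asks for.

```sas
/* One PROC RANK call bins every variable into 10 groups (values 0-9). */
proc rank data=data_old groups=10 out=ranked;
   var v1-v100;
   ranks r1-r100;          /* r1 holds the group for v1, and so on */
run;

/* Reshape to one row per observation and variable. */
data long(keep=variable group yes_no);
   set ranked;
   array r{*} r1-r100;
   length variable $32;
   do i = 1 to dim(r);
      variable = cats('v', i);
      group = r{i};
      output;
   end;
run;

/* Count yes_no=1 and yes_no=0 within each variable and group. */
proc sql;
   create table ratios as
   select variable, group,
          sum(yes_no = 1) as n_yes,
          sum(yes_no = 0) as n_no,
          case when sum(yes_no = 0) > 0
               then sum(yes_no = 1) / sum(yes_no = 0)
               else .
          end as ratio
   from long
   where group is not missing
   group by variable, group;
quit;
```

If the names are not an enumerated list, the same idea should still work with a different variable list on the VAR statement (for example a name range), as long as the RANKS statement lists one new name per variable.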
Regarding "discrete" vs. "continuous" there is no standard way to do this. You have to set up your own rules. Here are some considerations.
If a variable is character, should it always be defined as discrete?
If a variable is always an integer, should it be defined as discrete? If it takes on noninteger values, must it be defined as continuous?
If a variable takes on more than 30 different values, should it be defined as continuous? (30 is an arbitrary cutoff, up to you to decide.)
No matter what set of rules you create, there will be exceptions that apply to a particular data set. Zip codes take on thousands of values, yet should be considered discrete. ICD9 codes should probably be stored as character, but if the numeric portion is stored separately it will contain decimal fractions, yet it is really discrete. In the end, you will probably end up storing a list of variable names that should always be treated as continuous (regardless of the rules that you set up), and a second list of variable names that should always be treated as discrete (again, regardless of other rules that you set up).
In a nutshell, if you had all the data values presented to you, and an infinite amount of time to work on the problem, how would you decide?
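The considerations above could be turned into a first-pass rule set along these lines. This is only a sketch under stated assumptions: the cutoff of 30, the override lists, and the names ZIPCODE, ICD9_NUM, INCOME, levels, meta, var_class and kind are all hypothetical, data_old is the data set name from the question, and the integer-versus-noninteger check is left out.

```sas
%let cutoff = 30;                        /* arbitrary, as noted above */
%let force_discrete   = ZIPCODE ICD9_NUM; /* hypothetical names */
%let force_continuous = INCOME;           /* hypothetical name  */

/* Number of distinct values for every variable. */
ods output nlevels=levels;
proc freq data=data_old nlevels;
   tables _all_ / noprint;
run;

/* Variable type: 1 = numeric, 2 = character. */
proc contents data=data_old noprint out=meta(keep=name type);
run;

/* Override lists win; otherwise apply the type and cutoff rules. */
proc sql;
   create table var_class as
   select m.name, m.type, l.nlevels,
          case
             when findw("&force_discrete", strip(upcase(m.name)), ' ') > 0
                then 'discrete'
             when findw("&force_continuous", strip(upcase(m.name)), ' ') > 0
                then 'continuous'
             when m.type = 2 then 'discrete'
             when l.nlevels > &cutoff then 'continuous'
             else 'discrete'
          end as kind length=10
   from meta as m
        inner join levels as l
        on upcase(m.name) = upcase(l.tablevar)
   order by m.name;
quit;
```

The resulting table is a starting point for review rather than a final answer; variables that land in the wrong class go into one of the two override lists.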
Good luck.
Sometimes, when you have hundreds or over 1,000 variables, you may wanna do that?
No, you never wanna do that. It's painstaking, tedious work. But you have to do something. So you do have to decide on what the rules will be. You do have to recognize that the rules you set up will probably be inaccurate for some of the variables. And you do have to decide whether you want to do something about that, vs. accept some inaccurate "decisions" that arise from your rules.
If the same data set will be used in this way many times over, the work that you put in up front will be useful every time. It's up to you to decide how much work is worth the effort.