I have a question about coding values of a variable. I have split this into two parts as it would be helpful to me even if I could only get an answer to the first part. Thanks ahead of time for any light anyone is able to shed on this!
/** DATA SAMPLE **/
I have hundreds of string variables, each with over a thousand observations. Here are the value frequencies for two of them, side by side:
CC freq = 494        GG freq = 3000
CT freq = 29         GT freq = 185
TT freq = 1          TT freq = 39
/** PART A **/
I need to convert the strings to numeric values while simultaneously assigning the largest value to the largest (most frequent) category. Ultimately, this is so I can use PROC CATMOD for polytomous regression and get the correct reference categories. (These are all independent variables; I have already been able to set the correct reference category for the dependent variable.)
Is there a function that will do this for me?
/** PART B **/
How can I integrate that into a macro?
Here is the macro I am using at the moment:
%let varnum = 1;
%let var = %scan(&varlist, &varnum);
%do %while (%length(&var) ne 0);
   ods listing close;
   proc catmod data=cox2 ;
      direct X ;
      model dep_var = X &var ;
      ods output Estimates = model_&var ;
   run;
   quit;
   /** I am then using PROC REPORT to recover outputs **/
   /** I am interested in. These lines have been **/
   /** omitted from this example **/
   /** move on to the next variable in the list **/
   %let varnum = %eval(&varnum + 1);
   %let var = %scan(&varlist, &varnum);
%end;
Apart from switching to the LOGISTIC procedure instead of CATMOD, where you may find the ORDER=FREQ option useful, I suggest the following trick:
1) Run PROC FREQ with ORDER=FREQ on the chosen variable (the one to recode), then save the results with ODS OUTPUT OneWayFreqs = work.myValues ;
2) Sort MyValues by ascending Frequency so that the most frequent category comes last, then re-read the MyValues dataset in a DATA step to add a new variable, recoded=_N_, to it ; the most frequent category then gets the largest code ;
3) Merge MyValues with your core data BY the chosen variable (both datasets must be sorted by it first) ;
4) Use recoded as input for your modelling procedure.
This can be done for each variable in your macro, although it may be time-consuming to re-run PROC FREQ on the whole dataset and then re-merge it on every pass through the loop.
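The steps above can be sketched roughly as follows for a single variable. This is only an illustration, not tested against your data: the macro name %recode_by_freq, its parameters, and the _num suffix for the recoded variable are hypothetical names chosen for the example. The lookup table is sorted by ascending Frequency so that the most frequent category ends up with the largest code; categories with tied frequencies get an arbitrary order among themselves.

```sas
/* Sketch only: recode one character variable so that the most     */
/* frequent category receives the largest numeric code.            */
/* %recode_by_freq, its parameters and the _num suffix are         */
/* hypothetical names introduced for this example.                 */
%macro recode_by_freq(data=cox2, var=);

   /* Step 1: one-way frequencies, captured through ODS */
   ods output OneWayFreqs = work.myValues;
   proc freq data=&data order=freq;
      tables &var;
   run;

   /* Step 2: rarest category first, most frequent last */
   proc sort data=work.myValues;
      by Frequency;
   run;

   data work.myValues;
      set work.myValues (keep=&var);
      &var._num = _n_;   /* largest code = most frequent category */
   run;

   /* Step 3: both datasets must be sorted by the merge key */
   proc sort data=work.myValues;
      by &var;
   run;

   proc sort data=&data;
      by &var;
   run;

   data &data;
      merge &data (in=in_core) work.myValues;
      by &var;
      if in_core;        /* keep every row of the core data */
   run;

%mend recode_by_freq;
```

Inside the existing loop, one would call %recode_by_freq(data=cox2, var=&var) before PROC CATMOD and then put &var._num in the MODEL statement instead of &var (step 4). Note that the sketch re-sorts the core dataset in place, that rows with a missing value for the variable keep a missing code, and that the variable name plus the suffix must stay within the 32-character limit for SAS names.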