I have 106 target levels here, this is just small example. SAS gives me a predicted probability of all target levels and classifies them into one of the target levels.
What I want is to create a new variable px that writes probability of only that target which sas classified into. for example for id 101 with target level of b, sas classified it into c. So I want px to have a probability of c, which is P_Targetc.
This is my code -
proc sql;
ALTER TABLE data1
ADD p varchar;
run;
proc sql;
UPDATE data1
SET p= 'P_Target';
data data2 ;
set data1 ;
px2 = cats (p,Target);
run;
What i get after this is -
px =
It will help to provide some actual data not a picture.
Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the {i} icon or attached as text to show exactly what you have and that we can test code against.
I know of at least one way to do this in a data step involving arrays which means that it doesn't translate easily into Proc Sql code and would likely be extremely verbose.
A short example:
data have; input id var_a var_b var_c index :$1.; datalines; 1 .04 .6 .33 a 2 .2 .3 .8 c 3 .1 .2 .3 b ; data want; set have; array v var_a var_b var_c; array c{3} $ 1 _temporary_ ('a','b','c'); px = v[whichc(index, of c(*))]; run;
Key elements the order of the variables in the array v is associated with the order of the index (target level) values and there better be a matching number. If you have 15 variables but only 5 levels, or 5 variables and 15 levels you need to tell us now as their will likely be lots of additional issues.
The case of the values in the temporary list has to match the values of your target level.
The WHICHC function searches for the character value of the first parameter in a list following and returns the position found. With 3 variables it would be easy to just list the variables but if you have dozens that gets pretty long an the array becomes a better choice. Proc SQL does not allow use of variables. Using the returned position selects the value from the array with the correct index if everything is in order.
If the target level is ever missing you would want to test for that an not do the assignment.
Note the variables in the temporary array are not written to the data set.
Hi @harsh0404,
The VVALUEX function might come closest to your attempt using CATS. An approach using arrays would avoid numeric-to-character and character-to-numeric conversion, but you would need to determine the (numeric) array index from the I_TARGET value. Please see an example of both approaches below:
data have;
input id p_targeta p_targetb p_targetc i_target $;
cards;
999 0.1 0.2 0.3 b
;
/* Approach 1: VVALUEX function */
data want;
set have;
px=input(vvaluex(cats('p_target'||i_target)),32.);
run;
/* Approach 2: Array */
data want;
set have;
array pr[*] p_target:;
px=pr[rank(i_target)-96];
run;
We would need to know the real variable and target names to give you more specific advice.
Edit: Your sample data and description suggest that Px might always (?) be the maximum of P_Targeta, P_Targetb, etc., in which case px=max(of p_target:) would, of course, be the easiest solution.
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.