BookmarkSubscribeRSS Feed
Ronein
Meteorite | Level 14

Hello

I am building a new logistic regression model.

Part of the analysis is to calculate IV (information value) for the independent variables of the new model.

I wrote a code that calculate IV .

The problem is that Instead of see the categories formats names I see the categories values (1,2,3, and so on).

I found a solution by writing IF statements to apply formats.

However, this solution is not efficient because in real life I want to test multiple models and each time to check  IV's only for the variables of the model that I check.

My question:

Is there a way to see formatted values of categories that is done automatically ?

The target is to take the formats for varaibles X1,X2,X3,X4,X5 from source data set ttt

 

PROC FORMAT ;
VALUE IVfmt
low-<0.02='(d) Very Low'
0.02 <-  0.1   ='(c) Low'
0.1 <-  0.3   ='(b) Medium'
0.3 <-   HIGH  ='(a) High'

;
RUN;

 
proc format;
value f1ff
1='a1'
2='b1'
3='c1'
4='d1'
;
value f2ff
1='a2'
2='b2'
3='c2'
4='d2'
;
value f3ff
1='a3'
2='b3'
3='c3'
;
value f4ff
1='a4'
2='b4'
3='c4'
;
value f5ff
1='a5'
2='b5'
3='c5'
4='d5'
;
Run;


Data ttt;
Retain Id X1 X2 X3 X4 X5;
Format X1  f1ff. X2 f2ff. X3 f3ff. X4 f4ff. X5 f5ff.;
Input ID x1 x2 x3 x4 x5 Ind_Fail;
label
ID='Customer ID'
X1='Team'
X2='Indicator big/small'
X3='Light strength'
X4='Location'
X5='Country'
;
cards;
6 4 4 3 2 4 1
1 2 1 3 1 2 0
4 4 3 3 2 1 0
2 3 2 2 1 2 0
3 1 2 1 2 3 1
5 2 3 2 3 2 0
7 3 2 1 2 1 1
;
Run;

%let V=x1 x2 x3 x4 x5;

proc sort data=ttt;
by ID;
run;

proc transpose data=ttt out=Long_Raw_
(rename=(_name_=field _label_=Fieldlabel col1=category));
by ID;
var &V.;
run;


data Long_Raw;
set Long_Raw_;
category=left(category);
run;


PROC SQL;
	create table LONG_RAW_b  as
	select  a.*,b.Ind_Fail 
   from  LONG_RAW as a
	left join ttt as b
	on a.ID=b.ID
;
QUIT;
 
 
PROC SQL;
	create table Summary_IV1   as
	select  field,Fieldlabel, category,
			count(*) as nr_laks,
			sum(Ind_Fail ) as nr_laks_bad,
			sum(calculated nr_laks,-calculated nr_laks_bad) as nr_laks_good,
			calculated nr_laks_bad/calculated nr_laks as pct_laks_bad format=percent8.2 
	from  LONG_RAW_b
	group by field,Fieldlabel, category	
	order by field,Fieldlabel,category
;
QUIT;


 PROC SQL;
	create table Summary_IV2  as
	select field,
 			sum(Ind_Fail ) as Total_laks_bad,
 			sum(case when Ind_Fail =0 then 1 else 0 end) as Total_laks_good
	from  LONG_RAW_b
	group by field
;
QUIT;


 PROC SQL;
	create table Summary_IV3  as
	select a.*,
	       a.nr_laks_bad/b.Total_laks_bad as Tamhil_bad,
	       a.nr_laks_good/b.Total_laks_good as Tamhil_good,
            log(calculated Tamhil_good/calculated Tamhil_bad) as WOE  ,
			calculated WOE*(calculated Tamhil_good-calculated Tamhil_bad) as IV
	from  Summary_IV1 as a 
	left join Summary_IV2 as b
	on a.field=b.field
;
QUIT;


PROC SQL;
	create table Summary_IV4  as
	select  field,' TOTAL' as Fieldlabel,
	          sum(nr_laks) as nr_laks,
			  sum(nr_laks_bad) as nr_laks_bad,
			  sum(nr_laks_good) as nr_laks_good,
			  calculated  nr_laks_bad/calculated  nr_laks as pct_laks_bad format=percent8.2,
			  sum(Tamhil_bad) as Tamhil_bad,
			  sum(Tamhil_good) as Tamhil_good,
              sum(IV) as IV,
              put(calculated IV,IVfmt.) as  IV_desc
	from  Summary_IV3
	group by field
;
QUIT;
proc sort data=Summary_IV3;by field;run;
proc sort data=Summary_IV4;by field;run;

Data Final_IV;
Retain category category_new;
length category_new $50.;
set Summary_IV3  Summary_IV4;
by field;
format nr_laks nr_laks_bad comma21. Tamhil_bad Tamhil_good WOE  IV comma8.2; 
/**My question is about the following code to apply format to category var**/
/**IS there a way to apply formats automatically from the source data set ttt**/
IF field='X1' then category_new=put(category,f1ff.);
else IF field='X2' then category_new=put(category,f2ff.);
else IF field='X3' then category_new=put(category,f3ff.);
else IF field='X4' then category_new=put(category,f4ff.);
else IF field='X5' then category_new=put(category,f5ff.);
Run;

proc sort data=Final_IV;
by field category;
Run;
title  'Information value for varaibles in this model';
proc print data=Final_IV noobs;
run;

 

 

8 REPLIES 8
PaigeMiller
Diamond | Level 26

This is an awful lot of code, and then you don't specify exactly where in the code you are having problems.

 

I also don't really grasp the meaning of your statements, could you please give an example of what you are seeing and the desired solution for  these 3 statements:

 

  1. The problem is that Instead of see the categories formats names I see the categories values (1,2,3, and so on).
  2. this solution is not efficient because in real life I want to test multiple models and each time to check  IV's only for the variables of the model that I check.
  3. Is there a way to see formatted values of categories that is done automatically ?
--
Paige Miller
Ksharp
Super User
select  field  format=f1ff.  ,Fieldlabel, category,

You could add a format after a variable.

Ronein
Meteorite | Level 14

Yes, but I run a dynamic code.....Each time I run a different list of variables and for these varaibles I want to calculate IV.

In the methos that you show I need to manually type the format for each varaible....It is not efficient for me.

select  field  format=f1ff.  ,Fieldlabel, category,

 I will make my question more specific:

In proc transpose code I loose the format of the categories values.(Because I get one column....)

What is the way to trasfer the values into formatted values in column Category?

Then in the Inforation value output I can see the meaning of each value of any category

proc transpose data=ttt out=Long_Raw_
(rename=(_name_=field _label_=Fieldlabel col1=category));
by ID;
var &V.;
run;

 

Tom
Super User Tom
Super User

Formats apply to the whole variable.  So if you are transposing the values so that the resulting numeric column has values from different original variables that used different formats you will need to first store the formatted value into a character variable.  

select
  field as raw_value
, put(field,f1ff.) as formatted_value  
, Fieldlabel
, category
, ...
Ronein
Meteorite | Level 14
The raw data is external file with 500 varaibles (numeric) and each varaible has a format.
You suggest me to tranform each numeric value into its formatted value (char).
It is absolutley good solution but how can I do it for 500 varaibles in a clever way???
This is the question
Tom
Super User Tom
Super User

@Ronein wrote:
The raw data is external file with 500 varaibles (numeric) and each varaible has a format.
You suggest me to tranform each numeric value into its formatted value (char).
It is absolutley good solution but how can I do it for 500 varaibles in a clever way???
This is the question

That is a a very simple thing.

data want ;
  set have ;
  array var var1-var500 ;
  length varname $32 value 8 decode $200 ;
  do index=1 to dim(var);
    varname=vname(var[index]);
    value=var[index];
    decode=vvaluex(varname);
    output;
  end;
  keep id varname value decode ;
run;

Just replace VAR1-VAR500 in the ARRAY statement with your list of variable names.  Replace ID in the KEEP statement with the list of variables that uniquely identify an observation in the original HAVE dataset.

ballardw
Super User

Any where you expect a formatted value to appear you need to make sure that the variable with the value is associated with the format.

 

Since the only place I see a format associated with variables is in data TTT only procedures that directly use TTT can see the formats. Most of the work you are doing is with a transposed data set, or results of using that transposed data. The Transposed data does not have the formats because the VAR variables are now combined in columns and the original formats can't be applied.

 

If you expect one assignment, such as in TTT to maintain the formats then you need to watch which variables you keep where and use so that the formatted values are available.

Ronein
Meteorite | Level 14
Please show code, it is much better to see and understand what you say in words

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 974 views
  • 0 likes
  • 5 in conversation