About EC189QRW

EC189QRW · ‎03-08-2021

I love this paper. Thanks!

EC189QRW · ‎03-08-2021

IV (information value) is designed for screening variables in modeling. I am trying to build a credit scorecard to predict whether a customer will default in the future. The variables come from customers' transactions and their behaviour. If you are interested, try Naeem Siddiqi's book "Credit Risk Scorecards: Developing and Implementing Intelligent Credit Scoring"; it might help.

EC189QRW · ‎03-06-2021

Yes! The key is unique at customer level and there is only one target variable, "Good/Bad". I've got a really wide table which contains almost 20 thousand features in a single dataset, and I'd like to calculate the information value (IV) for each feature before modelling. The dataset is enormous and computation resources are very limited, so reading the whole wide table once per feature is too expensive. That is why I tried to create multiple datasets in one data step: accessing the small datasets and doing the calculation there could save me quite a lot of time. Thank you so much for your help, I really appreciate it.
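For readers unfamiliar with the IV calculation mentioned here, the following is a minimal sketch using the usual weight-of-evidence definition, WOE = ln(%good / %bad) per bin and IV = sum of (%good - %bad) * WOE. The dataset `toy` and the variables `bin` and `bad` are hypothetical stand-ins for one already-binned feature and the Good/Bad target; they are not from the original thread.

```sas
/* Hypothetical toy data: one binned feature and a 0/1 target (1 = bad). */
data toy;
  input bin $ bad @@;
  datalines;
A 0 A 0 A 0 A 1 B 0 B 1 B 1 B 0 C 1 C 1 C 0 C 1
;

proc sql noprint;
  /* Totals of goods and bads, stored as macro variables. */
  select sum(bad = 0), sum(bad = 1)
    into :n_good trimmed, :n_bad trimmed
    from toy;

  /* Share of goods and bads in each bin, and the bin's weight of evidence. */
  create table woe as
  select bin,
         sum(bad = 0) / &n_good as pct_good,
         sum(bad = 1) / &n_bad  as pct_bad,
         log(calculated pct_good / calculated pct_bad) as woe
    from toy
   group by bin;
quit;

/* Information value of the feature: sum over bins. */
proc sql;
  select sum((pct_good - pct_bad) * woe) as iv
    from woe;
quit;
```

A bin with no goods or no bads makes the logarithm undefined, so real data usually needs bins merged or a small adjustment before this step.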

EC189QRW · ‎03-06-2021

Hi all, I'd like to create multiple datasets from one dataset (sashelp.class) and name each dataset after its target variable (weight, height, sex, age), keeping the key variable (name) in each one. The hard-coded version looks as follows. Thanks.

data weight(keep=name weight)
     height(keep=name height)
     sex(keep=name sex)
     age(keep=name age);
  set sashelp.class;
run;
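One way the hard-coded step could be generalized is a small macro that writes the DATA statement from a list of variable names. This is only a sketch: the macro name `split_by_var` and its parameters `src`, `key` and `vars` are made up for illustration and are not taken from the replies in the thread.

```sas
/* Sketch: build one output dataset per variable in &vars, each keeping */
/* the key variable plus that one variable. All names are illustrative. */
%macro split_by_var(src=sashelp.class, key=name, vars=weight height sex age);
  %local i var;
  data
    %do i = 1 %to %sysfunc(countw(&vars, %str( )));
      %let var = %scan(&vars, &i, %str( ));
      &var(keep=&key &var)
    %end;
  ;
    set &src;
  run;
%mend split_by_var;

/* With the defaults this generates the same step as the hard-coded version. */
%split_by_var()
```

The source table is still read only once, which is the point of writing all outputs from a single data step.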

EC189QRW · ‎07-06-2020

Here is the log. I think there might be some problem with my SAS base.

NOTE: Additional host information:
 X64_10PRO WIN 10.0.18362 Workstation
NOTE: SAS initialization used:
      real time 2.57 seconds
      cpu time 1.99 seconds

1    data class;
2    set sashelp.class;
3
4    if height<=55 then
5    h='01 L';
6    else if height<=60 then
7    h='02 M';
8    else
9    h='03 H';
10   run;

NOTE: There were 19 observations read from the data set SASHELP.CLASS.
NOTE: The data set WORK.CLASS has 19 observations and 6 variables.
NOTE: DATA statement used (Total process time):
      real time 0.04 seconds
      cpu time 0.03 seconds

11   proc tabulate data=class out=temp;
NOTE: Writing HTML Body file: sashtml.htm
12   class age sex h;
13   table age='', h=''*N='';
14   run;

NOTE: There were 19 observations read from the data set WORK.CLASS.
NOTE: The data set WORK.TEMP has 9 observations and 6 variables.
NOTE: PROCEDURE TABULATE used (Total process time):
      real time 0.64 seconds
      cpu time 0.43 seconds

15   proc transpose data=temp out=want (drop=_:);
16   var N;
17   id h;
18   by age;
19   run;

NOTE: There were 9 observations read from the data set WORK.TEMP.
NOTE: The data set WORK.WANT has 6 observations and 1 variables.
NOTE: PROCEDURE TRANSPOSE used (Total process time):
      real time 0.04 seconds
      cpu time 0.03 seconds

20   proc print data=want;
21
NOTE: There were 6 observations read from the data set WORK.WANT.
NOTE: PROCEDURE PRINT used (Total process time):
      real time 12.30 seconds
      cpu time 0.61 seconds
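A possible explanation for this log, offered as a guess rather than a confirmed diagnosis: under the default VALIDVARNAME=V7 setting, PROC TRANSPOSE turns ID values such as '01 L' into variable names like _01_L, and the drop=_: data set option then removes those columns together with _NAME_. That would leave only age, which matches the "6 observations and 1 variables" note. A minimal sketch that sidesteps this by adding a prefix (h_ is an arbitrary choice) and dropping only _NAME_:

```sas
data class;
  set sashelp.class;
  /* Same three height bands as in the post. */
  if height <= 55 then h = '01 L';
  else if height <= 60 then h = '02 M';
  else h = '03 H';
run;

proc tabulate data=class out=temp;
  class age h;
  table age='', h=''*N='';
run;

/* PREFIX= keeps the new column names from starting with an underscore, */
/* and only _NAME_ is dropped instead of every variable matching _:     */
proc transpose data=temp out=want(drop=_name_) prefix=h_;
  by age;
  id h;
  var N;
run;

proc print data=want noobs;
run;
```

With these settings the transposed columns should be named h_01_L, h_02_M and h_03_H.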

EC189QRW · ‎07-06-2020

THANKS A LOT!!

EC189QRW · ‎07-06-2020

Hi there, thank you for your quick reply. I found that the transpose procedure doesn't work when the categorical variable has more than 2 distinct values.

data class;
  set sashelp.class;
  if height<=55 then h='01 L';
  else if height<=60 then h='02 M';
  else h='03 H';
run;

proc tabulate data=class out=temp;
  class age sex h;
  table age='',h=''*N='';
run;

proc transpose data=temp out=want (drop=_:);
  var N;
  by age;
  id h;
run;

proc print data=want;run;

EC189QRW · ‎07-06-2020

Dear All, I've already created a Temp dataset from the PROC TABULATE procedure, as shown below. Can I use any procedure to turn the Temp dataset back into the original table created by PROC TABULATE? Any advice on how to do it? Thank you all. Eric

step1:

proc tabulate data=sashelp.class out=temp;
  class age sex;
  table age='',sex=''*N='';
run;

The temp dataset looks like this

step2:

proc XXX data=temp;run;

I am looking for results like this
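The follow-up posts in this thread use PROC TRANSPOSE for step2. A minimal sketch of that idea for the age by sex table, assuming the TEMP dataset is already ordered by age as PROC TABULATE writes it:

```sas
proc tabulate data=sashelp.class out=temp;
  class age sex;
  table age='', sex=''*N='';
run;

/* One row per age, one column per value of sex (F and M). */
proc transpose data=temp out=want(drop=_:);
  by age;
  id sex;
  var N;
run;

proc print data=want noobs;
run;
```

Age and sex combinations that have no observations are not written to TEMP, so they show up as missing values in WANT rather than as zero counts.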

EC189QRW · ‎09-27-2018

Thank you for your reply, and sorry for being misleading: I forgot some important information.

The sample dataset is from credit card transactions. Some of them might be fraudulent and some might be normal. Basically, we implemented these rules to label highly suspicious transactions. GB=1 means an actual fraud transaction, GB=0 means a non-fraud transaction. R1 to R5 stand for the different rules we use to label suspicious transactions. For instance, R1=1 means the rule labeled the transaction as fraudulent, and R1=0 means the rule labeled it as genuine. So a confusion matrix can be used here to select effective rules.

Our major concerns for these rules are TPR (true positive rate) and PV+ (positive predictive value):

TPR = true positive / total actual positive = d/(c+d)
PV+ = true positive / total predicted positive = d/(b+d)

As our pool of rules is almost full, I'd like to select a sequence of effective rules from the pool and implement only those in the system, which might give some relief to our server.

              Predicted:1               Predicted:0
actual:1      d, True Positive          c, False Negative          c+d, Actual Positive
actual:0      b, False Positive         a, True Negative           a+b, Actual Negative
              b+d, Predicted Positive   a+c, Predicted Negative

I'd like to get a rule list like r2,r1,r4,r3, as follows.

Obs  rule   ruleselected  accuracy  errorate  Tpr  Pvplus   Tnr   PvMinus
1    RuleX  r2,r1,r4,r3   0.55      0.45      1    0.45455  0.28  1

The first round selects R2 because it has the highest TPR in the rule list, so R2 becomes part of the pool. In the second round I need to calculate the TPR of R2R1, R2R3, R2R4 and R2R5, select the combination with the highest TPR, and add that second rule to the pool, for example R1. The process continues until there is no further increase in TPR for the pool; then the iteration stops.

I don't know if I made my point clear. If you have any questions, please leave a comment. Thank you for your time, I really appreciate it.

EC189QRW · ‎09-26-2018

I've got 300 rules on my rule list. Plenty of those haven't been updated for a really long time. I'd like to select some of the rules based on measures from the confusion matrix, e.g. TPR (true positive rate) and PV+ (positive predictive value), computed from the transaction table. Maybe 60 out of 300 do 99% of the job based on TPR. Then those 60 rules would become my new pool of rules, and I would drop the remaining 240 rules and refresh the rule list thereafter.

My basic logic is that the rule which gives the highest TPR on the transactions comes into the pool first. For example, if R2 gives the highest TPR, it comes in first. Then, among the remaining rules, the one that gives the highest TPR together with R2 becomes part of the pool. Because there might be some overlap between rules, we need to recalculate the TPR each time and make the best choice in each round. The iteration goes on until the difference in TPR between one pool and the next is small, say 0.01; I mean it would converge at some point.

At present, I can create a table of TPR and PV+ for each rule from the transaction table. But I don't know how to dynamically build the sequence of rules and extract, as variables, those which give the largest TPR increase. I hope someone can help me and give me some clue on how to tackle the problem. Thanks in advance.

Here is the transaction sample data. seq stands for sequence, gb stands for GOOD/BAD, and r1-r5 stand for Rule1-Rule5.
data trx;
input seq gb r1 r2 r3 r4 r5;
cards;
1 1 0 0 0 1 0
2 0 0 0 1 0 1
3 1 1 0 0 0 1
4 0 0 0 0 0 1
5 0 1 0 0 1 0
6 0 0 1 0 0 0
7 1 0 1 0 0 1
8 0 0 0 0 0 0
9 0 0 0 1 0 0
10 1 1 0 0 0 0
11 0 0 0 0 1 0
12 1 1 1 1 1 0
13 1 0 1 1 0 0
14 0 1 0 0 1 1
15 1 1 0 0 0 1
16 1 0 0 0 1 0
17 0 0 0 0 0 1
18 1 0 1 0 1 0
19 0 0 0 0 0 1
20 0 0 1 1 1 0
21 0 1 1 0 1 1
22 0 0 1 0 0 0
23 1 1 0 1 1 1
24 0 0 1 0 0 0
25 1 0 1 1 0 1
26 0 0 0 0 1 0
27 0 0 0 1 1 0
28 0 0 0 0 0 1
29 0 0 0 0 0 0
30 1 0 1 1 1 1
31 0 1 0 0 0 0
32 1 0 1 0 1 1
33 0 1 0 0 0 0
34 1 0 0 1 0 1
35 0 1 0 0 1 0
36 0 0 0 0 1 0
37 0 0 0 0 0 1
38 0 1 0 0 1 0
39 1 1 1 1 0 0
40 0 1 1 0 1 0
;
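The round-by-round selection described in this post can be sketched as a greedy macro loop. This is an illustration under several assumptions, not the answer given in the thread: the macro name `greedy_tpr` and its parameters are invented, the trx data set from this post must already exist, rule flags are exactly 0/1, ties go to the rule listed first, and the loop stops when no remaining rule raises the TPR at all rather than at a 0.01 threshold. Because the number of actual positives is fixed, comparing true-positive counts is equivalent to comparing TPR and keeps the macro arithmetic in integers.

```sas
%macro greedy_tpr(data=trx, target=gb, rules=r1 r2 r3 r4 r5);
  %local pool rest newrest selected i rule tp best best_tp prev_tp more;
  %let pool = 0;        /* sum of selected rule flags; > 0 means the pool fires */
  %let rest = &rules;   /* rules not yet selected */
  %let selected = ;     /* rules in the order they were picked */
  %let prev_tp = 0;     /* true positives caught by the current pool */
  %let more = 1;

  %do %while (&more);
    %let best = ;
    %let best_tp = &prev_tp;

    /* Try each remaining rule together with the current pool. */
    %do i = 1 %to %sysfunc(countw(&rest, %str( )));
      %let rule = %scan(&rest, &i, %str( ));
      proc sql noprint;
        select sum((&pool + &rule) > 0 and &target = 1)
          into :tp trimmed
          from &data;
      quit;
      %if &tp > &best_tp %then %do;
        %let best = &rule;
        %let best_tp = &tp;
      %end;
    %end;

    /* Stop when no rule adds a true positive; otherwise grow the pool. */
    %if %length(&best) = 0 %then %let more = 0;
    %else %do;
      %let pool = &pool + &best;
      %let prev_tp = &best_tp;
      %let selected = &selected &best;
      %let newrest = ;
      %do i = 1 %to %sysfunc(countw(&rest, %str( )));
        %let rule = %scan(&rest, &i, %str( ));
        %if &rule ne &best %then %let newrest = &newrest &rule;
      %end;
      %let rest = &newrest;
      %if %length(&rest) = 0 %then %let more = 0;
    %end;
  %end;

  /* Confusion-matrix measures for the final pool. */
  proc sql;
    select "&selected" as ruleselected,
           sum(((&pool) > 0) = (&target = 1)) / count(*)       as accuracy,
           sum((&pool) > 0 and &target = 1) / sum(&target = 1) as tpr,
           sum((&pool) > 0 and &target = 1) / sum((&pool) > 0) as pvplus,
           sum((&pool) = 0 and &target = 0) / sum(&target = 0) as tnr
      from &data;
  quit;
%mend greedy_tpr;

%greedy_tpr()
```

On the 40 sample transactions this should pick the rules in the order r2, r1, r4, r3, the same sequence given in the follow-up post. The sketch rereads the table once per candidate per round, so for 300 rules on a large transaction table a single-pass approach would be worth considering.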

EC189QRW · ‎09-13-2018

Love this way of solving the problem with a hash. Thanks.

EC189QRW · ‎01-11-2018

Dear Doug, you really did help me a lot. Thank you so much!!

EC189QRW · ‎01-08-2018

Dear Doug, thank you for your clarification about training, validation and testing data. Let me put my question this way. In SAS EM terminology, I was trying to create training and testing data sets from different regions in a country. However, the sample data from each region all have different time windows, although there might be a few overlaps. I just combined them in two data steps, one for training and one for testing. As you mentioned, "In SAS Enterprise Miner, the training and validation data sets are intended to represent the same population (in this case, the same time period)". Can I consider the combined data set from different regions and different time windows as a full picture of the population? Can I use them as training and testing data sets in the model building process? Thank you!

EC189QRW · ‎12-12-2017

Hi there, I am creating a decision tree in SAS EM. The consumer credit data sets (training, testing and validation) all come from different colleagues of mine who built logistic regression models over the past 2 years. Unfortunately, I just found out that the time windows are all different from each other. For instance:

Zone1's training and testing datasets are from 01Jan2015 to 30Jun2016. Zone1's validation datasets are from 01Mar2015 to 31Aug2016.
Zone2's training and testing datasets are from 01Jul2015 to 31Dec2016. Zone2's validation datasets are from 01Dec2015 to 31May2017.
Zone3's training and testing datasets are from 01Jan2016 to 30Jun2017. Zone3's validation datasets are from 01May2016 to 31Oct2017.
.... More than 15 different regions ....

As a rookie in sampling, I don't know what I should do. Can I combine all those datasets and consider the result a full picture of the business in all regions? Is that reasonable? Any help will be appreciated. Thanks a lot! Eric

EC189QRW · ‎11-28-2017

It took me more than 10 minutes to figure it out. I was starting to doubt myself. Thank you for your reply.
