<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Bucketing between Train and Performance Data Set in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Bucketing-between-Train-and-Performance-Data-Set/m-p/495644#M130855</link>
    <description>&lt;P&gt;Your link doesn't work.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;It is also not clear to me how PROC RANK or PROC TABULATE has anything to do with separating data into train and validation data sets. Could you explain that further? Could we also stick with common terminology, "training" and "validation" rather than other terms?&lt;/P&gt;</description>
    <pubDate>Fri, 14 Sep 2018 12:15:58 GMT</pubDate>
    <dc:creator>PaigeMiller</dc:creator>
    <dc:date>2018-09-14T12:15:58Z</dc:date>
    <item>
      <title>Bucketing between Train and Performance Data Set</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Bucketing-between-Train-and-Performance-Data-Set/m-p/495633#M130847</link>
      <description>&lt;P&gt;Hello everyone,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;As every model developer knows, validation testing requires bucketing your data set and then comparing the development and performance data sets bucket by bucket. I found the following link, but I could not adapt its example to my sample:&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;A href="https://communities.sas.com/t5/SAS-Procedures/Help-Using-Proc-Rank-With-Two-Datasets/td-p/124480" target="_blank"&gt;https://communities.sas.com/t5/SAS-Procedures/Help-Using-Proc-Rank-With-Two-Datasets/td-p/124480&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Based on that example, I built my sample on SASHELP.CLASS. Let's pretend this is my development sample; I can bucket it as below:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc rank data=sashelp.class out=mranks groups=3;
var age;
ranks rage;
run;

PROC TABULATE DATA=WORK.MRANKS;
  CLASS rage / ORDER=UNFORMATTED MISSING;
  CLASS Sex / ORDER=UNFORMATTED MISSING;
  TABLE rage, /* Row Dimension */
        Sex*N ALL={LABEL="Total (ALL)"}*N; /* Column Dimension */
RUN;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;Then I made a few simple changes and prepared a performance data set as below:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;Data PerfClass;
Length Name $ 32 Gender $ 1 Age 8;
Infile Datalines dlm="	" Missover;
Input Name Gender Age;
Datalines;
Alfred	M	15
Alice	F	14
Barbara	F	15
Carol	F	15
Henry	M	14
James	M	13
Jane	F	11
Janet	F	15
Jeffrey	M	12
John	M	13
Joyce	F	11
Judy	F	12
Louise	F	13
Mary	F	12
Philip	M	15
Robert	M	11
Ronald	M	14
Thomas	M	11
William	M	15
;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;I would like to get the female and male counts in the performance data set based on the training data set's bucket ranges. How can I do it?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I am not sure whether my sample is correct, but I hope I have made myself clear &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/33189"&gt;@josh_wander&lt;/a&gt;,&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/13759"&gt;@MikeZdeb&lt;/a&gt;,&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/1762"&gt;@Linlin&lt;/a&gt;,&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/10892"&gt;@PaigeMiller&lt;/a&gt;&lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;</description>
      <pubDate>Fri, 14 Sep 2018 12:21:13 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Bucketing-between-Train-and-Performance-Data-Set/m-p/495633#M130847</guid>
      <dc:creator>ertr</dc:creator>
      <dc:date>2018-09-14T12:21:13Z</dc:date>
    </item>
    <item>
      <title>Re: Bucketing between Train and Performance Data Set</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Bucketing-between-Train-and-Performance-Data-Set/m-p/495644#M130855</link>
      <description>&lt;P&gt;Your link doesn't work.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;It is also not clear to me how PROC RANK or PROC TABULATE has anything to do with separating data into train and validation data sets. Could you explain that further? Could we also stick with common terminology, "training" and "validation" rather than other terms?&lt;/P&gt;</description>
      <pubDate>Fri, 14 Sep 2018 12:15:58 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Bucketing-between-Train-and-Performance-Data-Set/m-p/495644#M130855</guid>
      <dc:creator>PaigeMiller</dc:creator>
      <dc:date>2018-09-14T12:15:58Z</dc:date>
    </item>
    <item>
      <title>Re: Bucketing between Train and Performance Data Set</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Bucketing-between-Train-and-Performance-Data-Set/m-p/495663#M130859</link>
      <description>&lt;P&gt;I think you missed the TIES= option:&lt;/P&gt;
&lt;PRE class=" language-sas"&gt;&lt;CODE class="  language-sas"&gt;&lt;SPAN class="token procnames"&gt;proc&lt;/SPAN&gt; &lt;SPAN class="token procnames"&gt;rank&lt;/SPAN&gt; &lt;SPAN class="token procnames"&gt;data&lt;/SPAN&gt;&lt;SPAN class="token operator"&gt;=&lt;/SPAN&gt;sashelp&lt;SPAN class="token punctuation"&gt;.&lt;/SPAN&gt;&lt;SPAN class="token statement"&gt;class&lt;/SPAN&gt; out&lt;SPAN class="token operator"&gt;=&lt;/SPAN&gt;mranks groups&lt;SPAN class="token operator"&gt;=&lt;/SPAN&gt;&lt;SPAN class="token number"&gt;3 &lt;STRONG&gt;ties=low&lt;/STRONG&gt;|high|dense &lt;/SPAN&gt;&lt;SPAN class="token punctuation"&gt;;&lt;/SPAN&gt;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Fri, 14 Sep 2018 13:03:58 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Bucketing-between-Train-and-Performance-Data-Set/m-p/495663#M130859</guid>
      <dc:creator>Ksharp</dc:creator>
      <dc:date>2018-09-14T13:03:58Z</dc:date>
    </item>
    <item>
      <title>Re: Bucketing between Train and Performance Data Set</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Bucketing-between-Train-and-Performance-Data-Set/m-p/495802#M130943</link>
      <description>&lt;P&gt;Hello,&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I&amp;nbsp;fixed the link.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I tried to explain the problem with sample data sets. The following is the training data set; PROC RANK and PROC TABULATE let me see the count for every group (bucket):&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;Data TRAIN;
Length Score 8 Target 8 P_Target 8;
Infile Datalines missover;
Input Score Target P_Target;
Datalines;
1200 1 0.7
1210 0 0.6
1220 1 0.8
1230 0 0.5
1240 1 0.8
1250 0 0.5
1260 1 0.8
1270 0 0.5
1280 1 0.9
1290 0 0.5
1300 0 0.4
1310 1 0.8
1320 0 0.5
1330 0 0.6
1340 1 0.8
1350 0 0.2
1360 0 0.3
1370 1 0.7
1380 0 0.4
1390 0 0.6
1400 1 0.7
1410 0 0.5
1420 0 0.4
1430 0 0.6
1440 1 0.8
1450 0 0.4
1460 0 0.6
1470 0 0.5
1480 1 0.8
1490 0 0.4
1500 0 0.6
;
Run;

proc rank data=TRAIN out=TRAIN_RANKED groups=3 ties=mean;
var SCORE;
ranks RSCORE;
run;

PROC TABULATE DATA=TRAIN_RANKED;
  CLASS RSCORE / ORDER=UNFORMATTED MISSING;
  CLASS TARGET / ORDER=UNFORMATTED MISSING;
  TABLE RSCORE, /* Row Dimension */
        TARGET*N ALL={LABEL="Total (ALL)"}*N; /* Column Dimension */
RUN;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;I also have the following validation data set, and I want to apply the training data set's bucket ranges to it so that I can count how many validation records fall within each training range. How can I do it?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;Data PERF;
Length Score 8 Target 8 ;
Infile Datalines missover;
Input Score Target;
Datalines;
1250 1
1260 1
1270 1
1280 0
1290 1
1295 0
1296 1
1297 0
1298 1
1299 0
1300 0
1350 1
1360 0
1370 1
1380 1
1390 0
1395 0
1396 0
1397 0
1398 1
1400 1
1450 0
1460 0
1470 1
1480 1
1490 0
1495 1
1496 0
1497 1
1498 0
1600 1
;
Run;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;My desired output is as below:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;TABLE&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD&gt;Rank&lt;/TD&gt;&lt;TD&gt;Train0&lt;/TD&gt;&lt;TD&gt;Train1&lt;/TD&gt;&lt;TD&gt;TrainAll&lt;/TD&gt;&lt;TD&gt;Perf0onTrain&lt;/TD&gt;&lt;TD&gt;Perf1onTrain&lt;/TD&gt;&lt;TD&gt;PerfAllonTrain&lt;/TD&gt;&lt;TD&gt;MinTrain&lt;/TD&gt;&lt;TD&gt;MaxTrain&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;0&lt;/TD&gt;&lt;TD&gt;5&lt;/TD&gt;&lt;TD&gt;5&lt;/TD&gt;&lt;TD&gt;10&lt;/TD&gt;&lt;TD&gt;4&lt;/TD&gt;&lt;TD&gt;6&lt;/TD&gt;&lt;TD&gt;10&lt;/TD&gt;&lt;TD&gt;0&lt;/TD&gt;&lt;TD&gt;1300&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;7&lt;/TD&gt;&lt;TD&gt;4&lt;/TD&gt;&lt;TD&gt;11&lt;/TD&gt;&lt;TD&gt;5&lt;/TD&gt;&lt;TD&gt;5&lt;/TD&gt;&lt;TD&gt;11&lt;/TD&gt;&lt;TD&gt;1300&lt;/TD&gt;&lt;TD&gt;1410&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;2&lt;/TD&gt;&lt;TD&gt;8&lt;/TD&gt;&lt;TD&gt;2&lt;/TD&gt;&lt;TD&gt;10&lt;/TD&gt;&lt;TD&gt;5&lt;/TD&gt;&lt;TD&gt;5&lt;/TD&gt;&lt;TD&gt;10&lt;/TD&gt;&lt;TD&gt;1410&lt;/TD&gt;&lt;TD&gt;1500&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/10892"&gt;@PaigeMiller&lt;/a&gt;&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/18408"&gt;@Ksharp&lt;/a&gt;&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/1762"&gt;@Linlin&lt;/a&gt;&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/13759"&gt;@MikeZdeb&lt;/a&gt;&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/33189"&gt;@josh_wander&lt;/a&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;</description>
      <pubDate>Fri, 14 Sep 2018 18:39:41 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Bucketing-between-Train-and-Performance-Data-Set/m-p/495802#M130943</guid>
      <dc:creator>ertr</dc:creator>
      <dc:date>2018-09-14T18:39:41Z</dc:date>
    </item>
    <item>
      <title>Re: Bucketing between Train and Performance Data Set</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Bucketing-between-Train-and-Performance-Data-Set/m-p/495813#M130947</link>
      <description>&lt;P&gt;If I am understanding this properly, I don't think PROC RANK is the way to go at all. I think you want to calculate the 33.333 percentile and the 66.667 percentile of your training data set. This can be done via PROC UNIVARIATE, and these percentiles can be output to a SAS data set.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
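&lt;P&gt;A minimal sketch of this approach, assuming the TRAIN and PERF data sets and the Score variable posted earlier in the thread. The names CUTS and PERF_BUCKETED, the P_ prefix, and the bucket variable RSCORE are illustrative assumptions, as is the choice to put a score equal to a cut point into the lower bucket. Because percentile definitions differ from the PROC RANK grouping rule, the bucket edges may not match the PROC RANK groups exactly.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Step 1: compute the cut points on the training data only.        */
/* Non-integer percentiles are named with an underscore in place    */
/* of the decimal point: P_33_333 and P_66_667.                     */
proc univariate data=TRAIN noprint;
  var Score;
  output out=CUTS pctlpts=33.333 66.667 pctlpre=P_;
run;

/* Step 2: apply the training cut points to the validation data.    */
/* The one-row CUTS data set is read once; its values are retained. */
data PERF_BUCKETED;
  if _n_ = 1 then set CUTS;
  set PERF;
  if Score le P_33_333 then RSCORE = 0;       /* assumed: ties go low */
  else if Score le P_66_667 then RSCORE = 1;
  else RSCORE = 2;
run;

/* Count validation records per training-based bucket and target. */
proc tabulate data=PERF_BUCKETED;
  class RSCORE Target / missing;
  table RSCORE, Target*N ALL*N;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;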
&lt;P&gt;Once the 33.333 and 66.667 percentiles are in a SAS data set, it is easy to apply them to the validation data set. This can be done via a series of IF-THEN statements in a DATA step.&lt;/P&gt;</description>
      <pubDate>Fri, 14 Sep 2018 19:42:41 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Bucketing-between-Train-and-Performance-Data-Set/m-p/495813#M130947</guid>
      <dc:creator>PaigeMiller</dc:creator>
      <dc:date>2018-09-14T19:42:41Z</dc:date>
    </item>
  </channel>
</rss>

