BookmarkSubscribeRSS Feed
ertr
Quartz | Level 8

Hello everyone,

 

As every model developer know, to do some validation tests, you should make bucketing in your data set and compare and do some tests between your development and performance data sets. At this point, to make this, I found a following link as below but I could not adjust that example for my sample; 

 https://communities.sas.com/t5/SAS-Procedures/Help-Using-Proc-Rank-With-Two-Datasets/td-p/124480

Based on above sample, I perfomed my sampe over SASHELP.CLASS, Let's pretend this is my development sample and I could bucketing as below;

 

proc rank data=sashelp.class out=mranks groups=3;
var age;
ranks rage;
run;

PROC TABULATE DATA=WORK.MRANKS;	
	CLASS rage /	ORDER=UNFORMATTED MISSING;
	CLASS Sex /	ORDER=UNFORMATTED MISSING;
	TABLE /* Row Dimension */rage,
/* Column Dimension */Sex*N ALL={LABEL="Total (ALL)"}* N 		;
RUN;

Then I made simple changes and prepared a performance data set as below;

 

Data PerfClass;
Length Name $ 32 Gender $ 1 Age 8;
Infile Datalines dlm="	" Missover;
Input Name Gender Age;
Datalines;
Alfred	M	15
Alice	F	14
Barbara	F	15
Carol	F	15
Henry	M	14
James	M	13
Jane	F	11
Janet	F	15
Jeffrey	M	12
John	M	13
Joyce	F	11
Judy	F	12
Louise	F	13
Mary	F	12
Philip	M	15
Robert	M	11
Ronald	M	14
Thomas	M	11
William	M	15
;

I would like to get Female and Male counts based on Train data set bucket ranges. How can I do it?

 

I am not sure whether my sample is correct or not but I hope I could made myself clear 🙂

 

@josh_wander@MikeZdeb@Linlin@PaigeMiller

Thanks

4 REPLIES 4
PaigeMiller
Diamond | Level 26

Your link doesn't work.

 

It is also not clear to me how PROC RANK or PROC TABULATE has anything to do with separating data into train and validation data sets. Could you explain that further? Could we also stick with common terminology, "training" and "validation" rather than other terms?

--
Paige Miller
ertr
Quartz | Level 8

Hello, 

 

I fixed the link

 

I tried to tell via sample default data sets. The following data set is train data set and PROC RANK and PROC TABULATE provides me to see number of count for every group(bucket)

 

Data TRAIN;
Length Score 8 Target 8 P_Target 8;
Infile Datalines missover;
Input Score Target P_Target;
Datalines;
1200 1 0.7
1210 0 0.6
1220 1 0.8
1230 0 0.5
1240 1 0.8
1250 0 0.5
1260 1 0.8
1270 0 0.5
1280 1 0.9
1290 0 0.5
1300 0 0.4
1310 1 0.8
1320 0 0.5
1330 0 0.6
1340 1 0.8
1350 0 0.2
1360 0 0.3
1370 1 0.7
1380 0 0.4
1390 0 0.6
1400 1 0.7
1410 0 0.5
1420 0 0.4
1430 0 0.6
1440 1 0.8
1450 0 0.4
1460 0 0.6
1470 0 0.5
1480 1 0.8
1490 0 0.4
1500 0 0.6
;
Run;

proc rank data=TRAIN out=TRAIN_RANKED groups=3 ties=mean;
var SCORE;
ranks RSCORE;
run;

PROC TABULATE DATA=TRAIN_RANKED;	
	CLASS RSCORE /	ORDER=UNFORMATTED MISSING;
	CLASS TARGET /	ORDER=UNFORMATTED MISSING;
	TABLE /* Row Dimension */RSCORE,
/* Column Dimension */TARGET*N ALL={LABEL="Total (ALL)"}* N 		;
RUN;

I also have following Validation data set and I want to apply Train data set ranges on Validation data set to see number of count Validation files between Train data set ranges. How can I do it?

 

Data PERF;
Length Score 8 Target 8 ;
Infile Datalines missover;
Input Score Target;
Datalines;
1250 1
1260 1
1270 1
1280 0
1290 1
1295 0
1296 1
1297 0
1298 1
1299 0
1300 0
1350 1
1360 0
1370 1
1380 1
1390 0
1395 0
1396 0
1397 0
1398 1
1400 1
1450 0
1460 0
1470 1
1480 1
1490 0
1495 1
1496 0
1497 1
1498 0
1600 1
;
Run;

My desired output as below;

 

RankTrain0Train1TrainAllPerf0onTrainPerf1onTrainPerfAllonTrainMinTrainMaxTrain
05510461001300
17411551113001410
28210551014101500

 

@PaigeMiller@Ksharp @Linlin @MikeZdeb @josh_wander

 

Thanks

PaigeMiller
Diamond | Level 26

If I am understanding this properly, I don't think PROC RANK is the way to go at all. I think you want to calculate the 33.333 percentile and the 66.667 percentile of your training data set. This can be done via PROC UNIVARIATE, and these percentiles can be output to a SAS data set. 

 

Once the 33.333 and 66.667 percentiles are in a SAS data set, it is easy to apply them to the validation data set. This can be done via a series of IF-THEN statements in a DATA step.

--
Paige Miller
Ksharp
Super User

I think you missed option 

proc rank data=sashelp.class out=mranks groups=3 ties=low|high|dense ;

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 1297 views
  • 0 likes
  • 3 in conversation