## Randomly select records based on sum of a column

Solved
Regular Contributor
Posts: 233

# Randomly select records based on sum of a column

I need to create a new data set using the Have dataset. The conditons are as follow:

1. Include all observations with Column C = C

2. Select random observations with Column C = T where sum of  TTL_PLST less than or equal to 20. ** Note: The records highlighted in Red make a sum of 20 where Column C = T.**

Dataset Have

 Column A Column B Column C Column D TTL_PLST N 123456789 1A C NP 1 1 123456789 1A C NP 2 33 123456789 1A C NP 3 21 123456789 1A C NP 4 2 123456789 1A C NP 5 64 123456789 1A C NP 6 2 123456789 1A C NP 7 97 123456789 1A T NP 8 10 123456789 1A T NP 9 1 123456789 1A T NP 10 3 123456789 1A T NP 11 1 123456789 1A T NP 3 1 123456789 1A T NP 13 2 123456789 1A T NP 14 264 123456789 1A T NP 15 1

Want Dataset:

 Column A Column B Column C Column D TTL_PLST N 123456789 1A C NP 1 1 123456789 1A C NP 2 33 123456789 1A C NP 3 21 123456789 1A C NP 4 2 123456789 1A C NP 5 64 123456789 1A C NP 6 2 123456789 1A C NP 7 97 123456789 1A T NP 8 10 123456789 1A T NP 9 1 123456789 1A T NP 3 1

Accepted Solutions
Solution
‎04-10-2012 12:02 PM
Super Contributor
Posts: 1,636

## Re: Randomly select records based on sum of a column

data have ;

input Column_A \$9. Column_B \$ Column_C \$ Column_D \$ TTL_PLST N ;

cards;

123456789 1A C  NP 1 1

123456789 1A C  NP 2 33

123456789 1A C  NP 3 21

123456789 1A C  NP 4 2

123456789 1A C  NP 5 64

123456789 1A C  NP 6 2

123456789 1A C  NP 7 97

123456789 1A T  NP 8 10

123456789 1A T  NP 9 1

123456789 1A T  NP 10 3

123456789 1A T  NP 11 1

123456789 1A T  NP 3 1

123456789 1A T  NP 13 2

123456789 1A T  NP 14 264

123456789 1A T  NP 15 1

;

proc sort data=have out=have1(where=(column_c='T'));

by TTL_PLST;

run;

data have1(drop=total);

set have1;

retain total;

total+TTL_PLST;

if total <=20;

run;

data want;

set have (where=(column_c='C'))

have1;

run;

Linlin

All Replies
Solution
‎04-10-2012 12:02 PM
Super Contributor
Posts: 1,636

## Re: Randomly select records based on sum of a column

data have ;

input Column_A \$9. Column_B \$ Column_C \$ Column_D \$ TTL_PLST N ;

cards;

123456789 1A C  NP 1 1

123456789 1A C  NP 2 33

123456789 1A C  NP 3 21

123456789 1A C  NP 4 2

123456789 1A C  NP 5 64

123456789 1A C  NP 6 2

123456789 1A C  NP 7 97

123456789 1A T  NP 8 10

123456789 1A T  NP 9 1

123456789 1A T  NP 10 3

123456789 1A T  NP 11 1

123456789 1A T  NP 3 1

123456789 1A T  NP 13 2

123456789 1A T  NP 14 264

123456789 1A T  NP 15 1

;

proc sort data=have out=have1(where=(column_c='T'));

by TTL_PLST;

run;

data have1(drop=total);

set have1;

retain total;

total+TTL_PLST;

if total <=20;

run;

data want;

set have (where=(column_c='C'))

have1;

run;

Linlin

Regular Contributor
Posts: 233

## Re: Randomly select records based on sum of a column

Thanks for your quick reply Linlin. Looking at the original data, I totally lost what I was doing. So, I had to post it here.

Super User
Posts: 10,784

## Re: Randomly select records based on sum of a column

Can't understand what you mean about 'Randomly Select observation'

🔒 This topic is solved and locked.