Help using Base SAS procedures

Randomly select records based on sum of a column

Accepted Solution Solved
Reply
Regular Contributor
Posts: 233
Accepted Solution

Randomly select records based on sum of a column

I need to create a new data set using the Have dataset. The conditons are as follow:

1. Include all observations with Column C = C

2. Select random observations with Column C = T where sum of  TTL_PLST less than or equal to 20. ** Note: The records highlighted in Red make a sum of 20 where Column C = T.**

Dataset Have

Column AColumn BColumn CColumn DTTL_PLSTN
1234567891ACNP11
1234567891ACNP233
1234567891ACNP321
1234567891ACNP42
1234567891ACNP564
1234567891ACNP62
1234567891ACNP797
1234567891ATNP810
1234567891ATNP91
1234567891ATNP103
1234567891ATNP111
1234567891ATNP31
1234567891ATNP132
1234567891ATNP14264
1234567891ATNP151

Want Dataset:

Column AColumn BColumn CColumn DTTL_PLSTN
1234567891ACNP11
1234567891ACNP233
1234567891ACNP321
1234567891ACNP42
1234567891ACNP564
1234567891ACNP62
1234567891ACNP797
1234567891ATNP              8  10
1234567891ATNP              9   1
1234567891ATNP              3   1

Accepted Solutions
Solution
‎04-10-2012 12:02 PM
Super Contributor
Posts: 1,636

Re: Randomly select records based on sum of a column

how about:

data have ;

input Column_A $9. Column_B $ Column_C $ Column_D $ TTL_PLST N ;

cards;

123456789 1A C  NP 1 1

123456789 1A C  NP 2 33

123456789 1A C  NP 3 21

123456789 1A C  NP 4 2

123456789 1A C  NP 5 64

123456789 1A C  NP 6 2

123456789 1A C  NP 7 97

123456789 1A T  NP 8 10

123456789 1A T  NP 9 1

123456789 1A T  NP 10 3

123456789 1A T  NP 11 1

123456789 1A T  NP 3 1

123456789 1A T  NP 13 2

123456789 1A T  NP 14 264

123456789 1A T  NP 15 1

;

proc sort data=have out=have1(where=(column_c='T'));

  by TTL_PLST;

run;

data have1(drop=total);

  set have1;

  retain total;

   total+TTL_PLST;

  if total <=20;

run;

data want;

   set have (where=(column_c='C'))

     have1;

run;

Linlin

View solution in original post


All Replies
Solution
‎04-10-2012 12:02 PM
Super Contributor
Posts: 1,636

Re: Randomly select records based on sum of a column

how about:

data have ;

input Column_A $9. Column_B $ Column_C $ Column_D $ TTL_PLST N ;

cards;

123456789 1A C  NP 1 1

123456789 1A C  NP 2 33

123456789 1A C  NP 3 21

123456789 1A C  NP 4 2

123456789 1A C  NP 5 64

123456789 1A C  NP 6 2

123456789 1A C  NP 7 97

123456789 1A T  NP 8 10

123456789 1A T  NP 9 1

123456789 1A T  NP 10 3

123456789 1A T  NP 11 1

123456789 1A T  NP 3 1

123456789 1A T  NP 13 2

123456789 1A T  NP 14 264

123456789 1A T  NP 15 1

;

proc sort data=have out=have1(where=(column_c='T'));

  by TTL_PLST;

run;

data have1(drop=total);

  set have1;

  retain total;

   total+TTL_PLST;

  if total <=20;

run;

data want;

   set have (where=(column_c='C'))

     have1;

run;

Linlin

Regular Contributor
Posts: 233

Re: Randomly select records based on sum of a column

Thanks for your quick reply Linlin. Looking at the original data, I totally lost what I was doing. So, I had to post it here.

Super User
Posts: 10,020

Re: Randomly select records based on sum of a column

Can't understand what you mean about 'Randomly Select observation'

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 3 replies
  • 178 views
  • 0 likes
  • 3 in conversation