I need to create a new data set using the Have dataset. The conditons are as follow:
1. Include all observations with Column C = C
2. Select random observations with Column C = T where sum of TTL_PLST less than or equal to 20. ** Note: The records highlighted in Red make a sum of 20 where Column C = T.**
Dataset Have
Column A | Column B | Column C | Column D | TTL_PLST | N |
123456789 | 1A | C | NP | 1 | 1 |
123456789 | 1A | C | NP | 2 | 33 |
123456789 | 1A | C | NP | 3 | 21 |
123456789 | 1A | C | NP | 4 | 2 |
123456789 | 1A | C | NP | 5 | 64 |
123456789 | 1A | C | NP | 6 | 2 |
123456789 | 1A | C | NP | 7 | 97 |
123456789 | 1A | T | NP | 8 | 10 |
123456789 | 1A | T | NP | 9 | 1 |
123456789 | 1A | T | NP | 10 | 3 |
123456789 | 1A | T | NP | 11 | 1 |
123456789 | 1A | T | NP | 3 | 1 |
123456789 | 1A | T | NP | 13 | 2 |
123456789 | 1A | T | NP | 14 | 264 |
123456789 | 1A | T | NP | 15 | 1 |
Want Dataset:
Column A | Column B | Column C | Column D | TTL_PLST | N |
123456789 | 1A | C | NP | 1 | 1 |
123456789 | 1A | C | NP | 2 | 33 |
123456789 | 1A | C | NP | 3 | 21 |
123456789 | 1A | C | NP | 4 | 2 |
123456789 | 1A | C | NP | 5 | 64 |
123456789 | 1A | C | NP | 6 | 2 |
123456789 | 1A | C | NP | 7 | 97 |
123456789 | 1A | T | NP | 8 | 10 |
123456789 | 1A | T | NP | 9 | 1 |
123456789 | 1A | T | NP | 3 | 1 |
how about:
data have ;
input Column_A $9. Column_B $ Column_C $ Column_D $ TTL_PLST N ;
cards;
123456789 1A C NP 1 1
123456789 1A C NP 2 33
123456789 1A C NP 3 21
123456789 1A C NP 4 2
123456789 1A C NP 5 64
123456789 1A C NP 6 2
123456789 1A C NP 7 97
123456789 1A T NP 8 10
123456789 1A T NP 9 1
123456789 1A T NP 10 3
123456789 1A T NP 11 1
123456789 1A T NP 3 1
123456789 1A T NP 13 2
123456789 1A T NP 14 264
123456789 1A T NP 15 1
;
proc sort data=have out=have1(where=(column_c='T'));
by TTL_PLST;
run;
data have1(drop=total);
set have1;
retain total;
total+TTL_PLST;
if total <=20;
run;
data want;
set have (where=(column_c='C'))
have1;
run;
Linlin
how about:
data have ;
input Column_A $9. Column_B $ Column_C $ Column_D $ TTL_PLST N ;
cards;
123456789 1A C NP 1 1
123456789 1A C NP 2 33
123456789 1A C NP 3 21
123456789 1A C NP 4 2
123456789 1A C NP 5 64
123456789 1A C NP 6 2
123456789 1A C NP 7 97
123456789 1A T NP 8 10
123456789 1A T NP 9 1
123456789 1A T NP 10 3
123456789 1A T NP 11 1
123456789 1A T NP 3 1
123456789 1A T NP 13 2
123456789 1A T NP 14 264
123456789 1A T NP 15 1
;
proc sort data=have out=have1(where=(column_c='T'));
by TTL_PLST;
run;
data have1(drop=total);
set have1;
retain total;
total+TTL_PLST;
if total <=20;
run;
data want;
set have (where=(column_c='C'))
have1;
run;
Linlin
Thanks for your quick reply Linlin. Looking at the original data, I totally lost what I was doing. So, I had to post it here.
Can't understand what you mean about 'Randomly Select observation'
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.