I need to create a new data set using the Have dataset. The conditons are as follow:
1. Include all observations with Column C = C
2. Select random observations with Column C = T where sum of TTL_PLST less than or equal to 20. ** Note: The records highlighted in Red make a sum of 20 where Column C = T.**
Dataset Have
Column A | Column B | Column C | Column D | TTL_PLST | N |
123456789 | 1A | C | NP | 1 | 1 |
123456789 | 1A | C | NP | 2 | 33 |
123456789 | 1A | C | NP | 3 | 21 |
123456789 | 1A | C | NP | 4 | 2 |
123456789 | 1A | C | NP | 5 | 64 |
123456789 | 1A | C | NP | 6 | 2 |
123456789 | 1A | C | NP | 7 | 97 |
123456789 | 1A | T | NP | 8 | 10 |
123456789 | 1A | T | NP | 9 | 1 |
123456789 | 1A | T | NP | 10 | 3 |
123456789 | 1A | T | NP | 11 | 1 |
123456789 | 1A | T | NP | 3 | 1 |
123456789 | 1A | T | NP | 13 | 2 |
123456789 | 1A | T | NP | 14 | 264 |
123456789 | 1A | T | NP | 15 | 1 |
Want Dataset:
Column A | Column B | Column C | Column D | TTL_PLST | N |
123456789 | 1A | C | NP | 1 | 1 |
123456789 | 1A | C | NP | 2 | 33 |
123456789 | 1A | C | NP | 3 | 21 |
123456789 | 1A | C | NP | 4 | 2 |
123456789 | 1A | C | NP | 5 | 64 |
123456789 | 1A | C | NP | 6 | 2 |
123456789 | 1A | C | NP | 7 | 97 |
123456789 | 1A | T | NP | 8 | 10 |
123456789 | 1A | T | NP | 9 | 1 |
123456789 | 1A | T | NP | 3 | 1 |
how about:
data have ;
input Column_A $9. Column_B $ Column_C $ Column_D $ TTL_PLST N ;
cards;
123456789 1A C NP 1 1
123456789 1A C NP 2 33
123456789 1A C NP 3 21
123456789 1A C NP 4 2
123456789 1A C NP 5 64
123456789 1A C NP 6 2
123456789 1A C NP 7 97
123456789 1A T NP 8 10
123456789 1A T NP 9 1
123456789 1A T NP 10 3
123456789 1A T NP 11 1
123456789 1A T NP 3 1
123456789 1A T NP 13 2
123456789 1A T NP 14 264
123456789 1A T NP 15 1
;
proc sort data=have out=have1(where=(column_c='T'));
by TTL_PLST;
run;
data have1(drop=total);
set have1;
retain total;
total+TTL_PLST;
if total <=20;
run;
data want;
set have (where=(column_c='C'))
have1;
run;
Linlin
how about:
data have ;
input Column_A $9. Column_B $ Column_C $ Column_D $ TTL_PLST N ;
cards;
123456789 1A C NP 1 1
123456789 1A C NP 2 33
123456789 1A C NP 3 21
123456789 1A C NP 4 2
123456789 1A C NP 5 64
123456789 1A C NP 6 2
123456789 1A C NP 7 97
123456789 1A T NP 8 10
123456789 1A T NP 9 1
123456789 1A T NP 10 3
123456789 1A T NP 11 1
123456789 1A T NP 3 1
123456789 1A T NP 13 2
123456789 1A T NP 14 264
123456789 1A T NP 15 1
;
proc sort data=have out=have1(where=(column_c='T'));
by TTL_PLST;
run;
data have1(drop=total);
set have1;
retain total;
total+TTL_PLST;
if total <=20;
run;
data want;
set have (where=(column_c='C'))
have1;
run;
Linlin
Thanks for your quick reply Linlin. Looking at the original data, I totally lost what I was doing. So, I had to post it here.
Can't understand what you mean about 'Randomly Select observation'
Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.
Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.