I need to create a new data set using the Have dataset. The conditons are as follow:
1. Include all observations with Column C = C
2. Select random observations with Column C = T where sum of TTL_PLST less than or equal to 20. ** Note: The records highlighted in Red make a sum of 20 where Column C = T.**
Dataset Have
Column A | Column B | Column C | Column D | TTL_PLST | N |
123456789 | 1A | C | NP | 1 | 1 |
123456789 | 1A | C | NP | 2 | 33 |
123456789 | 1A | C | NP | 3 | 21 |
123456789 | 1A | C | NP | 4 | 2 |
123456789 | 1A | C | NP | 5 | 64 |
123456789 | 1A | C | NP | 6 | 2 |
123456789 | 1A | C | NP | 7 | 97 |
123456789 | 1A | T | NP | 8 | 10 |
123456789 | 1A | T | NP | 9 | 1 |
123456789 | 1A | T | NP | 10 | 3 |
123456789 | 1A | T | NP | 11 | 1 |
123456789 | 1A | T | NP | 3 | 1 |
123456789 | 1A | T | NP | 13 | 2 |
123456789 | 1A | T | NP | 14 | 264 |
123456789 | 1A | T | NP | 15 | 1 |
Want Dataset:
Column A | Column B | Column C | Column D | TTL_PLST | N |
123456789 | 1A | C | NP | 1 | 1 |
123456789 | 1A | C | NP | 2 | 33 |
123456789 | 1A | C | NP | 3 | 21 |
123456789 | 1A | C | NP | 4 | 2 |
123456789 | 1A | C | NP | 5 | 64 |
123456789 | 1A | C | NP | 6 | 2 |
123456789 | 1A | C | NP | 7 | 97 |
123456789 | 1A | T | NP | 8 | 10 |
123456789 | 1A | T | NP | 9 | 1 |
123456789 | 1A | T | NP | 3 | 1 |
how about:
data have ;
input Column_A $9. Column_B $ Column_C $ Column_D $ TTL_PLST N ;
cards;
123456789 1A C NP 1 1
123456789 1A C NP 2 33
123456789 1A C NP 3 21
123456789 1A C NP 4 2
123456789 1A C NP 5 64
123456789 1A C NP 6 2
123456789 1A C NP 7 97
123456789 1A T NP 8 10
123456789 1A T NP 9 1
123456789 1A T NP 10 3
123456789 1A T NP 11 1
123456789 1A T NP 3 1
123456789 1A T NP 13 2
123456789 1A T NP 14 264
123456789 1A T NP 15 1
;
proc sort data=have out=have1(where=(column_c='T'));
by TTL_PLST;
run;
data have1(drop=total);
set have1;
retain total;
total+TTL_PLST;
if total <=20;
run;
data want;
set have (where=(column_c='C'))
have1;
run;
Linlin
how about:
data have ;
input Column_A $9. Column_B $ Column_C $ Column_D $ TTL_PLST N ;
cards;
123456789 1A C NP 1 1
123456789 1A C NP 2 33
123456789 1A C NP 3 21
123456789 1A C NP 4 2
123456789 1A C NP 5 64
123456789 1A C NP 6 2
123456789 1A C NP 7 97
123456789 1A T NP 8 10
123456789 1A T NP 9 1
123456789 1A T NP 10 3
123456789 1A T NP 11 1
123456789 1A T NP 3 1
123456789 1A T NP 13 2
123456789 1A T NP 14 264
123456789 1A T NP 15 1
;
proc sort data=have out=have1(where=(column_c='T'));
by TTL_PLST;
run;
data have1(drop=total);
set have1;
retain total;
total+TTL_PLST;
if total <=20;
run;
data want;
set have (where=(column_c='C'))
have1;
run;
Linlin
Thanks for your quick reply Linlin. Looking at the original data, I totally lost what I was doing. So, I had to post it here.
Can't understand what you mean about 'Randomly Select observation'
Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.
Register today!Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Select SAS Training centers are offering in-person courses. View upcoming courses for: