I need to create a new data set using the Have dataset. The conditons are as follow:
1. Include all observations with Column C = C
2. Select random observations with Column C = T where sum of TTL_PLST less than or equal to 20. ** Note: The records highlighted in Red make a sum of 20 where Column C = T.**
Dataset Have
Column A | Column B | Column C | Column D | TTL_PLST | N |
123456789 | 1A | C | NP | 1 | 1 |
123456789 | 1A | C | NP | 2 | 33 |
123456789 | 1A | C | NP | 3 | 21 |
123456789 | 1A | C | NP | 4 | 2 |
123456789 | 1A | C | NP | 5 | 64 |
123456789 | 1A | C | NP | 6 | 2 |
123456789 | 1A | C | NP | 7 | 97 |
123456789 | 1A | T | NP | 8 | 10 |
123456789 | 1A | T | NP | 9 | 1 |
123456789 | 1A | T | NP | 10 | 3 |
123456789 | 1A | T | NP | 11 | 1 |
123456789 | 1A | T | NP | 3 | 1 |
123456789 | 1A | T | NP | 13 | 2 |
123456789 | 1A | T | NP | 14 | 264 |
123456789 | 1A | T | NP | 15 | 1 |
Want Dataset:
Column A | Column B | Column C | Column D | TTL_PLST | N |
123456789 | 1A | C | NP | 1 | 1 |
123456789 | 1A | C | NP | 2 | 33 |
123456789 | 1A | C | NP | 3 | 21 |
123456789 | 1A | C | NP | 4 | 2 |
123456789 | 1A | C | NP | 5 | 64 |
123456789 | 1A | C | NP | 6 | 2 |
123456789 | 1A | C | NP | 7 | 97 |
123456789 | 1A | T | NP | 8 | 10 |
123456789 | 1A | T | NP | 9 | 1 |
123456789 | 1A | T | NP | 3 | 1 |
how about:
data have ;
input Column_A $9. Column_B $ Column_C $ Column_D $ TTL_PLST N ;
cards;
123456789 1A C NP 1 1
123456789 1A C NP 2 33
123456789 1A C NP 3 21
123456789 1A C NP 4 2
123456789 1A C NP 5 64
123456789 1A C NP 6 2
123456789 1A C NP 7 97
123456789 1A T NP 8 10
123456789 1A T NP 9 1
123456789 1A T NP 10 3
123456789 1A T NP 11 1
123456789 1A T NP 3 1
123456789 1A T NP 13 2
123456789 1A T NP 14 264
123456789 1A T NP 15 1
;
proc sort data=have out=have1(where=(column_c='T'));
by TTL_PLST;
run;
data have1(drop=total);
set have1;
retain total;
total+TTL_PLST;
if total <=20;
run;
data want;
set have (where=(column_c='C'))
have1;
run;
Linlin
how about:
data have ;
input Column_A $9. Column_B $ Column_C $ Column_D $ TTL_PLST N ;
cards;
123456789 1A C NP 1 1
123456789 1A C NP 2 33
123456789 1A C NP 3 21
123456789 1A C NP 4 2
123456789 1A C NP 5 64
123456789 1A C NP 6 2
123456789 1A C NP 7 97
123456789 1A T NP 8 10
123456789 1A T NP 9 1
123456789 1A T NP 10 3
123456789 1A T NP 11 1
123456789 1A T NP 3 1
123456789 1A T NP 13 2
123456789 1A T NP 14 264
123456789 1A T NP 15 1
;
proc sort data=have out=have1(where=(column_c='T'));
by TTL_PLST;
run;
data have1(drop=total);
set have1;
retain total;
total+TTL_PLST;
if total <=20;
run;
data want;
set have (where=(column_c='C'))
have1;
run;
Linlin
Thanks for your quick reply Linlin. Looking at the original data, I totally lost what I was doing. So, I had to post it here.
Can't understand what you mean about 'Randomly Select observation'
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.