I have dataset with same account number assigned to different family members. There can be one row or many rows per account number. I need to keep up to the first 3 row by account number. If there's only 1 row by account number, I need to keep it. If there 7 rows by account number, then I need to keep only first 3 rows. Here is example of the data and example of what I need. Thank you in advance for the help.
ACCOUNT_NUMBER | FIRST_NM | LAST_NM | |
1234 | JOHN | JONES | |
1234 | SUE | JONES | |
1234 | MIKE | JONES | |
1234 | DAVID | JONES | |
DATA | 4321 | JAMES | RICE |
4321 | MARY | RICE | |
5261 | BARBARA | FRY | |
5261 | RUSS | FRY | |
5261 | GEOFFREY | FRY | |
5261 | BRYON | FRY | |
5261 | DONNA | FRY | |
5261 | SUSANNE | FRY | |
5261 | PETER | FRY | |
7412 | TODD | BLUE |
ACCOUNT_NUMBER | FIRST_NM | LAST_NM | |
1234 | JOHN | JONES | |
1234 | SUE | JONES | |
1234 | MIKE | JONES | |
4321 | JAMES | RICE | |
NEED | 4321 | MARY | RICE |
5261 | BARBARA | FRY | |
5261 | RUSS | FRY | |
5261 | GEOFFREY | FRY | |
7412 | TODD | BLUE | |
How about
data have;
input ACCOUNT_NUMBER FIRST_NM $ LAST_NM $;
datalines;
1234 JOHN JONES
1234 SUE JONES
1234 MIKE JONES
1234 DAVID JONES
4321 JAMES RICE
4321 MARY RICE
5261 BARBARA FRY
5261 RUSS FRY
5261 GEOFFREY FRY
5261 BRYON FRY
5261 DONNA FRY
5261 SUSANNE FRY
5261 PETER FRY
7412 TODD BLUE
;
data want(drop=c);
set have;
by ACCOUNT_NUMBER;
if first.ACCOUNT_NUMBER then c=0;
c+1;
if c <= 3;
run;
How about
data have;
input ACCOUNT_NUMBER FIRST_NM $ LAST_NM $;
datalines;
1234 JOHN JONES
1234 SUE JONES
1234 MIKE JONES
1234 DAVID JONES
4321 JAMES RICE
4321 MARY RICE
5261 BARBARA FRY
5261 RUSS FRY
5261 GEOFFREY FRY
5261 BRYON FRY
5261 DONNA FRY
5261 SUSANNE FRY
5261 PETER FRY
7412 TODD BLUE
;
data want(drop=c);
set have;
by ACCOUNT_NUMBER;
if first.ACCOUNT_NUMBER then c=0;
c+1;
if c <= 3;
run;
Almost, the results only kept the first 3 rows of entire dataset. I need to keep up to 3 rows for every same account_number.
c | ACCOUNT_NUMBER | FIRST_NM | LAST_NM |
1 | 1234 | JOHN | JONES |
2 | 1234 | SUE | JONES |
3 | 1234 | MIKE | JONES |
Almost, the results only kept the first 3 rows of entire dataset. I need to keep up to 3 rows for every same account_number.
RESULT:
c | ACCOUNT_NUMBER | FIRST_NM | LAST_NM | |
1 | 1234 | JOHN | JONES | |
2 | 1234 | SUE | JONES | |
3 | 1234 | MIKE | JONES |
NEED:
COUNT | ACCOUNT_NUMBER | FIRST_NM | LAST_NM |
1 | 1234 | JOHN | JONES |
2 | 1234 | SUE | JONES |
3 | 1234 | MIKE | JONES |
1 | 4321 | JAMES | RICE |
2 | 4321 | MARY | RICE |
1 | 5261 | BARBARA | FRY |
2 | 5261 | RUSS | FRY |
3 | 5261 | GEOFFREY | FRY |
1 | 7412 | TODD | BLUE |
My code returns exactly those 9 obs.
You're correct, I had typo. It works. Thank you for your help, much appreciated.
@marcus7w wrote:
Almost, the results only kept the first 3 rows of entire dataset. I need to keep up to 3 rows for every same account_number.
c ACCOUNT_NUMBER FIRST_NM LAST_NM 1 1234 JOHN JONES 2 1234 SUE JONES 3 1234 MIKE JONES
You obviously have not run @PeterClemmensen's code; run it, and then come back.
My bad, first attempt, i had a typo. @PeterClemmensen code work awesome. Thank you for the quick reply, much appreciated.
Here's one more solution, slightly simpler:
data want;
do _N_=1 by 1 until(last.account_number);
set have;
by account_number;
if _N_<=3 then output;
end;
run;
I tried this simpler solution, it works too. Thank you very much for quick response!!!!
Since your dataset is already sorted by account_number, this is easily achieved in a data step:
data want;
set have;
by account_number;
if first.account_number
then count = 1;
else count + 1;
if count le 3;
drop count;
run;
The increment statement
count + 1;
causes count to automatically be retained.
Thank you for your help and quick reply. This works and very helpful!!!
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.