Hello, I am trying to using the data sample (as an example) to created a output below: can anyone help to let me know whether SAS functions can create this output?
Situation: I have a data sample of 100 obs. From the beginning 100 population, I want to calculate the exclusions and remain from each category (as the desire output). The second category start base would be the previous remain in the first category. The view is like the waterfall. The data sample is below as well.
Now I am doing this manually in excel and wonder whether in SAS there is any function could let me run it automatically
Thank you.
Data Sample:
Client_ID | Client_Age | Client_A_Score | Client_B_Score | Client_Type |
a1 | 18 | 527 | 634 | 1 |
a2 | 94 | 686 | 594 | 2 |
a3 | 42 | 486 | 820 | 4 |
a4 | 18 | 554 | 586 | 4 |
a5 | 82 | 423 | 849 | 0 |
a6 | 65 | 421 | 568 | 5 |
a7 | 100 | 435 | 832 | 4 |
a8 | 45 | 451 | 870 | 3 |
a9 | 89 | 410 | 549 | 1 |
a10 | 20 | 508 | 808 | 2 |
a11 | 63 | 589 | 792 | 5 |
a12 | 34 | 719 | 683 | 2 |
a13 | 28 | 735 | 548 | 0 |
a14 | 76 | 588 | 619 | 5 |
a15 | 53 | 414 | 581 | 0 |
a16 | 33 | 760 | 856 | 5 |
a17 | 31 | 652 | 505 | 4 |
a18 | 76 | 407 | 669 | 5 |
a19 | 37 | 659 | 844 | 1 |
a20 | 40 | 328 | 754 | 3 |
a21 | 31 | 729 | 766 | 4 |
a22 | 85 | 692 | 895 | 5 |
a23 | 84 | 318 | 691 | 3 |
a24 | 46 | 689 | 711 | 4 |
a25 | 68 | 396 | 572 | 5 |
a26 | 53 | 544 | 877 | 1 |
a27 | 22 | 556 | 537 | 3 |
a28 | 56 | 369 | 531 | 5 |
a29 | 86 | 779 | 524 | 0 |
a30 | 82 | 698 | 660 | 5 |
a31 | 90 | 793 | 682 | 5 |
a32 | 27 | 764 | 653 | 1 |
a33 | 23 | 363 | 744 | 3 |
a34 | 95 | 474 | 778 | 2 |
a35 | 97 | 334 | 762 | 4 |
a36 | 70 | 456 | 873 | 1 |
a37 | 26 | 452 | 578 | 1 |
a38 | 43 | 651 | 549 | 2 |
a39 | 28 | 695 | 659 | 3 |
a40 | 30 | 362 | 864 | 1 |
a41 | 44 | 352 | 863 | 5 |
a42 | 93 | 748 | 586 | 4 |
a43 | 79 | 394 | 855 | 5 |
a44 | 21 | 343 | 669 | 1 |
a45 | 45 | 792 | 584 | 4 |
a46 | 95 | 534 | 872 | 4 |
a47 | 34 | 528 | 788 | 3 |
a48 | 81 | 407 | 802 | 0 |
a49 | 63 | 497 | 864 | 0 |
a50 | 92 | 470 | 803 | 3 |
a51 | 64 | 310 | 863 | 4 |
a52 | 34 | 346 | 749 | 1 |
a53 | 50 | 678 | 583 | 2 |
a54 | 99 | 478 | 521 | 2 |
a55 | 48 | 303 | 577 | 0 |
a56 | 29 | 596 | 889 | 5 |
a57 | 25 | 636 | 696 | 0 |
a58 | 99 | 648 | 848 | 1 |
a59 | 74 | 562 | 876 | 0 |
a60 | 66 | 676 | 526 | 0 |
a61 | 27 | 723 | 597 | 3 |
a62 | 54 | 731 | 597 | 5 |
a63 | 54 | 679 | 880 | 4 |
a64 | 62 | 488 | 803 | 5 |
a65 | 79 | 378 | 727 | 1 |
a66 | 21 | 492 | 618 | 1 |
a67 | 20 | 742 | 576 | 4 |
a68 | 49 | 572 | 764 | 5 |
a69 | 19 | 799 | 839 | 4 |
a70 | 32 | 691 | 590 | 2 |
a71 | 40 | 591 | 590 | 0 |
a72 | 29 | 782 | 881 | 0 |
a73 | 52 | 471 | 665 | 5 |
a74 | 24 | 508 | 885 | 3 |
a75 | 100 | 324 | 723 | 1 |
a76 | 41 | 609 | 847 | 5 |
a77 | 94 | 710 | 553 | 5 |
a78 | 51 | 595 | 681 | 5 |
a79 | 82 | 593 | 553 | 1 |
a80 | 79 | 527 | 615 | 2 |
a81 | 99 | 540 | 632 | 5 |
a82 | 72 | 579 | 519 | 4 |
a83 | 33 | 739 | 537 | 4 |
a84 | 45 | 667 | 547 | 2 |
a85 | 92 | 750 | 558 | 4 |
a86 | 20 | 554 | 894 | 4 |
a87 | 48 | 784 | 763 | 0 |
a88 | 69 | 747 | 719 | 3 |
a89 | 47 | 600 | 879 | 5 |
a90 | 50 | 775 | 790 | 3 |
a91 | 78 | 787 | 633 | 2 |
a92 | 84 | 469 | 758 | 5 |
a93 | 87 | 404 | 574 | 1 |
a94 | 68 | 383 | 707 | 4 |
a95 | 95 | 398 | 729 | 5 |
a96 | 43 | 364 | 741 | 3 |
a97 | 92 | 407 | 509 | 4 |
a98 | 78 | 605 | 896 | 5 |
a99 | 93 | 676 | 570 | 3 |
a100 | 74 | 720 | 763 | 2 |
Desired Output:
Excluded | Waterfall | |
Starting Base | 100 | |
Keep Age 30-100 | 19 | 81 |
Keep Client_A_Score>=500 | 35 | 46 |
Keep Client_B_Score>=600 | 18 | 28 |
Keep Client Type in (0,1,2) | 19 | 9 |
End Base | 9 |
The SQL gods may shoot me but you can cheat it like this if nobody has something more elegant.
data clientdata;
infile cards dlm=" ";
input Client_ID:$4. Client_Age Client_A_Score
Client_B_Score Client_Type;
datalines;
a1 18 527 634 1
a2 94 686 594 2
a3 42 486 820 4
a4 18 554 586 4
a5 82 423 849 0
a6 65 421 568 5
a7 100 435 832 4
a8 45 451 870 3
a9 89 410 549 1
a10 20 508 808 2
a11 63 589 792 5
a12 34 719 683 2
a13 28 735 548 0
a14 76 588 619 5
a15 53 414 581 0
a16 33 760 856 5
a17 31 652 505 4
a18 76 407 669 5
a19 37 659 844 1
a20 40 328 754 3
a21 31 729 766 4
a22 85 692 895 5
a23 84 318 691 3
a24 46 689 711 4
a25 68 396 572 5
a26 53 544 877 1
a27 22 556 537 3
a28 56 369 531 5
a29 86 779 524 0
a30 82 698 660 5
a31 90 793 682 5
a32 27 764 653 1
a33 23 363 744 3
a34 95 474 778 2
a35 97 334 762 4
a36 70 456 873 1
a37 26 452 578 1
a38 43 651 549 2
a39 28 695 659 3
a40 30 362 864 1
a41 44 352 863 5
a42 93 748 586 4
a43 79 394 855 5
a44 21 343 669 1
a45 45 792 584 4
a46 95 534 872 4
a47 34 528 788 3
a48 81 407 802 0
a49 63 497 864 0
a50 92 470 803 3
a51 64 310 863 4
a52 34 346 749 1
a53 50 678 583 2
a54 99 478 521 2
a55 48 303 577 0
a56 29 596 889 5
a57 25 636 696 0
a58 99 648 848 1
a59 74 562 876 0
a60 66 676 526 0
a61 27 723 597 3
a62 54 731 597 5
a63 54 679 880 4
a64 62 488 803 5
a65 79 378 727 1
a66 21 492 618 1
a67 20 742 576 4
a68 49 572 764 5
a69 19 799 839 4
a70 32 691 590 2
a71 40 591 590 0
a72 29 782 881 0
a73 52 471 665 5
a74 24 508 885 3
a75 100 324 723 1
a76 41 609 847 5
a77 94 710 553 5
a78 51 595 681 5
a79 82 593 553 1
a80 79 527 615 2
a81 99 540 632 5
a82 72 579 519 4
a83 33 739 537 4
a84 45 667 547 2
a85 92 750 558 4
a86 20 554 894 4
a87 48 784 763 0
a88 69 747 719 3
a89 47 600 879 5
a90 50 775 790 3
a91 78 787 633 2
a92 84 469 758 5
a93 87 404 574 1
a94 68 383 707 4
a95 95 398 729 5
a96 43 364 741 3
a97 92 407 509 4
a98 78 605 896 5
a99 93 676 570 3
a100 74 720 763 2
;
run;
proc sql;
select 1, "Starting base", 0 as Excluded, count(*) as Waterfall
from clientdata
union
select 2, "Keep Age 30-100",
count(case when client_age <30 or client_age >100 then 1 end),
count(case when client_age between 30 and 100 then 1 end)
from clientdata
union
select 3, "Keep Client_A_Score>=500",
count(case when client_a_score < 500 and client_age between 30 and 100 then 1 end),
count(case when client_a_score >= 500 and client_age between 30 and 100 then 1 end)
from clientdata
union
select 4, "Keep Client_B_Score>=600",
count(case when client_b_score <600
and client_a_score >=500
and client_age between 30 and 100 then 1 end),
count(case when client_b_score >= 600
and client_a_score >=500
and client_age between 30 and 100 then 1 end)
from clientdata
union
select 5, "Keep Client Type in (0,1,2)",
count(case when client_type not in (0,1,2)
and client_b_score >=600
and client_a_score >=500
and client_age between 30 and 100 then 1 end),
count(case when client_type in (0,1,2)
and client_b_score >= 600
and client_a_score >=500
and client_age between 30 and 100 then 1 end)
from clientdata
union
select 6, "End Base", 0 as Excluded,
count(case when client_type in (0,1,2)
and client_b_score >=600
and client_a_score >=500
and client_age between 30 and 100 then 1 end)
from clientdata
order by 1;
quit;
which gives you
Excluded | Waterfall | ||
1 | Starting base | 0 | 100 |
2 | Keep Age 30-100 | 19 | 81 |
3 | Keep Client_A_Score>=500 | 35 | 46 |
4 | Keep Client_B_Score>=600 | 18 | 28 |
5 | Keep Client Type in (0,1,2) | 19 | 9 |
6 | End Base | 0 | 9 |
It's ugly.
The SQL gods may shoot me but you can cheat it like this if nobody has something more elegant.
data clientdata;
infile cards dlm=" ";
input Client_ID:$4. Client_Age Client_A_Score
Client_B_Score Client_Type;
datalines;
a1 18 527 634 1
a2 94 686 594 2
a3 42 486 820 4
a4 18 554 586 4
a5 82 423 849 0
a6 65 421 568 5
a7 100 435 832 4
a8 45 451 870 3
a9 89 410 549 1
a10 20 508 808 2
a11 63 589 792 5
a12 34 719 683 2
a13 28 735 548 0
a14 76 588 619 5
a15 53 414 581 0
a16 33 760 856 5
a17 31 652 505 4
a18 76 407 669 5
a19 37 659 844 1
a20 40 328 754 3
a21 31 729 766 4
a22 85 692 895 5
a23 84 318 691 3
a24 46 689 711 4
a25 68 396 572 5
a26 53 544 877 1
a27 22 556 537 3
a28 56 369 531 5
a29 86 779 524 0
a30 82 698 660 5
a31 90 793 682 5
a32 27 764 653 1
a33 23 363 744 3
a34 95 474 778 2
a35 97 334 762 4
a36 70 456 873 1
a37 26 452 578 1
a38 43 651 549 2
a39 28 695 659 3
a40 30 362 864 1
a41 44 352 863 5
a42 93 748 586 4
a43 79 394 855 5
a44 21 343 669 1
a45 45 792 584 4
a46 95 534 872 4
a47 34 528 788 3
a48 81 407 802 0
a49 63 497 864 0
a50 92 470 803 3
a51 64 310 863 4
a52 34 346 749 1
a53 50 678 583 2
a54 99 478 521 2
a55 48 303 577 0
a56 29 596 889 5
a57 25 636 696 0
a58 99 648 848 1
a59 74 562 876 0
a60 66 676 526 0
a61 27 723 597 3
a62 54 731 597 5
a63 54 679 880 4
a64 62 488 803 5
a65 79 378 727 1
a66 21 492 618 1
a67 20 742 576 4
a68 49 572 764 5
a69 19 799 839 4
a70 32 691 590 2
a71 40 591 590 0
a72 29 782 881 0
a73 52 471 665 5
a74 24 508 885 3
a75 100 324 723 1
a76 41 609 847 5
a77 94 710 553 5
a78 51 595 681 5
a79 82 593 553 1
a80 79 527 615 2
a81 99 540 632 5
a82 72 579 519 4
a83 33 739 537 4
a84 45 667 547 2
a85 92 750 558 4
a86 20 554 894 4
a87 48 784 763 0
a88 69 747 719 3
a89 47 600 879 5
a90 50 775 790 3
a91 78 787 633 2
a92 84 469 758 5
a93 87 404 574 1
a94 68 383 707 4
a95 95 398 729 5
a96 43 364 741 3
a97 92 407 509 4
a98 78 605 896 5
a99 93 676 570 3
a100 74 720 763 2
;
run;
proc sql;
select 1, "Starting base", 0 as Excluded, count(*) as Waterfall
from clientdata
union
select 2, "Keep Age 30-100",
count(case when client_age <30 or client_age >100 then 1 end),
count(case when client_age between 30 and 100 then 1 end)
from clientdata
union
select 3, "Keep Client_A_Score>=500",
count(case when client_a_score < 500 and client_age between 30 and 100 then 1 end),
count(case when client_a_score >= 500 and client_age between 30 and 100 then 1 end)
from clientdata
union
select 4, "Keep Client_B_Score>=600",
count(case when client_b_score <600
and client_a_score >=500
and client_age between 30 and 100 then 1 end),
count(case when client_b_score >= 600
and client_a_score >=500
and client_age between 30 and 100 then 1 end)
from clientdata
union
select 5, "Keep Client Type in (0,1,2)",
count(case when client_type not in (0,1,2)
and client_b_score >=600
and client_a_score >=500
and client_age between 30 and 100 then 1 end),
count(case when client_type in (0,1,2)
and client_b_score >= 600
and client_a_score >=500
and client_age between 30 and 100 then 1 end)
from clientdata
union
select 6, "End Base", 0 as Excluded,
count(case when client_type in (0,1,2)
and client_b_score >=600
and client_a_score >=500
and client_age between 30 and 100 then 1 end)
from clientdata
order by 1;
quit;
which gives you
Excluded | Waterfall | ||
1 | Starting base | 0 | 100 |
2 | Keep Age 30-100 | 19 | 81 |
3 | Keep Client_A_Score>=500 | 35 | 46 |
4 | Keep Client_B_Score>=600 | 18 | 28 |
5 | Keep Client Type in (0,1,2) | 19 | 9 |
6 | End Base | 0 | 9 |
It's ugly.
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9.
Lock in the best rate now before the price increases on April 1.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.