Hi guys,
My data HAVE has variables to indicate the number of certain categories of facilities found in each ZIP-code area. For instance, num_PCB means number of facilities known to discharge PCBs to an environment for each ZIP codes and same applies to the rest of num_: variables. My objective is to count the number of different categories of facilities per ZIP-code. The desired variable is 'n_diff_cat'.
Data shows that num_V, num_B, num_c exist in ZIP-code 10013 but not num_PCB and num_P. Therefore, 'n_diff_cat' would take value 3.
DATA Z.HAVE;
INPUT zip num_PCB num_P num_V num_B num_C n_diff_cat;
CARDS;
10013 0 0 7 1 4 3
10031 0 0 1 0 0 1
10037 0 0 1 0 1 2
10040 0 0 3 0 3 2
10301 1 1 1 0 1 4
10309 1 1 1 0 0 3
10312 0 0 2 1 1 3
10314 1 1 3 0 2 4
10461 1 1 0 0 0 2
10474 0 0 6 0 1 2
10475 0 0 3 1 1 3
10504 0 0 1 0 1 2
10505 0 0 2 0 2 2
10506 0 0 6 1 5 3
10509 0 0 3 0 2 2
10512 0 0 3 0 3 2
10516 0 0 0 0 0 0
10519 1 1 0 0 0 2
10520 0 0 3 0 2 2
10543 0 0 5 0 3 2
;
Thanks,
I appreciate your time!
Hello,
data have;
input zip num_PCB num_P num_V num_B num_C;
array nums(*) num_:;
n_diff_cat=0;
do i=1 to dim(nums);
n_diff_cat=n_diff_cat+(nums(i)>0);
end;
drop i;
cards;
10013 0 0 7 1 4
10031 0 0 1 0 0
10037 0 0 1 0 1
10040 0 0 3 0 3
10301 1 1 1 0 1
10309 1 1 1 0 0
10312 0 0 2 1 1
10314 1 1 3 0 2
10461 1 1 0 0 0
10474 0 0 6 0 1
10475 0 0 3 1 1
10504 0 0 1 0 1
10505 0 0 2 0 2
10506 0 0 6 1 5
10509 0 0 3 0 2
10512 0 0 3 0 3
10516 0 0 0 0 0
10519 1 1 0 0 0
10520 0 0 3 0 2
10543 0 0 5 0 3
;
run;
Hello,
data have;
input zip num_PCB num_P num_V num_B num_C;
array nums(*) num_:;
n_diff_cat=0;
do i=1 to dim(nums);
n_diff_cat=n_diff_cat+(nums(i)>0);
end;
drop i;
cards;
10013 0 0 7 1 4
10031 0 0 1 0 0
10037 0 0 1 0 1
10040 0 0 3 0 3
10301 1 1 1 0 1
10309 1 1 1 0 0
10312 0 0 2 1 1
10314 1 1 3 0 2
10461 1 1 0 0 0
10474 0 0 6 0 1
10475 0 0 3 1 1
10504 0 0 1 0 1
10505 0 0 2 0 2
10506 0 0 6 1 5
10509 0 0 3 0 2
10512 0 0 3 0 3
10516 0 0 0 0 0
10519 1 1 0 0 0
10520 0 0 3 0 2
10543 0 0 5 0 3
;
run;
Less readable but shorter :
data have;
input zip num_PCB num_P num_V num_B num_C;
array nums(*) num_:;
n_diff_cat=dim(nums)-count(cats(of nums(*)),'0');
cards;
10013 0 0 7 1 4
10031 0 0 1 0 0
10037 0 0 1 0 1
10040 0 0 3 0 3
10301 1 1 1 0 1
10309 1 1 1 0 0
10312 0 0 2 1 1
10314 1 1 3 0 2
10461 1 1 0 0 0
10474 0 0 6 0 1
10475 0 0 3 1 1
10504 0 0 1 0 1
10505 0 0 2 0 2
10506 0 0 6 1 5
10509 0 0 3 0 2
10512 0 0 3 0 3
10516 0 0 0 0 0
10519 1 1 0 0 0
10520 0 0 3 0 2
10543 0 0 5 0 3
;
data have_without_loop;
input zip num_PCB num_P num_V num_B num_C;
array nums(*) num_:;
n_diff_cat=lengthn(compress(cats(of nums(*)),'0 '));
cards;
10013 0 0 7 1 4
10031 0 0 1 0 0
10037 0 0 1 0 1
10040 0 0 3 0 3
10301 1 1 1 0 1
10309 1 1 1 0 0
10312 0 0 2 1 1
10314 1 1 3 0 2
10461 1 1 0 0 0
10474 0 0 6 0 1
10475 0 0 3 1 1
10504 0 0 1 0 1
10505 0 0 2 0 2
10506 0 0 6 1 5
10509 0 0 3 0 2
10512 0 0 3 0 3
10516 0 0 0 0 0
10519 1 1 0 0 0
10520 0 0 3 0 2
10543 0 0 5 0 3
;
run;
I tried such a solution but was stumped by the unit length of empty strings.
🙂 if may i suggest a small nit for the assignment vs sum in your loop, the below is an equivalent
n_diff_cat=0;
do i=1 to dim(nums);
n_diff_cat+(nums(i)>0);
end;
Yes, but there is an implict retain to be aware of.
I usually use the longer form when i don't take advantage of the retain.
yes and true, however in your case implicit retain is obsolete due to your initial assignment
n_diff_cat=0;
for every iteration of the datastep
Hi @gamotte and @novinosrin,
I wouldn't know why "the number of certain categories of facilities found in" a ZIP-code area cannot be greater than 9. In this case, however, both alternative suggestions involving the CATS function would fail in general:
The "APP function" approach does not seem to have this weakness.
Nice catch boss @FreelanceReinh My oh my , never expected that level of diligence. Added to my notes!!! and Thanks a lot indeed
You raise a good point 😉
Sir @FreelanceReinh Logic comes to mind in weird locations as I am away from sas right now I am at the Depaul cafe and so haven't tested this, but since it struck my mind, I just want to get it out there. Will test when I go back to the lab 🙂
n_diff_cat=countw(compress(catx(' ',of nums(*)),'0'));
data have_without_loop(drop=_t);
input zip num_PCB num_P num_V num_B num_C;
_t = put (0, rb8.) ;
array nums(*) num_:;
n_diff_cat = n(of nums(*))-count(put(peekclong (addrlong(nums[1]), 40), $40.),_t) ;
cards;
10013 0 0 7 1 4
10031 0 0 1 0 0
10037 0 0 1 0 1
10040 0 0 3 0 3
10301 1 1 1 0 1
10309 1 1 1 0 0
10312 0 0 2 1 1
10314 1 1 3 0 2
10461 1 1 0 0 0
10474 0 0 6 0 1
10475 0 0 3 1 1
10504 0 0 1 0 1
10505 0 0 2 0 2
10506 0 0 6 1 5
10509 0 0 3 0 2
10512 0 0 3 0 3
10516 0 0 0 0 0
10519 1 1 0 0 0
10520 0 0 3 0 2
10543 0 0 5 0 3
;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.