Hi guys,
My data HAVE has variables to indicate the number of certain categories of facilities found in each ZIP-code area. For instance, num_PCB means number of facilities known to discharge PCBs to an environment for each ZIP codes and same applies to the rest of num_: variables. My objective is to count the number of different categories of facilities per ZIP-code. The desired variable is 'n_diff_cat'.
Data shows that num_V, num_B, num_c exist in ZIP-code 10013 but not num_PCB and num_P. Therefore, 'n_diff_cat' would take value 3.
DATA Z.HAVE;
INPUT zip num_PCB num_P num_V num_B num_C n_diff_cat;
CARDS;
10013 0 0 7 1 4 3
10031 0 0 1 0 0 1
10037 0 0 1 0 1 2
10040 0 0 3 0 3 2
10301 1 1 1 0 1 4
10309 1 1 1 0 0 3
10312 0 0 2 1 1 3
10314 1 1 3 0 2 4
10461 1 1 0 0 0 2
10474 0 0 6 0 1 2
10475 0 0 3 1 1 3
10504 0 0 1 0 1 2
10505 0 0 2 0 2 2
10506 0 0 6 1 5 3
10509 0 0 3 0 2 2
10512 0 0 3 0 3 2
10516 0 0 0 0 0 0
10519 1 1 0 0 0 2
10520 0 0 3 0 2 2
10543 0 0 5 0 3 2
;
Thanks,
I appreciate your time!
Hello,
data have;
input zip num_PCB num_P num_V num_B num_C;
array nums(*) num_:;
n_diff_cat=0;
do i=1 to dim(nums);
n_diff_cat=n_diff_cat+(nums(i)>0);
end;
drop i;
cards;
10013 0 0 7 1 4
10031 0 0 1 0 0
10037 0 0 1 0 1
10040 0 0 3 0 3
10301 1 1 1 0 1
10309 1 1 1 0 0
10312 0 0 2 1 1
10314 1 1 3 0 2
10461 1 1 0 0 0
10474 0 0 6 0 1
10475 0 0 3 1 1
10504 0 0 1 0 1
10505 0 0 2 0 2
10506 0 0 6 1 5
10509 0 0 3 0 2
10512 0 0 3 0 3
10516 0 0 0 0 0
10519 1 1 0 0 0
10520 0 0 3 0 2
10543 0 0 5 0 3
;
run;
Hello,
data have;
input zip num_PCB num_P num_V num_B num_C;
array nums(*) num_:;
n_diff_cat=0;
do i=1 to dim(nums);
n_diff_cat=n_diff_cat+(nums(i)>0);
end;
drop i;
cards;
10013 0 0 7 1 4
10031 0 0 1 0 0
10037 0 0 1 0 1
10040 0 0 3 0 3
10301 1 1 1 0 1
10309 1 1 1 0 0
10312 0 0 2 1 1
10314 1 1 3 0 2
10461 1 1 0 0 0
10474 0 0 6 0 1
10475 0 0 3 1 1
10504 0 0 1 0 1
10505 0 0 2 0 2
10506 0 0 6 1 5
10509 0 0 3 0 2
10512 0 0 3 0 3
10516 0 0 0 0 0
10519 1 1 0 0 0
10520 0 0 3 0 2
10543 0 0 5 0 3
;
run;
Less readable but shorter :
data have;
input zip num_PCB num_P num_V num_B num_C;
array nums(*) num_:;
n_diff_cat=dim(nums)-count(cats(of nums(*)),'0');
cards;
10013 0 0 7 1 4
10031 0 0 1 0 0
10037 0 0 1 0 1
10040 0 0 3 0 3
10301 1 1 1 0 1
10309 1 1 1 0 0
10312 0 0 2 1 1
10314 1 1 3 0 2
10461 1 1 0 0 0
10474 0 0 6 0 1
10475 0 0 3 1 1
10504 0 0 1 0 1
10505 0 0 2 0 2
10506 0 0 6 1 5
10509 0 0 3 0 2
10512 0 0 3 0 3
10516 0 0 0 0 0
10519 1 1 0 0 0
10520 0 0 3 0 2
10543 0 0 5 0 3
;
data have_without_loop;
input zip num_PCB num_P num_V num_B num_C;
array nums(*) num_:;
n_diff_cat=lengthn(compress(cats(of nums(*)),'0 '));
cards;
10013 0 0 7 1 4
10031 0 0 1 0 0
10037 0 0 1 0 1
10040 0 0 3 0 3
10301 1 1 1 0 1
10309 1 1 1 0 0
10312 0 0 2 1 1
10314 1 1 3 0 2
10461 1 1 0 0 0
10474 0 0 6 0 1
10475 0 0 3 1 1
10504 0 0 1 0 1
10505 0 0 2 0 2
10506 0 0 6 1 5
10509 0 0 3 0 2
10512 0 0 3 0 3
10516 0 0 0 0 0
10519 1 1 0 0 0
10520 0 0 3 0 2
10543 0 0 5 0 3
;
run;
I tried such a solution but was stumped by the unit length of empty strings.
🙂 if may i suggest a small nit for the assignment vs sum in your loop, the below is an equivalent
n_diff_cat=0;
do i=1 to dim(nums);
n_diff_cat+(nums(i)>0);
end;
Yes, but there is an implict retain to be aware of.
I usually use the longer form when i don't take advantage of the retain.
yes and true, however in your case implicit retain is obsolete due to your initial assignment
n_diff_cat=0;
for every iteration of the datastep
Hi @gamotte and @novinosrin,
I wouldn't know why "the number of certain categories of facilities found in" a ZIP-code area cannot be greater than 9. In this case, however, both alternative suggestions involving the CATS function would fail in general:
The "APP function" approach does not seem to have this weakness.
Nice catch boss @FreelanceReinh My oh my , never expected that level of diligence. Added to my notes!!! and Thanks a lot indeed
You raise a good point 😉
Sir @FreelanceReinh Logic comes to mind in weird locations as I am away from sas right now I am at the Depaul cafe and so haven't tested this, but since it struck my mind, I just want to get it out there. Will test when I go back to the lab 🙂
n_diff_cat=countw(compress(catx(' ',of nums(*)),'0'));
data have_without_loop(drop=_t);
input zip num_PCB num_P num_V num_B num_C;
_t = put (0, rb8.) ;
array nums(*) num_:;
n_diff_cat = n(of nums(*))-count(put(peekclong (addrlong(nums[1]), 40), $40.),_t) ;
cards;
10013 0 0 7 1 4
10031 0 0 1 0 0
10037 0 0 1 0 1
10040 0 0 3 0 3
10301 1 1 1 0 1
10309 1 1 1 0 0
10312 0 0 2 1 1
10314 1 1 3 0 2
10461 1 1 0 0 0
10474 0 0 6 0 1
10475 0 0 3 1 1
10504 0 0 1 0 1
10505 0 0 2 0 2
10506 0 0 6 1 5
10509 0 0 3 0 2
10512 0 0 3 0 3
10516 0 0 0 0 0
10519 1 1 0 0 0
10520 0 0 3 0 2
10543 0 0 5 0 3
;
run;
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.