Hi guys,
My data HAVE has variables to indicate the number of certain categories of facilities found in each ZIP-code area. For instance, num_PCB means number of facilities known to discharge PCBs to an environment for each ZIP codes and same applies to the rest of num_: variables. My objective is to count the number of different categories of facilities per ZIP-code. The desired variable is 'n_diff_cat'.
Data shows that num_V, num_B, num_c exist in ZIP-code 10013 but not num_PCB and num_P. Therefore, 'n_diff_cat' would take value 3.
DATA Z.HAVE;
INPUT zip num_PCB num_P num_V num_B num_C n_diff_cat;
CARDS;
10013 0 0 7 1 4 3
10031 0 0 1 0 0 1
10037 0 0 1 0 1 2
10040 0 0 3 0 3 2
10301 1 1 1 0 1 4
10309 1 1 1 0 0 3
10312 0 0 2 1 1 3
10314 1 1 3 0 2 4
10461 1 1 0 0 0 2
10474 0 0 6 0 1 2
10475 0 0 3 1 1 3
10504 0 0 1 0 1 2
10505 0 0 2 0 2 2
10506 0 0 6 1 5 3
10509 0 0 3 0 2 2
10512 0 0 3 0 3 2
10516 0 0 0 0 0 0
10519 1 1 0 0 0 2
10520 0 0 3 0 2 2
10543 0 0 5 0 3 2
;
Thanks,
I appreciate your time!
Hello,
data have;
input zip num_PCB num_P num_V num_B num_C;
array nums(*) num_:;
n_diff_cat=0;
do i=1 to dim(nums);
n_diff_cat=n_diff_cat+(nums(i)>0);
end;
drop i;
cards;
10013 0 0 7 1 4
10031 0 0 1 0 0
10037 0 0 1 0 1
10040 0 0 3 0 3
10301 1 1 1 0 1
10309 1 1 1 0 0
10312 0 0 2 1 1
10314 1 1 3 0 2
10461 1 1 0 0 0
10474 0 0 6 0 1
10475 0 0 3 1 1
10504 0 0 1 0 1
10505 0 0 2 0 2
10506 0 0 6 1 5
10509 0 0 3 0 2
10512 0 0 3 0 3
10516 0 0 0 0 0
10519 1 1 0 0 0
10520 0 0 3 0 2
10543 0 0 5 0 3
;
run;
Hello,
data have;
input zip num_PCB num_P num_V num_B num_C;
array nums(*) num_:;
n_diff_cat=0;
do i=1 to dim(nums);
n_diff_cat=n_diff_cat+(nums(i)>0);
end;
drop i;
cards;
10013 0 0 7 1 4
10031 0 0 1 0 0
10037 0 0 1 0 1
10040 0 0 3 0 3
10301 1 1 1 0 1
10309 1 1 1 0 0
10312 0 0 2 1 1
10314 1 1 3 0 2
10461 1 1 0 0 0
10474 0 0 6 0 1
10475 0 0 3 1 1
10504 0 0 1 0 1
10505 0 0 2 0 2
10506 0 0 6 1 5
10509 0 0 3 0 2
10512 0 0 3 0 3
10516 0 0 0 0 0
10519 1 1 0 0 0
10520 0 0 3 0 2
10543 0 0 5 0 3
;
run;
Less readable but shorter :
data have;
input zip num_PCB num_P num_V num_B num_C;
array nums(*) num_:;
n_diff_cat=dim(nums)-count(cats(of nums(*)),'0');
cards;
10013 0 0 7 1 4
10031 0 0 1 0 0
10037 0 0 1 0 1
10040 0 0 3 0 3
10301 1 1 1 0 1
10309 1 1 1 0 0
10312 0 0 2 1 1
10314 1 1 3 0 2
10461 1 1 0 0 0
10474 0 0 6 0 1
10475 0 0 3 1 1
10504 0 0 1 0 1
10505 0 0 2 0 2
10506 0 0 6 1 5
10509 0 0 3 0 2
10512 0 0 3 0 3
10516 0 0 0 0 0
10519 1 1 0 0 0
10520 0 0 3 0 2
10543 0 0 5 0 3
;
data have_without_loop;
input zip num_PCB num_P num_V num_B num_C;
array nums(*) num_:;
n_diff_cat=lengthn(compress(cats(of nums(*)),'0 '));
cards;
10013 0 0 7 1 4
10031 0 0 1 0 0
10037 0 0 1 0 1
10040 0 0 3 0 3
10301 1 1 1 0 1
10309 1 1 1 0 0
10312 0 0 2 1 1
10314 1 1 3 0 2
10461 1 1 0 0 0
10474 0 0 6 0 1
10475 0 0 3 1 1
10504 0 0 1 0 1
10505 0 0 2 0 2
10506 0 0 6 1 5
10509 0 0 3 0 2
10512 0 0 3 0 3
10516 0 0 0 0 0
10519 1 1 0 0 0
10520 0 0 3 0 2
10543 0 0 5 0 3
;
run;
I tried such a solution but was stumped by the unit length of empty strings.
🙂 if may i suggest a small nit for the assignment vs sum in your loop, the below is an equivalent
n_diff_cat=0;
do i=1 to dim(nums);
n_diff_cat+(nums(i)>0);
end;
Yes, but there is an implict retain to be aware of.
I usually use the longer form when i don't take advantage of the retain.
yes and true, however in your case implicit retain is obsolete due to your initial assignment
n_diff_cat=0;
for every iteration of the datastep
Hi @gamotte and @novinosrin,
I wouldn't know why "the number of certain categories of facilities found in" a ZIP-code area cannot be greater than 9. In this case, however, both alternative suggestions involving the CATS function would fail in general:
The "APP function" approach does not seem to have this weakness.
Nice catch boss @FreelanceReinh My oh my , never expected that level of diligence. Added to my notes!!! and Thanks a lot indeed
You raise a good point 😉
Sir @FreelanceReinh Logic comes to mind in weird locations as I am away from sas right now I am at the Depaul cafe and so haven't tested this, but since it struck my mind, I just want to get it out there. Will test when I go back to the lab 🙂
n_diff_cat=countw(compress(catx(' ',of nums(*)),'0'));
data have_without_loop(drop=_t);
input zip num_PCB num_P num_V num_B num_C;
_t = put (0, rb8.) ;
array nums(*) num_:;
n_diff_cat = n(of nums(*))-count(put(peekclong (addrlong(nums[1]), 40), $40.),_t) ;
cards;
10013 0 0 7 1 4
10031 0 0 1 0 0
10037 0 0 1 0 1
10040 0 0 3 0 3
10301 1 1 1 0 1
10309 1 1 1 0 0
10312 0 0 2 1 1
10314 1 1 3 0 2
10461 1 1 0 0 0
10474 0 0 6 0 1
10475 0 0 3 1 1
10504 0 0 1 0 1
10505 0 0 2 0 2
10506 0 0 6 1 5
10509 0 0 3 0 2
10512 0 0 3 0 3
10516 0 0 0 0 0
10519 1 1 0 0 0
10520 0 0 3 0 2
10543 0 0 5 0 3
;
run;
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Select SAS Training centers are offering in-person courses. View upcoming courses for: