Quartz | Level 8

## count times of appearance for elements in a data step array

Hello experts,

Suppose I have a dataset as follows:

`data have; `

`input var1\$ 1-12 var2\$ 13-24 var3\$ 25-36 var4\$ 37-48; `

`datalines; `

`ElpasoDiner Amazon      Amazon      WilliamsS`

`MickeyD     SecondCare  Bluecross   MickeyD`

`ElpasoDiner ElpasoDiner Amazon      United`

`;`

I am struggling to come up with a way to count the distinct elements in var1-var4 (using an array anyways). Is there a way to get count like this:

`ElpasoDiner Amazon WilliamsS MickeyD SecondCare Bluecross United`

`1 2 1 0 0 0 0`

`0 0 0 2 1 1 0`

`2 1 0 0 0 0 1`

The example I gave here is an ideal situation, but in my real dataset I often do not know the contents in each row (i.e. I do not know exactly what is in var1 - var4). And there are multiple possible text strings for var1 - var4.

Many thanks,

1 ACCEPTED SOLUTION

Accepted Solutions
Tourmaline | Level 20

## Re: count distinct elements in a data step array

Okay a generic one and linear simple safe method I think

``````/*1st pass to get distinct values in a macor var list*/
data _null_;
do _n_=1 by 1 until(z);
set have end=z;
array v var:;
array t(99)\$32 _temporary_;/*Assign length of 32 which is variable name maximun length*/
do over v;
if v not in t then do;
n+1;
t(n)=v;
end;
end;
end;
call symputx('list',catx(' ',of t(*)));
run;

%put &=list;
/*2nd pass linear look up */
data want;
set have;
array v(*) var:;
array t(*) &list;
do i=1 to dim(t);
t(i)=0;
do j =1 to dim(v);
if vname(t(i))=v(j) then t(i)=sum(t(i),1);
end;
end;
drop i j;
run;
``````
9 REPLIES 9
Tourmaline | Level 20

## Re: count distinct elements in a data step array

are they single char values as your sample suggests?

Assuming your sample is truly representative, it;s easy

``````data have;
input var1 \$ var2 \$ var3 \$ var4 \$;
datalines;
a a b c
a b c d
a a a b
;

data want;
set have;
array t a b c d;
_k=cats(of var:);
do over t;
t=countc(_k,strip(vname(t)));
end;
drop _:;
run;``````
Quartz | Level 8

## Re: count distinct elements in a data step array

Thanks novinosrin! They are actually longer names than just abcd in my datasets, sometimes as long as over 10 characters.
Tourmaline | Level 20

## Re: count distinct elements in a data step array

Can you post a "comprehensive" sample plz

Quartz | Level 8

## Re: count distinct elements in a data step array

Sorry the extra blanks are not being shown in my text.
Tourmaline | Level 20

## Re: count distinct elements in a data step array

Okay a generic one and linear simple safe method I think

``````/*1st pass to get distinct values in a macor var list*/
data _null_;
do _n_=1 by 1 until(z);
set have end=z;
array v var:;
array t(99)\$32 _temporary_;/*Assign length of 32 which is variable name maximun length*/
do over v;
if v not in t then do;
n+1;
t(n)=v;
end;
end;
end;
call symputx('list',catx(' ',of t(*)));
run;

%put &=list;
/*2nd pass linear look up */
data want;
set have;
array v(*) var:;
array t(*) &list;
do i=1 to dim(t);
t(i)=0;
do j =1 to dim(v);
if vname(t(i))=v(j) then t(i)=sum(t(i),1);
end;
end;
drop i j;
run;
``````
Super User

## Re: count distinct elements in a data step array

@aaronh wrote:

Hello experts,

Suppose I have a dataset as follows:

`data have; `

`input var1\$ 1-12 var2\$ 13-24 var3\$ 25-36 var4\$ 37-48; `

`datalines; `

`ElpasoDiner Amazon      Amazon      WilliamsS`

`MickeyD     SecondCare  Bluecross   MickeyD`

`ElpasoDiner ElpasoDiner Amazon      United`

`;`

I am struggling to come up with a way to count the distinct elements in var1-var4 (using an array anyways).

The example I gave here is an ideal situation, but in my real dataset I often do not know the contents in each row (i.e. I do not know exactly what is in var1 - var4). And there are multiple possible text strings for var1 - var4.

Many thanks,

You really need to show what you expect the output to look like for a given example input.

Rhodochrosite | Level 12

## Re: count times of appearance for elements in a data step array

Another way is to read individual Vars into a Data Set. You can specify the length of the Var as a macro.

Then use Proc Freq.

``````%let Vlen=12;

data have;
infile datalines;
input ;
/* put _infile_; */
length Var \$&Vlen;
do i = 1 to countw(_infile_, ' ');
Var = scan(_infile_, i);
output;
end;
drop i;
datalines;
ElpasoDiner Amazon      Amazon      WilliamsS
MickeyD     SecondCare  Bluecross   MickeyD
ElpasoDiner ElpasoDiner Amazon      United
;
run;

proc freq data = have;
tables Var / out = FreqCount nopercent nocum;
run;``````
Super User

## Re: count times of appearance for elements in a data step array

You need to have some row id to generate the expected results of the counts per row, instead of overall counts.

If there isn't one then add one during the transpose step.

``````data tall ;
set have;
array list var1-var4 ;
row+1;
do col=1 to dim(list);
word = list[col];
output;
end;
run;

proc freq data=tall;
tables row*word / out=counts;
run;``````
Quartz | Level 8

## Re: count times of appearance for elements in a data step array

Thanks to everyone for your input!

AH

Discussion stats
• 9 replies
• 872 views
• 4 likes
• 5 in conversation