BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
aaronh
Quartz | Level 8

Hello experts,

 

Suppose I have a dataset as follows:

data have;

input var1$ 1-12 var2$ 13-24 var3$ 25-36 var4$ 37-48;

datalines;

ElpasoDiner Amazon      Amazon      WilliamsS


MickeyD     SecondCare  Bluecross   MickeyD


ElpasoDiner ElpasoDiner Amazon      United

;

 

I am struggling to come up with a way to count the distinct elements in var1-var4 (using an array anyways). Is there a way to get count like this:

ElpasoDiner Amazon WilliamsS MickeyD SecondCare Bluecross United

1 2 1 0 0 0 0

0 0 0 2 1 1 0

2 1 0 0 0 0 1

 

 

The example I gave here is an ideal situation, but in my real dataset I often do not know the contents in each row (i.e. I do not know exactly what is in var1 - var4). And there are multiple possible text strings for var1 - var4.

 

Many thanks,

 

1 ACCEPTED SOLUTION

Accepted Solutions
novinosrin
Tourmaline | Level 20

Okay a generic one and linear simple safe method I think

 

/*1st pass to get distinct values in a macor var list*/
data _null_;
 do _n_=1 by 1 until(z);
  set have end=z;
  array v var:;
  array t(99)$32 _temporary_;/*Assign length of 32 which is variable name maximun length*/
  do over v;
  if v not in t then do;
  n+1;
  t(n)=v;
  end;
  end;
 end;
 call symputx('list',catx(' ',of t(*)));
run;

%put &=list;
/*2nd pass linear look up */
data want;
 set have;
 array v(*) var:;
 array t(*) &list;
 do i=1 to dim(t);
  t(i)=0;
  do j =1 to dim(v);
   if vname(t(i))=v(j) then t(i)=sum(t(i),1);
  end;
 end;
 drop i j;
run;

View solution in original post

9 REPLIES 9
novinosrin
Tourmaline | Level 20

are they single char values as your sample suggests?

 

Assuming your sample is truly representative, it;s easy

 

data have;
input var1 $ var2 $ var3 $ var4 $;
datalines;
a a b c
a b c d
a a a b
;

data want;
 set have;
 array t a b c d;
 _k=cats(of var:);
 do over t;
  t=countc(_k,strip(vname(t)));
 end;
 drop _:;
run;
aaronh
Quartz | Level 8
Thanks novinosrin! They are actually longer names than just abcd in my datasets, sometimes as long as over 10 characters.
novinosrin
Tourmaline | Level 20

Can you post a "comprehensive" sample plz

aaronh
Quartz | Level 8
Sorry the extra blanks are not being shown in my text.
novinosrin
Tourmaline | Level 20

Okay a generic one and linear simple safe method I think

 

/*1st pass to get distinct values in a macor var list*/
data _null_;
 do _n_=1 by 1 until(z);
  set have end=z;
  array v var:;
  array t(99)$32 _temporary_;/*Assign length of 32 which is variable name maximun length*/
  do over v;
  if v not in t then do;
  n+1;
  t(n)=v;
  end;
  end;
 end;
 call symputx('list',catx(' ',of t(*)));
run;

%put &=list;
/*2nd pass linear look up */
data want;
 set have;
 array v(*) var:;
 array t(*) &list;
 do i=1 to dim(t);
  t(i)=0;
  do j =1 to dim(v);
   if vname(t(i))=v(j) then t(i)=sum(t(i),1);
  end;
 end;
 drop i j;
run;
ballardw
Super User

@aaronh wrote:

Hello experts,

 

Suppose I have a dataset as follows:

data have;

input var1$ 1-12 var2$ 13-24 var3$ 25-36 var4$ 37-48;

datalines;

ElpasoDiner Amazon      Amazon      WilliamsS


MickeyD     SecondCare  Bluecross   MickeyD


ElpasoDiner ElpasoDiner Amazon      United

;

 

I am struggling to come up with a way to count the distinct elements in var1-var4 (using an array anyways).

 

The example I gave here is an ideal situation, but in my real dataset I often do not know the contents in each row (i.e. I do not know exactly what is in var1 - var4). And there are multiple possible text strings for var1 - var4.

 

Many thanks,

 


You really need to show what you expect the output to look like for a given example input.

KachiM
Rhodochrosite | Level 12

@aaronh 

 

Another way is to read individual Vars into a Data Set. You can specify the length of the Var as a macro.

Then use Proc Freq.

 

%let Vlen=12;

data have;
infile datalines;
input ;
/* put _infile_; */
length Var $&Vlen;
do i = 1 to countw(_infile_, ' ');
   Var = scan(_infile_, i);
   output;
end;
drop i;
datalines;
ElpasoDiner Amazon      Amazon      WilliamsS
MickeyD     SecondCare  Bluecross   MickeyD
ElpasoDiner ElpasoDiner Amazon      United
;
run;


proc freq data = have;
   tables Var / out = FreqCount nopercent nocum;
run;
Tom
Super User Tom
Super User

You need to have some row id to generate the expected results of the counts per row, instead of overall counts.

If there isn't one then add one during the transpose step.

data tall ;
  set have;
  array list var1-var4 ;
  row+1;
  do col=1 to dim(list);
      word = list[col];
      output;
  end;
run;

proc freq data=tall;
  tables row*word / out=counts;
run;
aaronh
Quartz | Level 8

Thanks to everyone for your input!

 

AH

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 954 views
  • 4 likes
  • 5 in conversation