DATA Step, Macro, Functions and more

count number of specific string in a char variable

Frequent Contributor
Posts: 87

Hi,

 

I have a dataset with many rows and a character variable that contains a list of numbers separated by colons, for example:

41:31:53:52:32:33:34:35:36:54:53:52:51:61:48:47:46:45:61:62

 

I need to count how many of the numbers in the list fall into each of the following groups, in order to create a classification:

Group1: the number is in (48,47,46,45,44,43,42,41,31,32,33,34,35,36); I need the total number of matches found in the string.

Group2: the number is in (55,54,53,52,51,61,62,63,64); I need the total number of matches found in the string.

So the result for the example above is Group1=11 and Group2=7.

 

Any ideas how to do it?

 

Thanks in advance.


All Replies
PROC Star
Posts: 500

Re: count number of specific string in a char variable

Something like this; you can then extend the idea to the other group.

 

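data test;
   length z $8;
   a = '41:31:53:52:32:33:34:35:36:54:53:52:51:61:48:47:46:45:61:62';
   b = '48,47,46,45,44,43,42,41,31,32,33,34,35,36';
   sum = 0;    /* number of Group1 values found in a */
   count = 0;  /* position of the token being examined */
   do until (z = '');
      count + 1;
      z = scan(a, count, ':');
      /* add 1 (not the value itself) when the token is in the Group1 list */
      if z ne '' and find(b, trim(z)) then sum + 1;
   end;
   drop count z;
run;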
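
As a rough sketch of how the token-by-token idea might be extended to both groups, the step below walks the list with COUNTW and SCAN and tests each token with the IN operator. The dataset name classify and the variables group1, group2, k and z are made up for illustration, and the STRIP call is only there on the assumption that a stray blank may follow a colon in real data.

```sas
data classify;
   length z $8;
   a = '41:31:53:52:32:33:34:35:36:54:53:52:51:61:48:47:46:45:61:62';
   group1 = 0;
   group2 = 0;
   /* visit every colon-separated token exactly once */
   do k = 1 to countw(a, ':');
      z = strip(scan(a, k, ':'));
      /* exact comparison of the whole token, so no partial matches */
      if z in ('48','47','46','45','44','43','42','41',
               '31','32','33','34','35','36') then group1 + 1;
      else if z in ('55','54','53','52','51',
                    '61','62','63','64') then group2 + 1;
   end;
   drop k z;
run;
```

Because each token is compared as a whole, this variant does not depend on every number having the same width.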
Solution
‎11-03-2017 05:45 PM
Super User
Posts: 13,084

Re: count number of specific string in a char variable

Show some example input and the desired output for that input.

 

It is not at all clear what you are summing.

When I look at your group 2 example

41:31:53:52:32:33:34:35:36:54:53:52:51:61:48:47:46:45:61:62
value  occurrences
55     0
54     1
53     2
52     2
51     1
61     2
62     1
63     0
64     0

I get a total of 9 occurrences of the values or 6 values found. So how do you get 7?

 

This gets the counts of the substrings in group2 as an example:

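data example;
   infile datalines truncover;
   informat string $100.;
   input string;
   /* lookup values for group 2 */
   array g {9} $ _temporary_ ("55" "54" "53" "52" "51" "61" "62" "63" "64");
   /* gcount1-gcount9: number of occurrences of each lookup value */
   array gcount {9};
   do i = 1 to dim(g);
      /* 'i' ignores case, 't' trims trailing blanks from the arguments */
      gcount[i] = count(string, g[i], 'it');
   end;
   drop i;
datalines;
41:31:53:52:32:33:34:35:36:54:53:52:51:61:48:47:46:45:61:62
;
run;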

gcount1 will hold the number of times 55 occurs in the string variable, and so on through gcount9, which will hold the number of 64s.

 

The sum of the counts could be:

grp2sum= sum(of gcount(*));

 

You would need a separate temporary array and separate counting array (unless you are VERY careful) to use this approach for both groups.
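
A minimal sketch of what handling both groups might look like, assuming one temporary lookup array per group. Instead of a separate counting array per group it keeps a running total, which is a simplification and not part of the code above; the names both_groups, g1, g2, group1 and group2 are made up for illustration. It also assumes every number in the list has exactly two digits, so a substring match cannot straddle a colon.

```sas
data both_groups;
   infile datalines truncover;
   informat string $100.;
   input string;
   /* one temporary lookup array per group */
   array g1 {14} $2 _temporary_
      ("48" "47" "46" "45" "44" "43" "42" "41" "31" "32" "33" "34" "35" "36");
   array g2 {9} $2 _temporary_
      ("55" "54" "53" "52" "51" "61" "62" "63" "64");
   group1 = 0;
   group2 = 0;
   /* add up the occurrences of every Group1 value */
   do i = 1 to dim(g1);
      group1 = group1 + count(string, g1[i], 't');
   end;
   /* add up the occurrences of every Group2 value */
   do i = 1 to dim(g2);
      group2 = group2 + count(string, g2[i], 't');
   end;
   drop i;
datalines;
41:31:53:52:32:33:34:35:36:54:53:52:51:61:48:47:46:45:61:62
;
run;
```

For the sample line this should give group1 = 11 and group2 = 9, the same 9 occurrences tallied in the table above.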

Frequent Contributor
Posts: 87

Re: count number of specific string in a char variable

This is the solution that works for me on my own data, and it is also a very nice and elegant solution. Thank you very much.

Super Contributor
Posts: 478

Re: count number of specific string in a char variable

Hi,

 

In addition, also look at PRXCHANGE combined with COUNT.

 

DATA test;
string="41:31:53:52:32:33:34:35:36:54:53:52:51:61:48:47:46:45:61:62";
group1="48,47,46,45,44,43,42,41,31,32,33,34,35,36";
group2="55,54,53,52,51,61,62,63,64";
Count_Group1=COUNT(prxchange("s/(48|47|46|45|44|43|42|41|31|32|33|34|35|36)/G1/i",-1,string),"G1");
Count_Group2=COUNT(prxchange("s/(55|54|53|52|51|61|62|63|64)/G2/i",-1,string),"G2");

run;

 

 

Thanks,
Suryakiran