DATA Step, Macro, Functions and more

Count the occurance of several words in a column

Accepted Solution Solved
Reply
Contributor
Posts: 62
Accepted Solution

Count the occurance of several words in a column

[ Edited ]

I have an Excel sheet with a few dozen columns. One of these columns is called "distribution options". In this column you find general keywords to indicate how it will be distributed. Below you will find all the available "distribution options".

 

 

Fast Track Heir

Fast Track Living Expenses

Fast Track Balance

Fast Track Surety

Manual Distribution Office

Distribution Notery

Distribution Standard

 

I wanted to read out the entire column and receive a table how many times each of these distribution options appear in the column.

 

The result in the end should look similar to this.

 

Distribution Options

Count

Fast Track Heir

102

Fast Track Living Expenses

230

Fast Track Balance

17

Fast Track Surety

43

Manual Distribution Office

2

Distribution Notery

18

Distribution Standard

43

 

I wrote a script, but for some reason it doesn't count the Fast Track Heir, Fast Track Surety and Manual Distribution Office. All the other values (the same as above) are correct.

 

I used this script, but perhaps there is an easier approach. I simply can't figure out why Heir, Surety and Manual return "0" values, whereas the rest returns the values shown above. There are no errors in the log either. This time I checked the data myself and used COUNTIF(range;"word") to count it in Excel. The values above are correct. The script somehow isn't.

  

PROC SORT IN=atropos.dossiers OUT=atropos.dos_opt;
BY dossier_manager;
RUN;

DATA atropos.dos_rap;
SET atropos.dos_opt;
BY dossier_manager;
RETAIN
	Heir
	Living_Expenses
	Balance
	Surety
	Manual
  	Notary
	Standard;

IF first.dossier_manager
THEN DO;
	Heir=0;
	Living_Expenses=0;
	Balance=0;
	Surety=0;
	Manual=0;
  	Notary=0;
	Standard=0;
END;

SELECT (distributionoption);

  	WHEN ('Fast Track 1 Heir') Heir +1;

  	WHEN ('Fast Track Living Expenses') Living_Expenses +1;

	WHEN ('Fast Track Balance') Balance +1;
  	
  	WHEN ('Fast Track Surety") Surety +1;

	WHEN ('Manual Distributionassignment Office') Manual +1;

	WHEN ('Distribution Notery') Notery +1;

	WHEN ('Distribution Standard') Standard +1;

  	OTHERWISE;
END;

IF last.dossier_manager THEN OUTPUT;
KEEP dossier_manager Heir Living_Expenses Balance Surety Manual Notary Standard;
RUN;


Accepted Solutions
Solution
‎04-13-2016 03:17 AM
Super Contributor
Posts: 308

Re: Count the occurance of several words in a column

Posted in reply to Yves_Boonen

Hello,

 

Are you sure the values are written the way you have added in the select clause ?

 

By the way select clause is useful when:

- you have a long series of mutually exclusive numeric conditions
- data values are uniformly distributed

 

In your case I believe an aprroach using if then else condition would be more appropriate.

 

View solution in original post


All Replies
Solution
‎04-13-2016 03:17 AM
Super Contributor
Posts: 308

Re: Count the occurance of several words in a column

Posted in reply to Yves_Boonen

Hello,

 

Are you sure the values are written the way you have added in the select clause ?

 

By the way select clause is useful when:

- you have a long series of mutually exclusive numeric conditions
- data values are uniformly distributed

 

In your case I believe an aprroach using if then else condition would be more appropriate.

 

Contributor
Posts: 62

Re: Count the occurance of several words in a column

Yes, the values are written correctly. I have copy pasted the values from the Excel sheet. I even tried to change

 

WHEN ('Fast Track 1 Heir') Heir +1;

 

to

 

WHEN ('%Heir%') Heir +1;

 

Even then it didn't work.

 

So you're suggestion would look like this?

 

DATA atropos.dos_rap;
SET atropos.dos_opt;
BY dossier_manager;
RETAIN
	Heir
	Living_Expenses
	Balance
	Surety
	Manual
  	Notary
	Standard;

IF first.dossier_manager
THEN DO;
	Heir=0;
	Living_Expenses=0;
	Balance=0;
	Surety=0;
	Manual=0;
  	Notary=0;
	Standard=0;
END;

SELECT (distributionoption);

  	IF distributionoption = ('Fast Track 1 Heir') THEN Heir +1;

  	ELSE IF distributionoption = ('Fast Track Living Expenses') THEN Living_Expenses +1;

	ELSE IF distributionoption = ('Fast Track Balance') THEN Balance +1;
  	
  	ELSE IF distributionoption = ('Fast Track Surety") THEN Surety +1;

	ELSE IF distributionoption = 'Manual Distributionassignment Office') THEN Manual +1;

	ELSE IF distributionoption = ('Distribution Notery') THEN Notery +1;

	ELSE IF distributionoption = ('Distribution Standard') THEN Standard +1;
END;

IF last.dossier_manager THEN OUTPUT;
KEEP dossier_manager Heir Living_Expenses Balance Surety Manual Notary Standard;
RUN;
Contributor
Posts: 62

Re: Count the occurance of several words in a column

Posted in reply to Yves_Boonen

I tried a different approach.

 

I imported the file. Then I took the imported data and only selected the distribution options, while storing this new selection as a new table. I noticed that some of the distribution options were incomplete and thus the query didn't work.

 

For example: "Fast Track 1 Heir" became "Fast Track 1 He" as well as with the other two.

 

I did copy and paste the exact values from the Excel file, but that wasn't the table my query tried to work with. In fact it was an import in my SAS library. Instead of checking the raw data, I should also check my imported data more. Consider this a lesson learned, because after I adjusted the names in the first script, everything ran smoothly!

 

Thanks for helping me get there.

Super User
Super User
Posts: 7,942

Re: Count the occurance of several words in a column

[ Edited ]
Posted in reply to Yves_Boonen

Hi,

 

Firstly, please code in consistent casing/indentaiton, makes it much easier to read, and posting example test data in the form of a datastep will help you get tested code.  

 

Now your problem.  I thinkk you are looking at it the wrong way round, you have list of what values you want out, and then trying to count from that.  The other way of looking at it is to proc means the data you have, and then filtering the results for those you want to keep.  This has the benefit of being able to see what counted data is present.  For example (untested);

proc means data=have;
  by distribution_options;
  output out=results;
run;
/* Filter out the ones you want to keep */
proc sql;
  create table WANT as
  select  *
  from    DISTRIBUTION_OPTIONS in (select OPTIONS from LIST_OF_VALUES);
quit;

Nice and simple way, though to be honest I don't think you need the second step, just drop records you don't want in your further processing, then its just one step.

 

Just to add, on your second post, a select clause can look like:

select(<test>);

  when(<result>) <code>;

Or ir can look like:

select;

  when (index(strip(upcase(<text>)),"HIKE")>0) <code>;

 

Super User
Posts: 7,771

Re: Count the occurance of several words in a column

Posted in reply to Yves_Boonen

If this is really EXACTLY your code, look at the log, because you have non-matching quotes in

WHEN ('Fast Track Surety") Surety +1;
---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Contributor
Posts: 62

Re: Count the occurance of several words in a column

Posted in reply to KurtBremser

No it isn't.

 

The original code (in my SAS client) has variable names in Dutch. To make it more understandable, I manually changed all the names in here. Kinda screwed me over, but rest assured: no syntax errors. Smiley Happy

 

Thanks!

Super User
Posts: 5,426

Re: Count the occurance of several words in a column

Posted in reply to Yves_Boonen

Not sure I under the purpose of this. Seems to me as a hard coded recode of column values, count, and then a transpose...?

Similar to @RW9, I think there are smarter ways of accomplish this.

A format/lookup table can recode/rename your distribution options.

PROC TABULATE does cross tabulation out of the box - no need to transpose your data set.

Data never sleeps
Trusted Advisor
Posts: 1,228

Re: Count the occurance of several words in a column

Posted in reply to Yves_Boonen

Another way to do this is using proc sql:

 

data have;
input distribution_options $40.;
datalines;
Fast Track Heir
Fast Track Living Expenses
Fast Track Balance
Fast Track Surety
Manual Distribution Office
Distribution Notery
Distribution Standard
Fast Track Heir
Fast Track Living Expenses
Fast Track Balance
;

proc sql;
create table want as
select trim(distribution_options) as distribution_options , count(*) as Count
from have
group by trim(distribution_options);
quit;

 

 

Respected Advisor
Posts: 3,156

Re: Count the occurance of several words in a column

Posted in reply to Yves_Boonen

Maybe I am missing something, but Proc Freq seems to be designed for this kind of job.

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 9 replies
  • 368 views
  • 2 likes
  • 7 in conversation