Calcite | Level 5

## Frequency on Text

Hi everyone,

For this variable, the person lists cities they have visited:

These would be three example observations:

1. Seattle, Los Angeles, New York, San Diego, Houston

2. Seattle, Phoenix, Denver

3. Seattle, New York, Boston

I would like to run a frequency to see how many people visited Seattle, which would be 3 and New York, which would be 2. But I can't figure out how to run that type of frequency, since SAS will think that  Seattle, Los Angeles, New York, San Diego, Houston is a unique observation, and then Seattle, Phoenix, Denver is another unique observation, and Seattle, Boston is a another unique one.

1 ACCEPTED SOLUTION

Accepted Solutions
Rhodochrosite | Level 12

## Re: Frequency on Text

t length limit of 8 is easily overcome by using a LENGTH statement

g like

data detail ;

infile datalines truncover ;

Length city \$40 ;

Input  id \$  city \$ & @;

do cno= 1 by 1 while( city ne ' ' ) ;

Output ;

input city \$ & @ ;

end ;

Datalines ;

1. Seattle  Los Angeles  New York  San Diego   Houston

2. Seattle  Phoenix  Denver

3. Seattle  New York  Boston

;

for summary counts most sensible people use proc freq   but I like

PROC SUMMARY data= detail   ;

Class city ;

Output out= summary_data ;

run ;

Have a look at the output dataset work.summary_data

I like SUMMARY because it is a really useful procedure

It supports multiple summaries on a single pass using a _TYPE_  variable in the results to indicate the class variables that are relevant When it has value 0 the variable _FREQ_ provides the overall count.

It is worth the time to look up some documentation

12 REPLIES 12
Ammonite | Level 13

## Re: Frequency on Text

data have;
input text \$60.;
datalines;
Seattle, Los Angeles, New York, San Diego, Houston
Seattle, Phoenix, Denver
Seattle, New York, Boston
;

data want(keep=city);
set have;
do i=1 by 1 while (scan(text,i,',') ne ' ');
city=scan(text,i,',');
output;
end;
run;

proc freq data=want;
run;

Calcite | Level 5

## Re: Frequency on Text

Can you please explain this part

do i=1 by 1 while (scan(text,i,',') ne ' ');

city=scan(text,i,',');

What is the 'text'?

Ammonite | Level 13

## Re: Frequency on Text

Hi,

Sure, text is the variable name specifies cities that each person has visited.

do i=1 by 1 while (scan(text,i,',') ne ' '); scanning text variable for each city one by one

Calcite | Level 5

## Re: Frequency on Text

It still won't work for reason. It still counts each block of phrases as 1.

Ammonite | Level 13

## Re: Frequency on Text

Just check executing the syntax to see, if it can provide you the desired output.

Calcite | Level 5

## Re: Frequency on Text

Sorry!! I messed up. The data are not separated by commas, but by double spaces

Seattle  Los Angeles  New York  San Diego  Houston

Rhodochrosite | Level 12

## Re: Frequency on Text

Double space separation suits SAS data step input statements

Something like

data detail ;

infile datalines truncover ;

Input  id \$  city \$ & @;

do cno= 1 by 1 while( city ne ' ' ) ;

Output ;

input city \$ & @ ;

end ;

Datalines ;

1. Seattle  Los Angeles  New York  San Diego   Houston

2. Seattle  Phoenix  Denver

3. Seattle  New York  Boston

;

Should produce rows with id and city with cno indicating first second or third city .

It is the trailing & which directs that strings with single spaces are accepted and double or multiple spaces act as separators

hope this untested code works

good luck

peterC

Message was edited by: Peter Crawford -   needed to point out that it is the & which allows single embedded blanks

Fluorite | Level 6

## Re: Frequency on Text

Hi Peter,

I have tested your code which is working almost fine with few problems.

The character length is limited to 8. And the CNO is not used for the purpose here, If I am not wrong.

Obs    id    city               cno

1    1.    Seattle             1

2    1.    Los Ange            2

3    1.    New York            3

4    1.    San Dieg            4

5    1.    Houston             5

6    2.    Seattle             1

7    2.    Phoenix             2

8    2.    Denver              3

9    3.    Seattle             1

10    3.    New York            2

11    3.    Boston              3

However we can use Proc sql to get the count of cities visited.

proc sql;

select city,count(city) as citycnt from detail group by city;

quit;

city              citycnt

-------------------------

Boston                  1

Denver                  1

Houston                 1

Los Ange                1

New York                2

Phoenix                 1

San Dieg                1

Seattle                 3

Rhodochrosite | Level 12

## Re: Frequency on Text

t length limit of 8 is easily overcome by using a LENGTH statement

g like

data detail ;

infile datalines truncover ;

Length city \$40 ;

Input  id \$  city \$ & @;

do cno= 1 by 1 while( city ne ' ' ) ;

Output ;

input city \$ & @ ;

end ;

Datalines ;

1. Seattle  Los Angeles  New York  San Diego   Houston

2. Seattle  Phoenix  Denver

3. Seattle  New York  Boston

;

for summary counts most sensible people use proc freq   but I like

PROC SUMMARY data= detail   ;

Class city ;

Output out= summary_data ;

run ;

Have a look at the output dataset work.summary_data

I like SUMMARY because it is a really useful procedure

It supports multiple summaries on a single pass using a _TYPE_  variable in the results to indicate the class variables that are relevant When it has value 0 the variable _FREQ_ provides the overall count.

It is worth the time to look up some documentation

Calcite | Level 5

## Re: Frequency on Text

Thanks so much!!

I Am away from computer, so I haven't tested it out yet myself.

i have a data set that has phrases with colons separated by double spaces. I will imitate the data using cities.

Seattle: washington  New York: new york  Los Angeles: California

How should I alter the code to just pick up the total number of seattle: washington?

thanks.

Rhodochrosite | Level 12

## Re: Frequency on Text

For just one city you could filter as you output

IF FIND( city, 'seattle', 'i') THEN OUTPUT ;

to replace the output statement.

For a list of cities you want to select

1 put the list in a table

2 filter rows of DETAIL  by a join with your selection list

Super User

## Re: Frequency on Text

Perl Regular Expression can do that . If yours is text file , input x & \$40. ;  or infile  dlmstr='  '  can do that as well.

```data have;
input text \$60.;
datalines;
Seattle  Los Angeles  New York  San Diego  Houston
Seattle  Phoenix  Denver
Seattle  New York  Boston
;
data want(keep=found);
set have;
pid = prxparse('/\w+(\s\w+)?/o');
start = 1;
stop = length(text);
call prxnext(pid, start, stop, text, position, length);
do while (position > 0);
found = substr(text, position, length);
output;
call prxnext(pid, start, stop, text, position, length);
end;
run;

```

Xia Keshan

Discussion stats
• 12 replies
• 1435 views
• 3 likes
• 5 in conversation