Solved: Re: Frequency on Text

SASbeginner20 · Posted 08-08-2014 04:06 PM

Hi everyone,

For this variable, the person lists cities they have visited:

These would be three example observations:

1. Seattle, Los Angeles, New York, San Diego, Houston

2. Seattle, Phoenix, Denver

3. Seattle, New York, Boston

I would like to run a frequency to see how many people visited Seattle, which would be 3 and New York, which would be 2. But I can't figure out how to run that type of frequency, since SAS will think that Seattle, Los Angeles, New York, San Diego, Houston is a unique observation, and then Seattle, Phoenix, Denver is another unique observation, and Seattle, Boston is a another unique one.

Thanks for your help.

Peter_C · Posted 08-09-2014 10:37 AM

t length limit of 8 is easily overcome by using a LENGTH statement

g like

data detail ;

infile datalines truncover ;

Length city $40 ;

Input id $ city $ & @;

do cno= 1 by 1 while( city ne ' ' ) ;

Output ;

input city $ & @ ;

end ;

Datalines ;

1. Seattle Los Angeles New York San Diego Houston

2. Seattle Phoenix Denver

3. Seattle New York Boston

;

for summary counts most sensible people use proc freq but I like

PROC SUMMARY data= detail ;

Class city ;

Output out= summary_data ;

run ;

Have a look at the output dataset work.summary_data

I like SUMMARY because it is a really useful procedure

It supports multiple summaries on a single pass using a _TYPE_ variable in the results to indicate the class variables that are relevant When it has value 0 the variable _FREQ_ provides the overall count.

It is worth the time to look up some documentation

View solution in original post

stat_sas · Posted 08-08-2014 04:20 PM

data have;
input text $60.;
datalines;
Seattle, Los Angeles, New York, San Diego, Houston
Seattle, Phoenix, Denver
Seattle, New York, Boston
;

data want(keep=city);
set have;
do i=1 by 1 while (scan(text,i,',') ne ' ');
city=scan(text,i,',');
output;
end;
run;

proc freq data=want;
run;

SASbeginner20 · Posted 08-08-2014 04:28 PM

Thank you for your response.

Can you please explain this part

do i=1 by 1 while (scan(text,i,',') ne ' ');

city=scan(text,i,',');

What is the 'text'?

stat_sas · Posted 08-08-2014 04:37 PM

Hi,

Sure, text is the variable name specifies cities that each person has visited.

do i=1 by 1 while (scan(text,i,',') ne ' '); scanning text variable for each city one by one

SASbeginner20 · Posted 08-08-2014 04:53 PM

It still won't work for reason. It still counts each block of phrases as 1.

stat_sas · Posted 08-08-2014 05:01 PM

Just check executing the syntax to see, if it can provide you the desired output.

SASbeginner20 · Posted 08-08-2014 05:00 PM

Sorry!! I messed up. The data are not separated by commas, but by double spaces

Seattle Los Angeles New York San Diego Houston

Peter_C · Posted 08-08-2014 05:29 PM

Double space separation suits SAS data step input statements

Something like

data detail ;

infile datalines truncover ;

Input id $ city $ & @;

do cno= 1 by 1 while( city ne ' ' ) ;

Output ;

input city $ & @ ;

end ;

Datalines ;

1. Seattle Los Angeles New York San Diego Houston

2. Seattle Phoenix Denver

3. Seattle New York Boston

;

Should produce rows with id and city with cno indicating first second or third city .

It is the trailing & which directs that strings with single spaces are accepted and double or multiple spaces act as separators

hope this untested code works

good luck

peterC

Message was edited by: Peter Crawford - needed to point out that it is the & which allows single embedded blanks

RaviKommuri · Posted 08-09-2014 05:00 AM

Hi Peter,

I have tested your code which is working almost fine with few problems.

The character length is limited to 8. And the CNO is not used for the purpose here, If I am not wrong.

Here is your output.

Obs id city cno

1 1. Seattle 1

2 1. Los Ange 2

3 1. New York 3

4 1. San Dieg 4

5 1. Houston 5

6 2. Seattle 1

7 2. Phoenix 2

8 2. Denver 3

9 3. Seattle 1

10 3. New York 2

11 3. Boston 3

However we can use Proc sql to get the count of cities visited.

proc sql;

select city,count(city) as citycnt from detail group by city;

quit;

city citycnt

-------------------------

Boston 1

Denver 1

Houston 1

Los Ange 1

New York 2

Phoenix 1

San Dieg 1

Seattle 3

Peter_C · Posted 08-09-2014 10:37 AM

t length limit of 8 is easily overcome by using a LENGTH statement

g like

data detail ;

infile datalines truncover ;

Length city $40 ;

Input id $ city $ & @;

do cno= 1 by 1 while( city ne ' ' ) ;

Output ;

input city $ & @ ;

end ;

Datalines ;

1. Seattle Los Angeles New York San Diego Houston

2. Seattle Phoenix Denver

3. Seattle New York Boston

;

for summary counts most sensible people use proc freq but I like

PROC SUMMARY data= detail ;

Class city ;

Output out= summary_data ;

run ;

Have a look at the output dataset work.summary_data

I like SUMMARY because it is a really useful procedure

It supports multiple summaries on a single pass using a _TYPE_ variable in the results to indicate the class variables that are relevant When it has value 0 the variable _FREQ_ provides the overall count.

It is worth the time to look up some documentation

SASbeginner20 · Posted 08-09-2014 11:58 AM

Thanks so much!!

I Am away from computer, so I haven't tested it out yet myself.

i have a data set that has phrases with colons separated by double spaces. I will imitate the data using cities.

Seattle: washington New York: new york Los Angeles: California

How should I alter the code to just pick up the total number of seattle: washington?

thanks.

Peter_C · Posted 08-09-2014 03:11 PM

For just one city you could filter as you output

IF FIND( city, 'seattle', 'i') THEN OUTPUT ;

to replace the output statement.

For a list of cities you want to select

1 put the list in a table

2 filter rows of DETAIL by a join with your selection list

in your sql summary query

Ksharp · Posted 08-09-2014 09:46 AM

Perl Regular Expression can do that . If yours is text file , input x & $40. ; or infile dlmstr=' ' can do that as well.

data have;
input text $60.;
datalines;
Seattle  Los Angeles  New York  San Diego  Houston
Seattle  Phoenix  Denver
Seattle  New York  Boston
;
data want(keep=found);
set have;
pid = prxparse('/\w+(\s\w+)?/o');
start = 1;
stop = length(text);
call prxnext(pid, start, stop, text, position, length);
do while (position > 0);
found = substr(text, position, length);
output;
call prxnext(pid, start, stop, text, position, length);
end;
run;

Xia Keshan

SAS Innovate 2025: Save the Date

SAS Training: Just a Click Away