Help using Base SAS procedures

Frequency on Text

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 15
Accepted Solution

Frequency on Text

Hi everyone,

For this variable, the person lists cities they have visited:

These would be three example observations:

1. Seattle, Los Angeles, New York, San Diego, Houston

2. Seattle, Phoenix, Denver

3. Seattle, New York, Boston

I would like to run a frequency to see how many people visited Seattle, which would be 3 and New York, which would be 2. But I can't figure out how to run that type of frequency, since SAS will think that  Seattle, Los Angeles, New York, San Diego, Houston is a unique observation, and then Seattle, Phoenix, Denver is another unique observation, and Seattle, Boston is a another unique one.

Thanks for your help.


Accepted Solutions
Solution
‎08-09-2014 10:37 AM
Valued Guide
Posts: 2,174

Re: Frequency on Text

t length limit of 8 is easily overcome by using a LENGTH statement

g like

data detail ;

infile datalines truncover ;

Length city $40 ;

Input  id $  city $ & @;

do cno= 1 by 1 while( city ne ' ' ) ;

   Output ;

   input city $ & @ ;

end ;

Datalines ;

1. Seattle  Los Angeles  New York  San Diego   Houston

2. Seattle  Phoenix  Denver

3. Seattle  New York  Boston

;

for summary counts most sensible people use proc freq   but I like

PROC SUMMARY data= detail   ;

Class city ;

Output out= summary_data ;

run ;

Have a look at the output dataset work.summary_data

I like SUMMARY because it is a really useful procedure

It supports multiple summaries on a single pass using a _TYPE_  variable in the results to indicate the class variables that are relevant When it has value 0 the variable _FREQ_ provides the overall count.

It is worth the time to look up some documentation

View solution in original post


All Replies
Trusted Advisor
Posts: 1,204

Re: Frequency on Text

data have;
input text $60.;
datalines;
Seattle, Los Angeles, New York, San Diego, Houston
Seattle, Phoenix, Denver
Seattle, New York, Boston
;

data want(keep=city);
set have;
do i=1 by 1 while (scan(text,i,',') ne ' ');
city=scan(text,i,',');
output;
end;
run;


proc freq data=want;
run;

Occasional Contributor
Posts: 15

Re: Frequency on Text

Thank you for your response.

Can you please explain this part

do i=1 by 1 while (scan(text,i,',') ne ' ');

city=scan(text,i,',');


What is the 'text'?

Trusted Advisor
Posts: 1,204

Re: Frequency on Text

Hi,

Sure, text is the variable name specifies cities that each person has visited.

do i=1 by 1 while (scan(text,i,',') ne ' '); scanning text variable for each city one by one

Occasional Contributor
Posts: 15

Re: Frequency on Text

It still won't work for reason. It still counts each block of phrases as 1.

Trusted Advisor
Posts: 1,204

Re: Frequency on Text

Just check executing the syntax to see, if it can provide you the desired output.

Occasional Contributor
Posts: 15

Re: Frequency on Text

Sorry!! I messed up. The data are not separated by commas, but by double spaces

Seattle  Los Angeles  New York  San Diego  Houston

Valued Guide
Posts: 2,174

Re: Frequency on Text

Double space separation suits SAS data step input statements

Something like

data detail ;

infile datalines truncover ;

Input  id $  city $ & @;

do cno= 1 by 1 while( city ne ' ' ) ;

   Output ;

   input city $ & @ ;

end ;

Datalines ;

1. Seattle  Los Angeles  New York  San Diego   Houston

2. Seattle  Phoenix  Denver

3. Seattle  New York  Boston

;

Should produce rows with id and city with cno indicating first second or third city .

It is the trailing & which directs that strings with single spaces are accepted and double or multiple spaces act as separators

hope this untested code works

good luck

peterC

Message was edited by: Peter Crawford -   needed to point out that it is the & which allows single embedded blanks

Contributor
Posts: 45

Re: Frequency on Text

Hi Peter,

I have tested your code which is working almost fine with few problems.

The character length is limited to 8. And the CNO is not used for the purpose here, If I am not wrong.

Here is your output.

Obs    id    city               cno

                                  

  1    1.    Seattle             1

  2    1.    Los Ange            2

  3    1.    New York            3

  4    1.    San Dieg            4

  5    1.    Houston             5

  6    2.    Seattle             1

  7    2.    Phoenix             2

  8    2.    Denver              3

  9    3.    Seattle             1

10    3.    New York            2

11    3.    Boston              3

However we can use Proc sql to get the count of cities visited.

proc sql;                                         

select city,count(city) as citycnt from detail group by city;

quit;                                            

city              citycnt

-------------------------

Boston                  1

Denver                  1

Houston                 1

Los Ange                1

New York                2

Phoenix                 1

San Dieg                1

Seattle                 3

Solution
‎08-09-2014 10:37 AM
Valued Guide
Posts: 2,174

Re: Frequency on Text

t length limit of 8 is easily overcome by using a LENGTH statement

g like

data detail ;

infile datalines truncover ;

Length city $40 ;

Input  id $  city $ & @;

do cno= 1 by 1 while( city ne ' ' ) ;

   Output ;

   input city $ & @ ;

end ;

Datalines ;

1. Seattle  Los Angeles  New York  San Diego   Houston

2. Seattle  Phoenix  Denver

3. Seattle  New York  Boston

;

for summary counts most sensible people use proc freq   but I like

PROC SUMMARY data= detail   ;

Class city ;

Output out= summary_data ;

run ;

Have a look at the output dataset work.summary_data

I like SUMMARY because it is a really useful procedure

It supports multiple summaries on a single pass using a _TYPE_  variable in the results to indicate the class variables that are relevant When it has value 0 the variable _FREQ_ provides the overall count.

It is worth the time to look up some documentation

Occasional Contributor
Posts: 15

Re: Frequency on Text

Thanks so much!!

I Am away from computer, so I haven't tested it out yet myself.

i have a data set that has phrases with colons separated by double spaces. I will imitate the data using cities.

Seattle: washington  New York: new york  Los Angeles: California

How should I alter the code to just pick up the total number of seattle: washington?

thanks.

Valued Guide
Posts: 2,174

Re: Frequency on Text

For just one city you could filter as you output

IF FIND( city, 'seattle', 'i') THEN OUTPUT ;

to replace the output statement.

For a list of cities you want to select

1 put the list in a table

2 filter rows of DETAIL  by a join with your selection list

in your sql summary query

Super User
Posts: 9,662

Re: Frequency on Text

Perl Regular Expression can do that . If yours is text file , input x & $40. ;  or infile  dlmstr='  '  can do that as well.

data have;
input text $60.;
datalines;
Seattle  Los Angeles  New York  San Diego  Houston
Seattle  Phoenix  Denver
Seattle  New York  Boston
;
data want(keep=found);
set have;
pid = prxparse('/\w+(\s\w+)?/o');
start = 1;
stop = length(text);
call prxnext(pid, start, stop, text, position, length);
do while (position > 0);
found = substr(text, position, length);
output;
call prxnext(pid, start, stop, text, position, length);
end;
run;


Xia Keshan

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 12 replies
  • 327 views
  • 3 likes
  • 5 in conversation