BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
SASbeginner20
Calcite | Level 5

Hi everyone,

For this variable, the person lists cities they have visited:

These would be three example observations:

1. Seattle, Los Angeles, New York, San Diego, Houston

2. Seattle, Phoenix, Denver

3. Seattle, New York, Boston

I would like to run a frequency to see how many people visited Seattle, which would be 3 and New York, which would be 2. But I can't figure out how to run that type of frequency, since SAS will think that  Seattle, Los Angeles, New York, San Diego, Houston is a unique observation, and then Seattle, Phoenix, Denver is another unique observation, and Seattle, Boston is a another unique one.

Thanks for your help.

1 ACCEPTED SOLUTION

Accepted Solutions
Peter_C
Rhodochrosite | Level 12

t length limit of 8 is easily overcome by using a LENGTH statement

g like

data detail ;

infile datalines truncover ;

Length city $40 ;

Input  id $  city $ & @;

do cno= 1 by 1 while( city ne ' ' ) ;

   Output ;

   input city $ & @ ;

end ;

Datalines ;

1. Seattle  Los Angeles  New York  San Diego   Houston

2. Seattle  Phoenix  Denver

3. Seattle  New York  Boston

;

for summary counts most sensible people use proc freq   but I like

PROC SUMMARY data= detail   ;

Class city ;

Output out= summary_data ;

run ;

Have a look at the output dataset work.summary_data

I like SUMMARY because it is a really useful procedure

It supports multiple summaries on a single pass using a _TYPE_  variable in the results to indicate the class variables that are relevant When it has value 0 the variable _FREQ_ provides the overall count.

It is worth the time to look up some documentation

View solution in original post

12 REPLIES 12
stat_sas
Ammonite | Level 13

data have;
input text $60.;
datalines;
Seattle, Los Angeles, New York, San Diego, Houston
Seattle, Phoenix, Denver
Seattle, New York, Boston
;

data want(keep=city);
set have;
do i=1 by 1 while (scan(text,i,',') ne ' ');
city=scan(text,i,',');
output;
end;
run;


proc freq data=want;
run;

SASbeginner20
Calcite | Level 5

Thank you for your response.

Can you please explain this part

do i=1 by 1 while (scan(text,i,',') ne ' ');

city=scan(text,i,',');


What is the 'text'?

stat_sas
Ammonite | Level 13

Hi,

Sure, text is the variable name specifies cities that each person has visited.

do i=1 by 1 while (scan(text,i,',') ne ' '); scanning text variable for each city one by one

SASbeginner20
Calcite | Level 5

It still won't work for reason. It still counts each block of phrases as 1.

stat_sas
Ammonite | Level 13

Just check executing the syntax to see, if it can provide you the desired output.

SASbeginner20
Calcite | Level 5

Sorry!! I messed up. The data are not separated by commas, but by double spaces

Seattle  Los Angeles  New York  San Diego  Houston

Peter_C
Rhodochrosite | Level 12

Double space separation suits SAS data step input statements

Something like

data detail ;

infile datalines truncover ;

Input  id $  city $ & @;

do cno= 1 by 1 while( city ne ' ' ) ;

   Output ;

   input city $ & @ ;

end ;

Datalines ;

1. Seattle  Los Angeles  New York  San Diego   Houston

2. Seattle  Phoenix  Denver

3. Seattle  New York  Boston

;

Should produce rows with id and city with cno indicating first second or third city .

It is the trailing & which directs that strings with single spaces are accepted and double or multiple spaces act as separators

hope this untested code works

good luck

peterC

Message was edited by: Peter Crawford -   needed to point out that it is the & which allows single embedded blanks

RaviKommuri
Fluorite | Level 6

Hi Peter,

I have tested your code which is working almost fine with few problems.

The character length is limited to 8. And the CNO is not used for the purpose here, If I am not wrong.

Here is your output.

Obs    id    city               cno

                                  

  1    1.    Seattle             1

  2    1.    Los Ange            2

  3    1.    New York            3

  4    1.    San Dieg            4

  5    1.    Houston             5

  6    2.    Seattle             1

  7    2.    Phoenix             2

  8    2.    Denver              3

  9    3.    Seattle             1

10    3.    New York            2

11    3.    Boston              3

However we can use Proc sql to get the count of cities visited.

proc sql;                                         

select city,count(city) as citycnt from detail group by city;

quit;                                            

city              citycnt

-------------------------

Boston                  1

Denver                  1

Houston                 1

Los Ange                1

New York                2

Phoenix                 1

San Dieg                1

Seattle                 3

Peter_C
Rhodochrosite | Level 12

t length limit of 8 is easily overcome by using a LENGTH statement

g like

data detail ;

infile datalines truncover ;

Length city $40 ;

Input  id $  city $ & @;

do cno= 1 by 1 while( city ne ' ' ) ;

   Output ;

   input city $ & @ ;

end ;

Datalines ;

1. Seattle  Los Angeles  New York  San Diego   Houston

2. Seattle  Phoenix  Denver

3. Seattle  New York  Boston

;

for summary counts most sensible people use proc freq   but I like

PROC SUMMARY data= detail   ;

Class city ;

Output out= summary_data ;

run ;

Have a look at the output dataset work.summary_data

I like SUMMARY because it is a really useful procedure

It supports multiple summaries on a single pass using a _TYPE_  variable in the results to indicate the class variables that are relevant When it has value 0 the variable _FREQ_ provides the overall count.

It is worth the time to look up some documentation

SASbeginner20
Calcite | Level 5

Thanks so much!!

I Am away from computer, so I haven't tested it out yet myself.

i have a data set that has phrases with colons separated by double spaces. I will imitate the data using cities.

Seattle: washington  New York: new york  Los Angeles: California

How should I alter the code to just pick up the total number of seattle: washington?

thanks.

Peter_C
Rhodochrosite | Level 12

For just one city you could filter as you output

IF FIND( city, 'seattle', 'i') THEN OUTPUT ;

to replace the output statement.

For a list of cities you want to select

1 put the list in a table

2 filter rows of DETAIL  by a join with your selection list

in your sql summary query

Ksharp
Super User

Perl Regular Expression can do that . If yours is text file , input x & $40. ;  or infile  dlmstr='  '  can do that as well.

data have;
input text $60.;
datalines;
Seattle  Los Angeles  New York  San Diego  Houston
Seattle  Phoenix  Denver
Seattle  New York  Boston
;
data want(keep=found);
set have;
pid = prxparse('/\w+(\s\w+)?/o');
start = 1;
stop = length(text);
call prxnext(pid, start, stop, text, position, length);
do while (position > 0);
found = substr(text, position, length);
output;
call prxnext(pid, start, stop, text, position, length);
end;
run;


Xia Keshan

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 12 replies
  • 1220 views
  • 3 likes
  • 5 in conversation