Hi everyone,
For this variable, the person lists cities they have visited:
These would be three example observations:
1. Seattle, Los Angeles, New York, San Diego, Houston
2. Seattle, Phoenix, Denver
3. Seattle, New York, Boston
I would like to run a frequency to see how many people visited Seattle, which would be 3 and New York, which would be 2. But I can't figure out how to run that type of frequency, since SAS will think that Seattle, Los Angeles, New York, San Diego, Houston is a unique observation, and then Seattle, Phoenix, Denver is another unique observation, and Seattle, Boston is a another unique one.
Thanks for your help.
t length limit of 8 is easily overcome by using a LENGTH statement
g like
data detail ;
infile datalines truncover ;
Length city $40 ;
Input id $ city $ & @;
do cno= 1 by 1 while( city ne ' ' ) ;
Output ;
input city $ & @ ;
end ;
Datalines ;
1. Seattle Los Angeles New York San Diego Houston
2. Seattle Phoenix Denver
3. Seattle New York Boston
;
for summary counts most sensible people use proc freq but I like
PROC SUMMARY data= detail ;
Class city ;
Output out= summary_data ;
run ;
Have a look at the output dataset work.summary_data
I like SUMMARY because it is a really useful procedure
It supports multiple summaries on a single pass using a _TYPE_ variable in the results to indicate the class variables that are relevant When it has value 0 the variable _FREQ_ provides the overall count.
It is worth the time to look up some documentation
data have;
input text $60.;
datalines;
Seattle, Los Angeles, New York, San Diego, Houston
Seattle, Phoenix, Denver
Seattle, New York, Boston
;
data want(keep=city);
set have;
do i=1 by 1 while (scan(text,i,',') ne ' ');
city=scan(text,i,',');
output;
end;
run;
proc freq data=want;
run;
Thank you for your response.
Can you please explain this part
do i=1 by 1 while (scan(text,i,',') ne ' ');
city=scan(text,i,',');
What is the 'text'?
Hi,
Sure, text is the variable name specifies cities that each person has visited.
do i=1 by 1 while (scan(text,i,',') ne ' '); scanning text variable for each city one by one
It still won't work for reason. It still counts each block of phrases as 1.
Just check executing the syntax to see, if it can provide you the desired output.
Sorry!! I messed up. The data are not separated by commas, but by double spaces
Seattle Los Angeles New York San Diego Houston
Double space separation suits SAS data step input statements
Something like
data detail ;
infile datalines truncover ;
Input id $ city $ & @;
do cno= 1 by 1 while( city ne ' ' ) ;
Output ;
input city $ & @ ;
end ;
Datalines ;
1. Seattle Los Angeles New York San Diego Houston
2. Seattle Phoenix Denver
3. Seattle New York Boston
;
Should produce rows with id and city with cno indicating first second or third city .
It is the trailing & which directs that strings with single spaces are accepted and double or multiple spaces act as separators
hope this untested code works
good luck
peterC
Message was edited by: Peter Crawford - needed to point out that it is the & which allows single embedded blanks
Hi Peter,
I have tested your code which is working almost fine with few problems.
The character length is limited to 8. And the CNO is not used for the purpose here, If I am not wrong.
Here is your output.
Obs id city cno
1 1. Seattle 1
2 1. Los Ange 2
3 1. New York 3
4 1. San Dieg 4
5 1. Houston 5
6 2. Seattle 1
7 2. Phoenix 2
8 2. Denver 3
9 3. Seattle 1
10 3. New York 2
11 3. Boston 3
However we can use Proc sql to get the count of cities visited.
proc sql;
select city,count(city) as citycnt from detail group by city;
quit;
city citycnt
-------------------------
Boston 1
Denver 1
Houston 1
Los Ange 1
New York 2
Phoenix 1
San Dieg 1
Seattle 3
t length limit of 8 is easily overcome by using a LENGTH statement
g like
data detail ;
infile datalines truncover ;
Length city $40 ;
Input id $ city $ & @;
do cno= 1 by 1 while( city ne ' ' ) ;
Output ;
input city $ & @ ;
end ;
Datalines ;
1. Seattle Los Angeles New York San Diego Houston
2. Seattle Phoenix Denver
3. Seattle New York Boston
;
for summary counts most sensible people use proc freq but I like
PROC SUMMARY data= detail ;
Class city ;
Output out= summary_data ;
run ;
Have a look at the output dataset work.summary_data
I like SUMMARY because it is a really useful procedure
It supports multiple summaries on a single pass using a _TYPE_ variable in the results to indicate the class variables that are relevant When it has value 0 the variable _FREQ_ provides the overall count.
It is worth the time to look up some documentation
Thanks so much!!
I Am away from computer, so I haven't tested it out yet myself.
i have a data set that has phrases with colons separated by double spaces. I will imitate the data using cities.
Seattle: washington New York: new york Los Angeles: California
How should I alter the code to just pick up the total number of seattle: washington?
thanks.
For just one city you could filter as you output
IF FIND( city, 'seattle', 'i') THEN OUTPUT ;
to replace the output statement.
For a list of cities you want to select
1 put the list in a table
2 filter rows of DETAIL by a join with your selection list
in your sql summary query
Perl Regular Expression can do that . If yours is text file , input x & $40. ; or infile dlmstr=' ' can do that as well.
data have; input text $60.; datalines; Seattle Los Angeles New York San Diego Houston Seattle Phoenix Denver Seattle New York Boston ; data want(keep=found); set have; pid = prxparse('/\w+(\s\w+)?/o'); start = 1; stop = length(text); call prxnext(pid, start, stop, text, position, length); do while (position > 0); found = substr(text, position, length); output; call prxnext(pid, start, stop, text, position, length); end; run;
Xia Keshan
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.