Trying to analyze visit pattern by frequency and number/percentage of ...

ppphcyv · Posted 05-03-2018 01:49 PM

Hi,

I am not even sure if this is possible, so I would like to ask the experts.

I have a dataset that looks like this:

Name Location1 Loc2 Loc3 Loc4 Loc5

Sarah A B A A A

James B B C D

Albert D E F

Rosie E E E A A

William B B B B

Ashley D D

And I would like to know if each person visited ANY place (A,B,C,D,E,F) at least 80% of the times, at least 40% of the times etc. and which places. Or percentages of visiting a single place for each person, for all locations highest percentage to lowest.

So the potential answer I would like to see would be something like this:

(for each place)

Name Location_80% Loc_40%

Sarah A A (Since Sarah went to Location A 80% of the times but also 40% of the times)

James B

Albert

Rosie E

Rosie A (not sure how this will work because Rosie went to E 60% of the times and A 40% of the times)

William B B

Ashley D D

Or something like this:

Name MostFrequentLocation PercentageA 2ndMostFreqLocation PercentB 3rdFreqLocation PercentC

Sarah A 80% B 20%

James B 50% C 25% D 25%

Albert D 33.3% E 33.3% F 33.3%

Rosie E 60% A 40%

William B 100%

Ashley D 100%

It might make sense to have frequencies next to each percentages...

Is something like this possible? New and infrequent user to SAS but I would really like to do this on my own. I have SAS enterprise guide.

Astounding · Posted 05-03-2018 02:43 PM

Just to get the ball rolling, consider this possibility:

data want;

set have;

array locations {5} Location1 Loc2 Loc3 Loc4 Loc5;

do k=1 to 5;

if locations{k} > ' ' then do;

location = locations{k};

output;

end;

keep name location;

run;

That re-shapes your data into something that is easier to count. Then a possible continuation would be:

proc sort data=want;

by name;

run;

proc freq data=want order=frequency;

by name;

tables location;

run;

As with all languages, SAS has many ways to count, and many ways to display the results. This at least gives you a place to begin.

ppphcyv · Posted 05-03-2018 04:02 PM

Thank you for your reply. I actually transposed the data to see it made more sense to have it by person, since I wanted the results to be by person.

I would like to have the percentages by person, per location. Any ideas?

Thanks,

K

ballardw · Posted 05-03-2018 04:21 PM

@ppphcyv wrote:

Thank you for your reply. I actually transposed the data to see it made more sense to have it by person, since I wanted the results to be by person.

I would like to have the percentages by person, per location. Any ideas?

Thanks,

K

Did you see the suggested:

proc freq data=want order=frequency;

   by name;

   tables location;

run;

or perhaps

proc freq data=want order=frequency;
  tables name *  location;
run;

Where one of row or column percentages may be the one you are interested in.

Astounding · Posted 05-03-2018 04:46 PM

Did you actually run the suggested PROC FREQ? If so, how is the report different than what you would like to see? There's almost certainly a way to produce the results you would like as long as you decide first what those results should be.

ppphcyv · Posted 05-03-2018 05:37 PM

Hi,

Thank you for the follow up... I did try it. It is exactly what I was looking for, but I would like all of the results to be in one table, not 300,000 different tables... Any advice on putting them together? Thank you!!!

Astounding · Posted 05-03-2018 11:33 PM

Create a data set instead of a report. Then you can print the data set however you want to.

proc freq data=want order=frequency noprint;

by name;

tables location / out=percentages;

run;

Print a few observations from PERCENTAGES, so you get a feel for what the data set looks like. There are lots of ways to generate a report.

ballardw · Posted 05-04-2018 03:30 PM

Not at all sure what would cause this to create "300,000 different tables";

Take a look at the following. If I understand correctly you have a data set with the information all ready transposed so could start directly with the first proc tabulate step.

data work.have;
informat  Name $10.  Loc1  Loc2  Loc3  Loc4  Loc5 $1.;
input Name  Loc1  Loc2  Loc3  Loc4  Loc5;
datalines;
Sarah       A            B       A        A       A
James       B            B       C        D       .
Albert      D            E       F        .        .
Rosie       E            E       E        A        A
William     B            B       B        B      .
Ashley      D           D        .        .      .
;
run;


Proc transpose data=work.have out=work.trans (drop=_name_)
;
   by notsorted name;
   var loc: ;
run;

Proc tabulate data=work.trans out=work.tab;
   class name;
   class col1;
   table name,
         col1='Location' *(n rowpctn)
         / misstext=' '
   ;
run;

proc sort data=work.tab out=work.tabsort(drop= _type_ _page_ _table_);
   by name descending N;
run;

data work.want;
   set work.tabsort;
   by name;
   if first.name then order=1;
   else order+1;
run;

proc report data=work.want;
   columns name order,(col1 n pctn_10);
   define name /group;
   define order/'Visit order' across;
   label col1='Location'
         n = 'Visits'
         pctn_10 = '% of visits'
   ;
   format pctn_10 best4.; 
run;

If you are doing lots of different variables then the first proc tabulate step output might need to be looked at carefully and the sort might want to consider the variables _type_ or _table_ as those indicate import bits of information and depend on the structure of the Proc tabulate table statements.

If you have more than one variable you are summarizing in a manner similar to the Location then you would have some choices of tabulate code such as

Proc tabulate data=work.trans out=work.tab;
   class name;
   class col1 other;
   table name,
         (col1='Location' other) *(n rowpctn)
         / misstext=' '
   ;
run;

Proc tabulate data=work.trans out=work.tab2;
   class name;
   class col1 other;
   table name,
         col1='Location' *(n rowpctn)
         / misstext=' '
   ;
   table name,
         other *(n rowpctn)
         / misstext=' '
   ;
run;

With one you would look at _type_ and the other _table_. You might also get differently named pct variables.

You would also have to consider if any of your class variables were missing as tabulate by default excludes those records from calculations. Which is another reason not to use the wide version of the data as you end up with LOTS of missing values.

Trying to analyze visit pattern by frequency and number/percentage of visits

Re: Trying to analyze visit pattern by frequency and number/percentage of visits

Re: Trying to analyze visit pattern by frequency and number/percentage of visits

Re: Trying to analyze visit pattern by frequency and number/percentage of visits

Re: Trying to analyze visit pattern by frequency and number/percentage of visits

Re: Trying to analyze visit pattern by frequency and number/percentage of visits

Re: Trying to analyze visit pattern by frequency and number/percentage of visits

Re: Trying to analyze visit pattern by frequency and number/percentage of visits

Registration is open

Registration is open

SAS Training: Just a Click Away