BookmarkSubscribeRSS Feed
ppphcyv
Calcite | Level 5

Hi,

 

I am not even sure if this is possible, so I would like to ask the experts.

I have a dataset that looks like this:

 

Name  Location1  Loc2  Loc3  Loc4  Loc5

Sarah        A            B       A        A       A

James       B            B       C        D      

Albert        D            E       F              

Rosie        E            E       E        A        A

William     B            B        B        B      

Ashley      D           D                     

 

 

And I would like to know if each person visited ANY place (A,B,C,D,E,F) at least 80% of the times, at least 40% of the times etc. and which places.  Or percentages of visiting a single place for each person, for all locations highest percentage to lowest.

 

So the potential answer I would like to see would be something like this:

(for each place)

Name  Location_80%    Loc_40%

Sarah        A                      A               (Since Sarah went to Location A 80% of the times but also 40% of the times)

James                               B

Albert

Rosie                                E

Rosie                                A        (not sure how this will work because Rosie went to E 60% of the times and A 40% of the times)

William      B                      B

Ashley       D                     D

 

Or something like this:

Name   MostFrequentLocation  PercentageA    2ndMostFreqLocation    PercentB    3rdFreqLocation   PercentC

Sarah               A                                80%                         B                          20%                       

James              B                                50%                         C                          25%                 D                    25%

Albert               D                                33.3%                      E                         33.3%                F                   33.3%

Rosie               E                                60%                         A                          40%

William             B                               100%

Ashley              D                               100%

 

It might make sense to have frequencies next to each percentages...

 

Is something like this possible? New and infrequent user to SAS but I would really like to do this on my own. I have SAS enterprise guide.

 

 

7 REPLIES 7
Astounding
PROC Star

Just to get the ball rolling, consider this possibility:

 

data want;

set have;

array locations {5} Location1 Loc2 Loc3 Loc4 Loc5;

do k=1 to 5;

   if locations{k} > ' ' then do;

      location = locations{k};

      output;

   end;

end;

keep name location;

run;

 

That re-shapes your data into something that is easier to count.  Then a possible continuation would be:

 

proc sort data=want;

by name;

run;

 

proc freq data=want order=frequency;

   by name;

   tables location;

run;

 

As with all languages, SAS has many ways to count, and many ways to display the results.  This at least gives you a place to begin.

ppphcyv
Calcite | Level 5

Thank you for your reply. I actually transposed the data to see it made more sense to have it by person, since I wanted the results to be by person.

 

I would like to have the percentages by person, per location. Any ideas?

 

Thanks,

K

ballardw
Super User

@ppphcyv wrote:

Thank you for your reply. I actually transposed the data to see it made more sense to have it by person, since I wanted the results to be by person.

 

I would like to have the percentages by person, per location. Any ideas?

 

Thanks,

K


Did you see the suggested:

proc freq data=want order=frequency;

   by name;

   tables location;

run;

or perhaps

 

proc freq data=want order=frequency;
  tables name *  location;
run;

Where one of row or column percentages may be the one you are interested in.

 

Astounding
PROC Star

Did you actually run the suggested PROC FREQ?  If so, how is the report different than what you would like to see?  There's almost certainly a way to produce the results you would like as long as you decide first what those results should be.

ppphcyv
Calcite | Level 5

Hi,

 

Thank you for the follow up... I did try it.  It is exactly what I was looking for, but I would like all of the results to be in one table, not 300,000 different tables... Any advice on putting them together? Thank you!!!

Astounding
PROC Star

Create a data set instead of a report.  Then you can print the data set however you want to.

 

proc freq data=want order=frequency noprint;

   by name;

   tables location / out=percentages;

run;

 

Print a few observations from PERCENTAGES, so you get a feel for what the data set looks like.  There are lots of ways to generate a report.

ballardw
Super User

Not at all sure what would cause this to create "300,000 different tables";

 

Take a look at the following. If I understand correctly you have a data set with the information all ready transposed so could start directly with the first proc tabulate step.

data work.have;
informat  Name $10.  Loc1  Loc2  Loc3  Loc4  Loc5 $1.;
input Name  Loc1  Loc2  Loc3  Loc4  Loc5;
datalines;
Sarah       A            B       A        A       A
James       B            B       C        D       .
Albert      D            E       F        .        .
Rosie       E            E       E        A        A
William     B            B       B        B      .
Ashley      D           D        .        .      .
;
run;


Proc transpose data=work.have out=work.trans (drop=_name_)
;
   by notsorted name;
   var loc: ;
run;

Proc tabulate data=work.trans out=work.tab;
   class name;
   class col1;
   table name,
         col1='Location' *(n rowpctn)
         / misstext=' '
   ;
run;

proc sort data=work.tab out=work.tabsort(drop= _type_ _page_ _table_);
   by name descending N;
run;

data work.want;
   set work.tabsort;
   by name;
   if first.name then order=1;
   else order+1;
run;

proc report data=work.want;
   columns name order,(col1 n pctn_10);
   define name /group;
   define order/'Visit order' across;
   label col1='Location'
         n = 'Visits'
         pctn_10 = '% of visits'
   ;
   format pctn_10 best4.; 
run;

If you are doing lots of different variables then the first proc tabulate step output might need to be looked at carefully and the sort might want to consider the variables _type_ or _table_ as those indicate import bits of information and depend on the structure of the Proc tabulate table statements.

 

If you have more than one variable you are summarizing in a manner similar to the Location then you would have some choices of tabulate code such as

Proc tabulate data=work.trans out=work.tab;
   class name;
   class col1 other;
   table name,
         (col1='Location' other) *(n rowpctn)
         / misstext=' '
   ;
run;

Proc tabulate data=work.trans out=work.tab2;
   class name;
   class col1 other;
   table name,
         col1='Location' *(n rowpctn)
         / misstext=' '
   ;
   table name,
         other *(n rowpctn)
         / misstext=' '
   ;
run;

With one you would look at _type_ and the other _table_. You might also get differently named pct variables.

 

You would also have to consider if any of your class variables were missing as tabulate by default excludes those records from calculations. Which is another reason not to use the wide version of the data as you end up with LOTS of missing values.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 866 views
  • 0 likes
  • 3 in conversation