BookmarkSubscribeRSS Feed
BaileyY
Obsidian | Level 7

Hi-

Any help would be appreciated.  

 

Is there an efficient way to code this in a more efficient manner.   I have to produce aggregations for various variables into a new table for further analysis.  I find myself sorting then running frequncy then adding the data step to clean up and then appending each data step for a final file.  Is there an easier way to do this.  I am running aggregations for gender/race/region/address/type1/type2/type3/type4/type5.  Each of these categories have to be unduplicated for each customer.

 

 

Proc sort data=customer nodupkey out=c1; by fname gender;run;

 

Proc freq data=c1;

Tables location*gender/list out=c1a missing;run;

 

Data c1b;

set c1a;

total=count;

subgroup=gender;

drop percent county gender;run;

 

 

7 REPLIES 7
Shmuel
Garnet | Level 18

May I say your data is made of:

 -    ID variabe is FNAME

 -   various attributes (gender/race/region/ ... ) which may occur more than once per FNAME

 

It seems to me that you should use a macro, defined once and executed per each attribute.

 

%macro frq(atrib);

         Proc sort data=customer nodupkey

                          out=tmp;

                  by fname &atrib;

         run;

 

        Proc freq data=tmp;

              Tables location*&atrib / list out=frq1 missing;

        run;

 

       Data frq_&atrib;

         set frq1;

               total=count;

               subgroup=&atrib;                    /* shouls it not be in quotes - like "&atrib" ? */

               drop percent county &attrib;   /* should it not be count instead county ? */

        run;

%mend frq;

%frq(gender);

%frq(race);

... etc ...

 

 

Astounding
PROC Star

I can't test this right now, but you should be able to get all counts in one step:

 

proc summary data=have;

class gender race region address type1 type2 type3 type4 type5;

output out=counts;

run;

 

_FREQ_ will contain the counts, and you will need to decipher the value of _TYPE_ to determine which levels of summarization you want to extract.  That won't be simple, but it will be faster and (once you appreciate the structure of the output) very organized.

Shmuel
Garnet | Level 18

@BaileyY wrote " Each of these categories have to be unduplicated for each customer.".

 

I have thought about proc summary, too, but I'm not sure the count will be right per fname/location 

in case of duplicates.

BaileyY
Obsidian | Level 7
Hi. I tried this too but ran across problems with duplication and because I had so many different class variables, when I used the output dataset for further analysis- I couldnt track and/or determine which variable was being summarized.
art297
Opal | Level 21

You can accomplish all of your renames and drops in the same proc step. e.g.:

 

proc freq data=c1;
  tables location*gender/
    list out=c2 (drop=percent rename=(count=total)) missing;
run;

would give you the same result you're currently getting.

 

Art, CEO, AnalystFinder.com

 

BaileyY
Obsidian | Level 7
Hi- I need to use the undupkey by customer and about 8 different other variables. Customer could appear multiple times within a variable and I need to count them just once.
ballardw
Super User

On thing would be to define what you mean by "easier" in context.

Is this to produce a report that people are reading? It may be that one of the report procedures like Proc Report or Tabulate may do what you want but since you do not provide any example data or what the final result should look like it is hard to provide a targetted suggestion.

 

And when you say something like "unduplicated for each customer" you really need to provide some examples with a customer duplicated in the input and the result. I have run into muliple uses of "unduplicated" and they are not always the same.

What do you want if the "customer" has multiple gender or race values?

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 1674 views
  • 4 likes
  • 5 in conversation