Help using Base SAS procedures

Efficiency coding

Reply
Occasional Contributor
Posts: 8

Efficiency coding

Hi-

Any help would be appreciated.  

 

Is there an efficient way to code this in a more efficient manner.   I have to produce aggregations for various variables into a new table for further analysis.  I find myself sorting then running frequncy then adding the data step to clean up and then appending each data step for a final file.  Is there an easier way to do this.  I am running aggregations for gender/race/region/address/type1/type2/type3/type4/type5.  Each of these categories have to be unduplicated for each customer.

 

 

Proc sort data=customer nodupkey out=c1; by fname gender;run;

 

Proc freq data=c1;

Tables location*gender/list out=c1a missing;run;

 

Data c1b;

set c1a;

total=count;

subgroup=gender;

drop percent county gender;run;

 

 

Trusted Advisor
Posts: 1,405

Re: Efficiency coding

May I say your data is made of:

 -    ID variabe is FNAME

 -   various attributes (gender/race/region/ ... ) which may occur more than once per FNAME

 

It seems to me that you should use a macro, defined once and executed per each attribute.

 

%macro frq(atrib);

         Proc sort data=customer nodupkey

                          out=tmp;

                  by fname &atrib;

         run;

 

        Proc freq data=tmp;

              Tables location*&atrib / list out=frq1 missing;

        run;

 

       Data frq_&atrib;

         set frq1;

               total=count;

               subgroup=&atrib;                    /* shouls it not be in quotes - like "&atrib" ? */

               drop percent county &attrib;   /* should it not be count instead county ? */

        run;

%mend frq;

%frq(gender);

%frq(race);

... etc ...

 

 

Super User
Posts: 5,099

Re: Efficiency coding

I can't test this right now, but you should be able to get all counts in one step:

 

proc summary data=have;

class gender race region address type1 type2 type3 type4 type5;

output out=counts;

run;

 

_FREQ_ will contain the counts, and you will need to decipher the value of _TYPE_ to determine which levels of summarization you want to extract.  That won't be simple, but it will be faster and (once you appreciate the structure of the output) very organized.

Trusted Advisor
Posts: 1,405

Re: Efficiency coding

@BaileyY wrote " Each of these categories have to be unduplicated for each customer.".

 

I have thought about proc summary, too, but I'm not sure the count will be right per fname/location 

in case of duplicates.

Occasional Contributor
Posts: 8

Re: Efficiency coding

Hi. I tried this too but ran across problems with duplication and because I had so many different class variables, when I used the output dataset for further analysis- I couldnt track and/or determine which variable was being summarized.
PROC Star
Posts: 7,366

Re: Efficiency coding

[ Edited ]

You can accomplish all of your renames and drops in the same proc step. e.g.:

 

proc freq data=c1;
  tables location*gender/
    list out=c2 (drop=percent rename=(count=total)) missing;
run;

would give you the same result you're currently getting.

 

Art, CEO, AnalystFinder.com

 

Occasional Contributor
Posts: 8

Re: Efficiency coding

Hi- I need to use the undupkey by customer and about 8 different other variables. Customer could appear multiple times within a variable and I need to count them just once.
Super User
Posts: 10,550

Re: Efficiency coding

[ Edited ]

On thing would be to define what you mean by "easier" in context.

Is this to produce a report that people are reading? It may be that one of the report procedures like Proc Report or Tabulate may do what you want but since you do not provide any example data or what the final result should look like it is hard to provide a targetted suggestion.

 

And when you say something like "unduplicated for each customer" you really need to provide some examples with a customer duplicated in the input and the result. I have run into muliple uses of "unduplicated" and they are not always the same.

What do you want if the "customer" has multiple gender or race values?

Ask a Question
Discussion stats
  • 7 replies
  • 340 views
  • 4 likes
  • 5 in conversation