Re: Efficiency coding

BaileyY · Posted 01-21-2017 11:48 PM

Hi-

Any help would be appreciated.

Is there an efficient way to code this in a more efficient manner. I have to produce aggregations for various variables into a new table for further analysis. I find myself sorting then running frequncy then adding the data step to clean up and then appending each data step for a final file. Is there an easier way to do this. I am running aggregations for gender/race/region/address/type1/type2/type3/type4/type5. Each of these categories have to be unduplicated for each customer.

Proc sort data=customer nodupkey out=c1; by fname gender;run;

Proc freq data=c1;

Tables location*gender/list out=c1a missing;run;

Data c1b;

set c1a;

total=count;

subgroup=gender;

drop percent county gender;run;

Shmuel · Posted 01-22-2017 01:56 AM

May I say your data is made of:

- ID variabe is FNAME

- various attributes (gender/race/region/ ... ) which may occur more than once per FNAME

It seems to me that you should use a macro, defined once and executed per each attribute.

%macro frq(atrib);

Proc sort data=customer nodupkey

out=tmp;

by fname &atrib;

run;

Proc freq data=tmp;

Tables location*&atrib / list out=frq1 missing;

run;

Data frq_&atrib;

set frq1;

total=count;

subgroup=&atrib; /* shouls it not be in quotes - like "&atrib" ? */

drop percent county &attrib; /* should it not be count instead county ? */

run;

%mend frq;

%frq(gender);

%frq(race);

... etc ...

Astounding · Posted 01-22-2017 08:35 AM

I can't test this right now, but you should be able to get all counts in one step:

proc summary data=have;

class gender race region address type1 type2 type3 type4 type5;

output out=counts;

run;

_FREQ_ will contain the counts, and you will need to decipher the value of _TYPE_ to determine which levels of summarization you want to extract. That won't be simple, but it will be faster and (once you appreciate the structure of the output) very organized.

Shmuel · Posted 01-22-2017 01:47 PM

@BaileyY wrote " Each of these categories have to be unduplicated for each customer.".

I have thought about proc summary, too, but I'm not sure the count will be right per fname/location

in case of duplicates.

BaileyY · Posted 01-23-2017 05:40 PM

Hi. I tried this too but ran across problems with duplication and because I had so many different class variables, when I used the output dataset for further analysis- I couldnt track and/or determine which variable was being summarized.

art297 · Posted 01-22-2017 02:36 PM

You can accomplish all of your renames and drops in the same proc step. e.g.:

proc freq data=c1;
  tables location*gender/
    list out=c2 (drop=percent rename=(count=total)) missing;
run;

would give you the same result you're currently getting.

Art, CEO, AnalystFinder.com

BaileyY · Posted 01-23-2017 05:37 PM

Hi- I need to use the undupkey by customer and about 8 different other variables. Customer could appear multiple times within a variable and I need to count them just once.

ballardw · Posted 01-23-2017 10:31 AM

On thing would be to define what you mean by "easier" in context.

Is this to produce a report that people are reading? It may be that one of the report procedures like Proc Report or Tabulate may do what you want but since you do not provide any example data or what the final result should look like it is hard to provide a targetted suggestion.

And when you say something like "unduplicated for each customer" you really need to provide some examples with a customer duplicated in the input and the result. I have run into muliple uses of "unduplicated" and they are not always the same.

What do you want if the "customer" has multiple gender or race values?

Registration is open

SAS Training: Just a Click Away