01-21-2017 11:48 PM
Any help would be appreciated.
Is there an efficient way to code this in a more efficient manner. I have to produce aggregations for various variables into a new table for further analysis. I find myself sorting then running frequncy then adding the data step to clean up and then appending each data step for a final file. Is there an easier way to do this. I am running aggregations for gender/race/region/address/type1/type2/type3/type4/type5. Each of these categories have to be unduplicated for each customer.
Proc sort data=customer nodupkey out=c1; by fname gender;run;
Proc freq data=c1;
Tables location*gender/list out=c1a missing;run;
drop percent county gender;run;
01-22-2017 01:56 AM
May I say your data is made of:
- ID variabe is FNAME
- various attributes (gender/race/region/ ... ) which may occur more than once per FNAME
It seems to me that you should use a macro, defined once and executed per each attribute.
Proc sort data=customer nodupkey
by fname &atrib;
Proc freq data=tmp;
Tables location*&atrib / list out=frq1 missing;
subgroup=&atrib; /* shouls it not be in quotes - like "&atrib" ? */
drop percent county &attrib; /* should it not be count instead county ? */
... etc ...
01-22-2017 08:35 AM
I can't test this right now, but you should be able to get all counts in one step:
proc summary data=have;
class gender race region address type1 type2 type3 type4 type5;
_FREQ_ will contain the counts, and you will need to decipher the value of _TYPE_ to determine which levels of summarization you want to extract. That won't be simple, but it will be faster and (once you appreciate the structure of the output) very organized.
01-22-2017 01:47 PM
@BaileyY wrote " Each of these categories have to be unduplicated for each customer.".
I have thought about proc summary, too, but I'm not sure the count will be right per fname/location
in case of duplicates.
01-23-2017 05:40 PM
01-22-2017 02:36 PM - edited 01-22-2017 02:39 PM
You can accomplish all of your renames and drops in the same proc step. e.g.:
proc freq data=c1; tables location*gender/ list out=c2 (drop=percent rename=(count=total)) missing; run;
would give you the same result you're currently getting.
Art, CEO, AnalystFinder.com
01-23-2017 10:31 AM - edited 01-23-2017 05:54 PM
On thing would be to define what you mean by "easier" in context.
Is this to produce a report that people are reading? It may be that one of the report procedures like Proc Report or Tabulate may do what you want but since you do not provide any example data or what the final result should look like it is hard to provide a targetted suggestion.
And when you say something like "unduplicated for each customer" you really need to provide some examples with a customer duplicated in the input and the result. I have run into muliple uses of "unduplicated" and they are not always the same.
What do you want if the "customer" has multiple gender or race values?