Hi-
Any help would be appreciated.
Is there an efficient way to code this in a more efficient manner. I have to produce aggregations for various variables into a new table for further analysis. I find myself sorting then running frequncy then adding the data step to clean up and then appending each data step for a final file. Is there an easier way to do this. I am running aggregations for gender/race/region/address/type1/type2/type3/type4/type5. Each of these categories have to be unduplicated for each customer.
Proc sort data=customer nodupkey out=c1; by fname gender;run;
Proc freq data=c1;
Tables location*gender/list out=c1a missing;run;
Data c1b;
set c1a;
total=count;
subgroup=gender;
drop percent county gender;run;
May I say your data is made of:
- ID variabe is FNAME
- various attributes (gender/race/region/ ... ) which may occur more than once per FNAME
It seems to me that you should use a macro, defined once and executed per each attribute.
%macro frq(atrib);
Proc sort data=customer nodupkey
out=tmp;
by fname &atrib;
run;
Proc freq data=tmp;
Tables location*&atrib / list out=frq1 missing;
run;
Data frq_&atrib;
set frq1;
total=count;
subgroup=&atrib; /* shouls it not be in quotes - like "&atrib" ? */
drop percent county &attrib; /* should it not be count instead county ? */
run;
%mend frq;
%frq(gender);
%frq(race);
... etc ...
I can't test this right now, but you should be able to get all counts in one step:
proc summary data=have;
class gender race region address type1 type2 type3 type4 type5;
output out=counts;
run;
_FREQ_ will contain the counts, and you will need to decipher the value of _TYPE_ to determine which levels of summarization you want to extract. That won't be simple, but it will be faster and (once you appreciate the structure of the output) very organized.
@BaileyY wrote " Each of these categories have to be unduplicated for each customer.".
I have thought about proc summary, too, but I'm not sure the count will be right per fname/location
in case of duplicates.
You can accomplish all of your renames and drops in the same proc step. e.g.:
proc freq data=c1; tables location*gender/ list out=c2 (drop=percent rename=(count=total)) missing; run;
would give you the same result you're currently getting.
Art, CEO, AnalystFinder.com
On thing would be to define what you mean by "easier" in context.
Is this to produce a report that people are reading? It may be that one of the report procedures like Proc Report or Tabulate may do what you want but since you do not provide any example data or what the final result should look like it is hard to provide a targetted suggestion.
And when you say something like "unduplicated for each customer" you really need to provide some examples with a customer duplicated in the input and the result. I have run into muliple uses of "unduplicated" and they are not always the same.
What do you want if the "customer" has multiple gender or race values?
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.