4 weeks ago - last edited 4 weeks ago
I want to ask a complicated (for me) question about SAS programming. I think I can explain better by using simple example. So, I have the following dataset:
I want to count the each category for each group. I can do it by using PROC FREQ. But it is not better way for my dataset. It will be time consuming for me as my dataset is too large and I have a huge number of groups. So, if I use PROC FREQ, firstly I need to create new datasets for each group and then use PROC FREQ for each group. In sum, I need to create the following dataset:
Group 1 (first category) 2 3
A 3 (the number of first category in A group is 3) 2 1
B 2 (the number of first category in B group is 2) 3 1
I do not write explainations for second and third categories as it is the same with the first category.I think I can explain it. Thanks for your helps.
4 weeks ago
It appears to me that you want, for each group, to rank categories for each group.
Then produce one obs for each group with variables CAT1, CAT2, .... CATj. CAT1 will contain the most frequent category (what I think you mean by "first category"), CAT2 the second most frequent, through CATj (where J is the number of distinct categories in the group having the highest cardinatliy.
If your data are aleady sorted by group, it's two steps:
proc freq data=have noprint order=freq; by group; table category / out=t1; run;
proc transpose data=t1 out=want prefix=CAT; by group; var category; run;
Now if your data are NOT sorted, it's three steps - but don't worry - you don't have to sort the original data, just the frequencies:
proc freq data=have noprint ; table group * category / out=t1; run; proc sort data=t1; by group descending count; run;
proc transpose data=t1 prefix=CAT; by group; var category; run;
3 weeks ago
Hi, another idea (not 1 million groups, but 12 million observations ... SUMMARY + TRANSPOSE juts a few seconds). I don't see in your exmaple output the need to rank counts within groups.
input (group category) (:$1.) @@;
A 1 A 1 A 2 A 1 A 2 A 3
B 1 B 2 B 2 B 1 B 3 B 2
do _n_=1 to 1e6;
proc summary data=bigx nway;
class group category;
output out=y (drop=_type_);
proc transpose data=y out=z (drop=_name_) prefix=cat;
DATA SET: z
group cat1 cat2 cat3
A 3000000 2000000 1000000
B 2000000 3000000 1000000
Need further help from the community? Please ask a new question.