BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Khaladdin
Quartz | Level 8

Hi all,

 

I want to ask a complicated (for me) question about SAS programming. I think I can explain better by using simple example. So, I have the following dataset:

 

Group          Category 

A                  1

A                  1

A                  2

A                  1

A                  2

A                  3

B                  1

B                  2

B                  2

B                  1

B                  3

B                  2

 

I want to count the each category for each group. I can do it by using PROC FREQ. But it is not better way for my dataset. It will be time consuming for me as my dataset is too large and I have a huge number of groups. So, if I use PROC FREQ, firstly I need to create new datasets for each group and then use PROC FREQ for each group. In sum, I need to create the following dataset:

                                                                                CATEGORIES 

Group     1   (first category)                                                                 2                       3 

A             3 (the number of first category in A group is 3)                   2                       1

B             2 (the number of first category in B group is 2)                   3                       1

 

I do not write explainations for second and third categories as it is the same with the first category.I think I can explain it. Thanks for your helps.

1 ACCEPTED SOLUTION

Accepted Solutions
Shmuel
Garnet | Level 18

If it takes too much time or you are short with memory, add a sort:

proc sort data=have; by group; run;

proc freq data=have;
    by group;
    table group*category / nopercent 
       out=want;
run;

View solution in original post

8 REPLIES 8
Shmuel
Garnet | Level 18

You wrote that you have " a huge number of groups". How huge is it ?

 

Try to run:

proc freq data=have;
    table group*category / nopercent 
       out=want;
run;
Khaladdin
Quartz | Level 8
Thanks for reply. It is high frequency data. So, I have more than one million groups.
Shmuel
Garnet | Level 18

If it takes too much time or you are short with memory, add a sort:

proc sort data=have; by group; run;

proc freq data=have;
    by group;
    table group*category / nopercent 
       out=want;
run;
Khaladdin
Quartz | Level 8
Many thanks. I will try it tomorrow. I will let you know about the result. Many thanks agaib.
mkeintz
PROC Star

It appears to me that you want, for each group, to rank categories for each group.

 

Then produce one obs for each group with variables CAT1, CAT2, .... CATj.   CAT1 will contain the most frequent category (what I think you mean by "first category"), CAT2 the second most frequent, through CATj (where J is the number of distinct categories in the group having the highest cardinatliy.

 

If your data are aleady sorted by group, it's two steps:

 

  1. Use proc freq to generate a data set T1 of frequencies sorted by GROUP and then descending frequency ("order=freq") within group.
    proc freq data=have noprint order=freq;
      by group;
      table category / out=t1;
    run;
  2. Then just use proc transpose to collapse the descending frequencies for each group into a single row with vars CAT1, CAT2, ...
    proc transpose data=t1 out=want  prefix=CAT;
      by group;
      var category;
    run;

 

Now if your data are NOT sorted, it's three steps - but don't worry - you don't have to sort the original data, just the frequencies:

 

  1. (and 2) Get frequencies of group*category and sort by descending frequency within each group:
    proc freq data=have noprint ;
      table group * category / out=t1;
    run;
    proc sort data=t1;
      by group descending count;
    run;
  2. (see above)
  3. Now transpose a column to a row for each group.
    proc transpose data=t1 prefix=CAT;
      by group;
      var category;
    run;

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Khaladdin
Quartz | Level 8
Frequency works. But, transpose does not work.
mkeintz
PROC Star

Show the log of the non-working code please.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
MikeZdeb
Rhodochrosite | Level 12

Hi, another idea (not  1 million groups, but 12 million observations  ... SUMMARY + TRANSPOSE juts a few seconds). I don't  see in your exmaple output the need to rank counts within groups.

 


data x;
input (group category) (:$1.) @@;
datalines;
A 1 A 1 A 2 A 1 A 2 A 3
B 1 B 2 B 2 B 1 B 3 B 2

 

data bigx;
set x;
do _n_=1 to 1e6;
output;
end;
run;

 

proc summary data=bigx nway;
class group category;
output out=y (drop=_type_);
run;

 

proc transpose data=y out=z (drop=_name_) prefix=cat;
by group;
var _freq_;
id category;
run;

 

DATA SET: z

group    cat1     cat2     cat3

A      3000000  2000000   1000000
B      2000000  3000000   1000000

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 8 replies
  • 2249 views
  • 1 like
  • 4 in conversation