BookmarkSubscribeRSS Feed
kksss
Calcite | Level 5

so I have a dataset I have people with more than one observation under the same id. 

And they are also grouped under different criteria, so there are different grouping variables, e.g.

e.g. 

id    var   group_1     Group_2

1      x        3              A1

1      z         3              A1

2      y         1             B2

3      8         2            A3

3      e         2            A3

 

How can I make a report or summaried table to show the unique count of frequency by ID, under each grouping variable, and breakdown by their group value. 

Like this

 

group     headcount

1               3

2               4

A1            1

A3            6

B2          ........

.....

 

Proc SQL only counts distinct value from a variable as a whole, I don't know how to condition it. Similarly, if I use Sum(case...when) I doesn't give me discticnt count by ID. Here, to be noted that I want not the unique count of a variable, but a unique count of a variable by ID. And I don't know how to do that in Proc freq. 

 

My last resort is to separate dataset, run proc sort to delete dups, and then run proc freq.. but that willl give me a ton of tables, and I need another proc to join the info(headcounts) I want in an addional table. 

 

 

15 REPLIES 15
ballardw
Super User

It will help if your desired output comes from the example input. It is very difficult to see how group A3 gets a count of 6.

 

I think you need to transpose the data so that you take

id    var   group_1     Group_2

1      x        3              A1

1      z         3             A1

 

And turn it into

id var group

1   x     3

1   x     A1

1   z     3

1   z     A1

 

and then count the unique or distinct ID for Group so Id=1 gets counted once in group 3 and once in group A1.

Does this describe what you're attempting?

Does the variable Var have any bearing on this process? It may be easier to ignore it for the purpose of counting.

 

 

kksss
Calcite | Level 5

the thing is my grouping variables are not mutually exclusive.

 

e.g. A1 may include 1.2.3

etc..

Reeza
Super User

Please post a full example=> sample data with sample output that aligns with the sample data.

kksss
Calcite | Level 5

Alright. 

 

Input:

 

ID                Major               Major College      Major Department        Primary Major              Math_major_type                  Major Type

 

John           Math                   Science                   math                        Math                     Primary with one outside                  Primary

John           Art                      Fine Art                      Art                         Math                      Primary with one outside                  Primary 

Kate           Math                    Science                   math                        Math                     Primary with only one                  Primary

Lex              Math                   Science                   math                        Math                     Primary with one within                  Primary

Lex              Statistics             Science                   statistics                  Math                     Primary with one within                  Primary

Joe             Math                     Science                   math                        Math                     Primary with only one                  Primary

Ash             Math                     Science                   math                        Math                      Primary with only one                  Primary

Jim             Sociology                Humanity             Social Science            Sociology                Not Primary                               Not Primary

Jim             Math                      Science                   math                        Math                     Not Primary                                   Not Primary    

Jim             Statistics              Science                   statistics                  Math                     Not Primary                                 Not Primary       

 

OUT:

 

Headcount                                     Unique Freq

Total                                                       6

Primary                                                  5

Primary with one outside                      1

Primary with one within                         1

Primary with only                                  3

Not primary                                           1

 

Major count by major department       

Total Majors.........................................10

Art ......................................................1                                                     

Statistics.............................................2

Math...................................................6

Social Science...................................1

 

PS: some definition

Primary  - math Primary major

Primary with only one  - has math as primary major and that is the only major they have (e.g. Kate, Joe, Ash)

Primary with one outside - who has math as primary major but also has at leasat one major outside the college of science(e.g. John)

Primary with one within- who has math as primary major but also has at leasat another one major within the college of science(e.g. Lex)

Not Primary - Does not have math as a primary major. (Jim)

 

 

Reeza
Super User

I think you want this:

 

It won't generate the total, but in my opinion, totals should be calculated in final report not stored in the dataset. 

 

https://gist.github.com/statgeek/e0903d269d4a71316a4e

kksss
Calcite | Level 5

right... that's on on how to formatt the table afterwards... but I can use proc req because of the dups...

ballardw
Super User

Fo those things that are related to unique persons you may need to sort the data and then using first. or last. techniques assign a value, likely 1, to one record per person. Then sum those.

 

I think the bit related to the majors and primaries might be possibly using a multilabel formats, but that only works with a few procedures.

 

Please provide some example data in the form of datastep code if you want someone to test some code with your data.

I think I can do this with two tables in proc tabulate.

Ksharp
Super User

This could give you some light .

 

data have;
input ID      $          Major : $20.              MajorCollege  & $40.    MajorDepartment    & $40.     PrimaryMajor    & $40.           Math_major_type  & $40.         MajorType  & $40.;
cards;
John           Math                   Science                   math                        Math                     Primary with one outside                  Primary
John           Art                      Fine Art                      Art                         Math                      Primary with one outside                  Primary 
Kate           Math                    Science                   math                        Math                     Primary with only one                  Primary
Lex              Math                   Science                   math                        Math                     Primary with one within                  Primary
Lex              Statistics             Science                   statistics                  Math                     Primary with one within                  Primary
Joe             Math                     Science                   math                        Math                     Primary with only one                  Primary
Ash             Math                     Science                   math                        Math                      Primary with only one                  Primary
Jim             Sociology                Humanity             Social Science            Sociology                Not Primary                               Not Primary
Jim             Math                      Science                   math                        Math                     Not Primary                                   Not Primary    
Jim             Statistics              Science                   statistics                  Math                     Not Primary                                 Not Primary  
;
run;
data temp;
set have;
length group $ 40;
array x{*} $ Major -- MajorType ;
do i=1 to dim(x);
 group=x{i};    
 output;
end;
drop i;
run;
proc sql;
create table want as
 select group,count(distinct id) as n
  from temp
   group by group
 union
 select 'Total',count(distinct id) as n
  from temp;
quit; 
Reeza
Super User

You could also consider proc tabulate or a proc freq. 

 

I'm not following how your numbers are calculated below - is the sample output intended to line up with the sample data?

 

proc tabulate data=sashelp.class;
class sex age;
table sex age, n;
run;

 

kksss
Calcite | Level 5

the numbers are not calculated in line with the data, i'm just using it as an example. 

 

I don;t think Proc tabulate will resolve the issue with dup data though,

 

besides, I need to put the all output count in ONE table. 

 

LinusH
Tourmaline | Level 20
I think if you follow @ballardw you'll get what you want.
Data never sleeps
kksss
Calcite | Level 5

did you read my reply?

 

my group variables are not mutually exclusive.

Reeza
Super User

Did you try the solution? 

 

As mentioned you need to post better data, as is, we're not sure what your question is, or why things aren't working for you.

kksss
Calcite | Level 5

hold on.... I can only type this fast.... it's coming

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 15 replies
  • 1569 views
  • 0 likes
  • 5 in conversation