DATA Step, Macro, Functions and more

Create table(or report) to summarize headcounts/ information from multiple group variables BY ID.

Reply
Contributor
Posts: 25

Create table(or report) to summarize headcounts/ information from multiple group variables BY ID.

[ Edited ]

so I have a dataset I have people with more than one observation under the same id. 

And they are also grouped under different criteria, so there are different grouping variables, e.g.

e.g. 

id    var   group_1     Group_2

1      x        3              A1

1      z         3              A1

2      y         1             B2

3      8         2            A3

3      e         2            A3

 

How can I make a report or summaried table to show the unique count of frequency by ID, under each grouping variable, and breakdown by their group value. 

Like this

 

group     headcount

1               3

2               4

A1            1

A3            6

B2          ........

.....

 

Proc SQL only counts distinct value from a variable as a whole, I don't know how to condition it. Similarly, if I use Sum(case...when) I doesn't give me discticnt count by ID. Here, to be noted that I want not the unique count of a variable, but a unique count of a variable by ID. And I don't know how to do that in Proc freq. 

 

My last resort is to separate dataset, run proc sort to delete dups, and then run proc freq.. but that willl give me a ton of tables, and I need another proc to join the info(headcounts) I want in an addional table. 

 

 

Super User
Posts: 10,466

Re: Create table(or report) to summarize headcounts/ information from multiple group variables BY ID

It will help if your desired output comes from the example input. It is very difficult to see how group A3 gets a count of 6.

 

I think you need to transpose the data so that you take

id    var   group_1     Group_2

1      x        3              A1

1      z         3             A1

 

And turn it into

id var group

1   x     3

1   x     A1

1   z     3

1   z     A1

 

and then count the unique or distinct ID for Group so Id=1 gets counted once in group 3 and once in group A1.

Does this describe what you're attempting?

Does the variable Var have any bearing on this process? It may be easier to ignore it for the purpose of counting.

 

 

Contributor
Posts: 25

Re: Create table(or report) to summarize headcounts/ information from multiple group variables BY ID

the thing is my grouping variables are not mutually exclusive.

 

e.g. A1 may include 1.2.3

etc..

Super User
Posts: 17,750

Re: Create table(or report) to summarize headcounts/ information from multiple group variables BY ID

Please post a full example=> sample data with sample output that aligns with the sample data.

Contributor
Posts: 25

Re: Create table(or report) to summarize headcounts/ information from multiple group variables BY ID

[ Edited ]

Alright. 

 

Input:

 

ID                Major               Major College      Major Department        Primary Major              Math_major_type                  Major Type

 

John           Math                   Science                   math                        Math                     Primary with one outside                  Primary

John           Art                      Fine Art                      Art                         Math                      Primary with one outside                  Primary 

Kate           Math                    Science                   math                        Math                     Primary with only one                  Primary

Lex              Math                   Science                   math                        Math                     Primary with one within                  Primary

Lex              Statistics             Science                   statistics                  Math                     Primary with one within                  Primary

Joe             Math                     Science                   math                        Math                     Primary with only one                  Primary

Ash             Math                     Science                   math                        Math                      Primary with only one                  Primary

Jim             Sociology                Humanity             Social Science            Sociology                Not Primary                               Not Primary

Jim             Math                      Science                   math                        Math                     Not Primary                                   Not Primary    

Jim             Statistics              Science                   statistics                  Math                     Not Primary                                 Not Primary       

 

OUT:

 

Headcount                                     Unique Freq

Total                                                       6

Primary                                                  5

Primary with one outside                      1

Primary with one within                         1

Primary with only                                  3

Not primary                                           1

 

Major count by major department       

Total Majors.........................................10

Art ......................................................1                                                     

Statistics.............................................2

Math...................................................6

Social Science...................................1

 

PS: some definition

Primary  - math Primary major

Primary with only one  - has math as primary major and that is the only major they have (e.g. Kate, Joe, Ash)

Primary with one outside - who has math as primary major but also has at leasat one major outside the college of science(e.g. John)

Primary with one within- who has math as primary major but also has at leasat another one major within the college of science(e.g. Lex)

Not Primary - Does not have math as a primary major. (Jim)

 

 

Super User
Posts: 17,750

Re: Create table(or report) to summarize headcounts/ information from multiple group variables BY ID

I think you want this:

 

It won't generate the total, but in my opinion, totals should be calculated in final report not stored in the dataset. 

 

https://gist.github.com/statgeek/e0903d269d4a71316a4e

Contributor
Posts: 25

Re: Create table(or report) to summarize headcounts/ information from multiple group variables BY ID

right... that's on on how to formatt the table afterwards... but I can use proc req because of the dups...

Super User
Posts: 10,466

Re: Create table(or report) to summarize headcounts/ information from multiple group variables BY ID

Fo those things that are related to unique persons you may need to sort the data and then using first. or last. techniques assign a value, likely 1, to one record per person. Then sum those.

 

I think the bit related to the majors and primaries might be possibly using a multilabel formats, but that only works with a few procedures.

 

Please provide some example data in the form of datastep code if you want someone to test some code with your data.

I think I can do this with two tables in proc tabulate.

Super User
Posts: 9,662

Re: Create table(or report) to summarize headcounts/ information from multiple group variables BY ID

This could give you some light .

 

data have;
input ID      $          Major : $20.              MajorCollege  & $40.    MajorDepartment    & $40.     PrimaryMajor    & $40.           Math_major_type  & $40.         MajorType  & $40.;
cards;
John           Math                   Science                   math                        Math                     Primary with one outside                  Primary
John           Art                      Fine Art                      Art                         Math                      Primary with one outside                  Primary 
Kate           Math                    Science                   math                        Math                     Primary with only one                  Primary
Lex              Math                   Science                   math                        Math                     Primary with one within                  Primary
Lex              Statistics             Science                   statistics                  Math                     Primary with one within                  Primary
Joe             Math                     Science                   math                        Math                     Primary with only one                  Primary
Ash             Math                     Science                   math                        Math                      Primary with only one                  Primary
Jim             Sociology                Humanity             Social Science            Sociology                Not Primary                               Not Primary
Jim             Math                      Science                   math                        Math                     Not Primary                                   Not Primary    
Jim             Statistics              Science                   statistics                  Math                     Not Primary                                 Not Primary  
;
run;
data temp;
set have;
length group $ 40;
array x{*} $ Major -- MajorType ;
do i=1 to dim(x);
 group=x{i};    
 output;
end;
drop i;
run;
proc sql;
create table want as
 select group,count(distinct id) as n
  from temp
   group by group
 union
 select 'Total',count(distinct id) as n
  from temp;
quit; 
Super User
Posts: 17,750

Re: Create table(or report) to summarize headcounts/ information from multiple group variables BY ID

You could also consider proc tabulate or a proc freq. 

 

I'm not following how your numbers are calculated below - is the sample output intended to line up with the sample data?

 

proc tabulate data=sashelp.class;
class sex age;
table sex age, n;
run;

 

Contributor
Posts: 25

Re: Create table(or report) to summarize headcounts/ information from multiple group variables BY ID

the numbers are not calculated in line with the data, i'm just using it as an example. 

 

I don;t think Proc tabulate will resolve the issue with dup data though,

 

besides, I need to put the all output count in ONE table. 

 

Super User
Posts: 5,254

Re: Create table(or report) to summarize headcounts/ information from multiple group variables BY ID

I think if you follow @ballardw you'll get what you want.
Data never sleeps
Contributor
Posts: 25

Re: Create table(or report) to summarize headcounts/ information from multiple group variables BY ID

did you read my reply?

 

my group variables are not mutually exclusive.

Super User
Posts: 17,750

Re: Create table(or report) to summarize headcounts/ information from multiple group variables BY ID

Did you try the solution? 

 

As mentioned you need to post better data, as is, we're not sure what your question is, or why things aren't working for you.

Contributor
Posts: 25

Re: Create table(or report) to summarize headcounts/ information from multiple group variables BY ID

hold on.... I can only type this fast.... it's coming

Ask a Question
Discussion stats
  • 15 replies
  • 418 views
  • 0 likes
  • 5 in conversation