so I have a dataset I have people with more than one observation under the same id.
And they are also grouped under different criteria, so there are different grouping variables, e.g.
e.g.
id var group_1 Group_2
1 x 3 A1
1 z 3 A1
2 y 1 B2
3 8 2 A3
3 e 2 A3
How can I make a report or summaried table to show the unique count of frequency by ID, under each grouping variable, and breakdown by their group value.
Like this
group headcount
1 3
2 4
A1 1
A3 6
B2 ........
.....
Proc SQL only counts distinct value from a variable as a whole, I don't know how to condition it. Similarly, if I use Sum(case...when) I doesn't give me discticnt count by ID. Here, to be noted that I want not the unique count of a variable, but a unique count of a variable by ID. And I don't know how to do that in Proc freq.
My last resort is to separate dataset, run proc sort to delete dups, and then run proc freq.. but that willl give me a ton of tables, and I need another proc to join the info(headcounts) I want in an addional table.
It will help if your desired output comes from the example input. It is very difficult to see how group A3 gets a count of 6.
I think you need to transpose the data so that you take
id var group_1 Group_2
1 x 3 A1
1 z 3 A1
And turn it into
id var group
1 x 3
1 x A1
1 z 3
1 z A1
and then count the unique or distinct ID for Group so Id=1 gets counted once in group 3 and once in group A1.
Does this describe what you're attempting?
Does the variable Var have any bearing on this process? It may be easier to ignore it for the purpose of counting.
the thing is my grouping variables are not mutually exclusive.
e.g. A1 may include 1.2.3
etc..
Please post a full example=> sample data with sample output that aligns with the sample data.
Alright.
Input:
ID Major Major College Major Department Primary Major Math_major_type Major Type
John Math Science math Math Primary with one outside Primary
John Art Fine Art Art Math Primary with one outside Primary
Kate Math Science math Math Primary with only one Primary
Lex Math Science math Math Primary with one within Primary
Lex Statistics Science statistics Math Primary with one within Primary
Joe Math Science math Math Primary with only one Primary
Ash Math Science math Math Primary with only one Primary
Jim Sociology Humanity Social Science Sociology Not Primary Not Primary
Jim Math Science math Math Not Primary Not Primary
Jim Statistics Science statistics Math Not Primary Not Primary
OUT:
Headcount Unique Freq
Total 6
Primary 5
Primary with one outside 1
Primary with one within 1
Primary with only 3
Not primary 1
Major count by major department
Total Majors.........................................10
Art ......................................................1
Statistics.............................................2
Math...................................................6
Social Science...................................1
PS: some definition
Primary - math Primary major
Primary with only one - has math as primary major and that is the only major they have (e.g. Kate, Joe, Ash)
Primary with one outside - who has math as primary major but also has at leasat one major outside the college of science(e.g. John)
Primary with one within- who has math as primary major but also has at leasat another one major within the college of science(e.g. Lex)
Not Primary - Does not have math as a primary major. (Jim)
I think you want this:
It won't generate the total, but in my opinion, totals should be calculated in final report not stored in the dataset.
right... that's on on how to formatt the table afterwards... but I can use proc req because of the dups...
Fo those things that are related to unique persons you may need to sort the data and then using first. or last. techniques assign a value, likely 1, to one record per person. Then sum those.
I think the bit related to the majors and primaries might be possibly using a multilabel formats, but that only works with a few procedures.
Please provide some example data in the form of datastep code if you want someone to test some code with your data.
I think I can do this with two tables in proc tabulate.
This could give you some light .
data have;
input ID $ Major : $20. MajorCollege & $40. MajorDepartment & $40. PrimaryMajor & $40. Math_major_type & $40. MajorType & $40.;
cards;
John Math Science math Math Primary with one outside Primary
John Art Fine Art Art Math Primary with one outside Primary
Kate Math Science math Math Primary with only one Primary
Lex Math Science math Math Primary with one within Primary
Lex Statistics Science statistics Math Primary with one within Primary
Joe Math Science math Math Primary with only one Primary
Ash Math Science math Math Primary with only one Primary
Jim Sociology Humanity Social Science Sociology Not Primary Not Primary
Jim Math Science math Math Not Primary Not Primary
Jim Statistics Science statistics Math Not Primary Not Primary
;
run;
data temp;
set have;
length group $ 40;
array x{*} $ Major -- MajorType ;
do i=1 to dim(x);
group=x{i};
output;
end;
drop i;
run;
proc sql;
create table want as
select group,count(distinct id) as n
from temp
group by group
union
select 'Total',count(distinct id) as n
from temp;
quit;
You could also consider proc tabulate or a proc freq.
I'm not following how your numbers are calculated below - is the sample output intended to line up with the sample data?
proc tabulate data=sashelp.class;
class sex age;
table sex age, n;
run;
the numbers are not calculated in line with the data, i'm just using it as an example.
I don;t think Proc tabulate will resolve the issue with dup data though,
besides, I need to put the all output count in ONE table.
did you read my reply?
my group variables are not mutually exclusive.
Did you try the solution?
As mentioned you need to post better data, as is, we're not sure what your question is, or why things aren't working for you.
hold on.... I can only type this fast.... it's coming
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.