I want to count number of distinct values of subcategories/subgroups within a group. For example, I have dataset called data:
data data;
infile datalines missover delimiter=',';
length Name $6 School $12 Degree $10 ;
input Name $ School $ Degree $ ;
datalines;
David, Brown, CS
David, UC Davis,
David,UC Davis, Psych
Linda,UPenn, Statistics
Robert,U of T , Statistics
Robert, UCLA , Math
Sally, Northwestern, English
Sally, NYU, English
;
For this data, group is individual, and first subgroup is school, and second subgroup is major.
David attended two distinct schools Brown and UC Davis, out of 3 of his rows. He studied three distinct fields CS, blank (blank needs to be counted as its own subgroup), and Psych. I want to get his school_count to be 2 and major_count to be 3.
Eventually, I want a dataset like this:
data want;
infile datalines missover delimiter=',';
length Name $6 School $12 Degree $10 school_flag degree_flag 3.;
input Name $ School $ Degree $ school_flag degree_flag 3.;
datalines;
David, Brown, CS, 2, 3
David, UC Davis, , 2, 3
David,UC Davis, Psych, 2, 3
Linda,UPenn, Statistics, 1, 1
Robert,U of T , Statistics, 2, 2
Robert, UCLA , Math, 2, 2
Sally, Northwestern, English, 2, 1
Sally, NYU, English, 2, 1
;
Is there a way to do this using data steps or proc sql? Thank you!