I want to count number of distinct values of subcategories/subgroups within a group. For example, I have dataset called data: data data; infile datalines missover delimiter=','; length Name $6 School $12 Degree $10 ; input Name $ School $ Degree $ ; datalines; David, Brown, CS David, UC Davis, David,UC Davis, Psych Linda,UPenn, Statistics Robert,U of T , Statistics Robert, UCLA , Math Sally, Northwestern, English Sally, NYU, English ; For this data, group is individual, and first subgroup is school, and second subgroup is major. David attended two distinct schools Brown and UC Davis, out of 3 of his rows. He studied three distinct fields CS, blank (blank needs to be counted as its own subgroup), and Psych. I want to get his school_count to be 2 and major_count to be 3. Eventually, I want a dataset like this: data want; infile datalines missover delimiter=','; length Name $6 School $12 Degree $10 school_flag degree_flag 3.; input Name $ School $ Degree $ school_flag degree_flag 3.; datalines; David, Brown, CS, 2, 3 David, UC Davis, , 2, 3 David,UC Davis, Psych, 2, 3 Linda,UPenn, Statistics, 1, 1 Robert,U of T , Statistics, 2, 2 Robert, UCLA , Math, 2, 2 Sally, Northwestern, English, 2, 1 Sally, NYU, English, 2, 1 ; Is there a way to do this using data steps or proc sql? Thank you!
... View more