Solved: Counting number of different values in multiple subcategories

gsk · Posted 10-24-2019 12:16 PM

I want to count number of distinct values of subcategories/subgroups within a group. For example, I have dataset called data:

data data;
infile datalines missover delimiter=',';
length Name $6 School $12 Degree $10 ;
input Name $ School $ Degree $ ;
datalines;
David, Brown, CS
David, UC Davis,
David,UC Davis, Psych
Linda,UPenn, Statistics
Robert,U of T , Statistics
Robert, UCLA , Math
Sally, Northwestern, English
Sally, NYU, English
;

For this data, group is individual, and first subgroup is school, and second subgroup is major.

David attended two distinct schools Brown and UC Davis, out of 3 of his rows. He studied three distinct fields CS, blank (blank needs to be counted as its own subgroup), and Psych. I want to get his school_count to be 2 and major_count to be 3.

Eventually, I want a dataset like this:

data want;
infile datalines missover delimiter=',';
length Name $6 School $12 Degree $10 school_flag degree_flag 3.;
input Name $ School $ Degree $ school_flag degree_flag 3.;
datalines;
David, Brown, CS, 2, 3
David, UC Davis, , 2, 3
David,UC Davis, Psych, 2, 3
Linda,UPenn, Statistics, 1, 1
Robert,U of T , Statistics, 2, 2
Robert, UCLA , Math, 2, 2
Sally, Northwestern, English, 2, 1
Sally, NYU, English, 2, 1
;

Is there a way to do this using data steps or proc sql? Thank you!

novinosrin · Posted 10-24-2019 12:29 PM

Hi @gsk Pretty straight forward SQL

data data;
infile datalines missover delimiter=',';
length Name $6 School $12 Degree $10 ;
input Name $ School $ Degree $ ;
datalines;
David, Brown, CS
David, UC Davis,
David,UC Davis, Psych
Linda,UPenn, Statistics
Robert,U of T , Statistics
Robert, UCLA , Math
Sally, Northwestern, English
Sally, NYU, English
;

 
proc sql;
create table want as
select *,count(distinct School) as school_flag,count(distinct ifc(degree=' ','9999',degree)) as degree_flag 
from data
group by name;
quit;

View solution in original post

novinosrin · Posted 10-24-2019 12:29 PM

Hi @gsk Pretty straight forward SQL

data data;
infile datalines missover delimiter=',';
length Name $6 School $12 Degree $10 ;
input Name $ School $ Degree $ ;
datalines;
David, Brown, CS
David, UC Davis,
David,UC Davis, Psych
Linda,UPenn, Statistics
Robert,U of T , Statistics
Robert, UCLA , Math
Sally, Northwestern, English
Sally, NYU, English
;

 
proc sql;
create table want as
select *,count(distinct School) as school_flag,count(distinct ifc(degree=' ','9999',degree)) as degree_flag 
from data
group by name;
quit;

gsk · Posted 10-25-2019 08:05 AM

Thank you!! I looked up the syntax but still don't understand this part

count(distinct ifc(degree=' ','9999',degree)) as degree_flag

Could you please explain what distinct ifc(degree=' ','9999',degree) mean..?

novinosrin · Posted 10-25-2019 08:10 AM

Since you mentioned Missing as valid count,Basically I am resetting missing to a non missing junk value before I count distinct using IFC function.

Counting number of different values in multiple subcategories

Re: Counting number of different values in multiple subcategories

Re: Counting number of different values in multiple subcategories

Re: Counting number of different values in multiple subcategories

Re: Counting number of different values in multiple subcategories

Counting number of different values in multiple subcategories

Re: Counting number of different values in multiple subcategories

Re: Counting number of different values in multiple subcategories

Re: Counting number of different values in multiple subcategories

Re: Counting number of different values in multiple subcategories

Registration is open