Solved: proc sql summary table considering distinct subject IDs

michtka · Posted 06-15-2012 06:32 AM

Hi guys, using the sql code below suggested by Hai.Kuo, I obtained this table:

Final table:

race      male  female  total
black     2     1       3
latin     2     1       3
oriental  1     0       1

But now, with a new input of data, one subject is duplicated, and I would like to use the same SQL while counting only distinct subjects. Latin male should then be 1 instead of 2, and the latin total should be 2, so the table I want is:

race      male  female  total
black     2     1       3
latin     1     1       2
oriental  1     0       1

Can you help me with this? Thanks. V.

****************************************************

*New input with a duplicate record (subno=3);
data have;
  length subno 4 race $10 sex $10;
  input subno race sex;
  datalines;
1 black male
2 black female
3 latin male
3 latin male
4 black male
5 latin female
6 oriental male
;
run;

*Code suggested by Hai.Kuo for the old input;
proc sql;
  select race,
         sum(sex='male') as male,
         sum(sex='female') as female,
         count(race) as total
  from have
  group by race;
quit;

Alpay · Posted 06-15-2012 07:05 AM

You will need to get distinct values of race, subno and sex in a subquery and then sum it up.

proc sql;
  select race,
         sum(sex='male') as male,
         sum(sex='female') as female,
         count(race) as total
  from (select distinct race, subno, sex from have)
  group by race;
quit;
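For comparison, here is a sketch of another way to get the same counts without a subquery, by counting distinct subno values directly. It is not from the thread; it assumes the have data set above, with subno numeric and sex coded exactly as 'male' and 'female'.

```sas
proc sql;
  /* Sketch: count each subject once per race, with no subquery.       */
  /* The case expressions return a missing value for the other sex,    */
  /* and count() ignores missing values.                               */
  select race,
         count(distinct case when sex='male'   then subno else . end) as male,
         count(distinct case when sex='female' then subno else . end) as female,
         count(distinct subno) as total
  from have
  group by race;
quit;
```

The two approaches can differ if a subject is recorded with conflicting values: the subquery keeps one row for each distinct race, subno, sex combination, while this version counts each subno once per column within a race.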

michtka · Posted 06-15-2012 10:39 AM

Hello alpay, thanks.

Please, could you explain the role of distinct in that subquery? E.g., which variable is affected by it... race?

I was thinking of something like:

from (select distinct subno, race, sex)

instead of your line of code.

I would appreciate it if you could explain the difference.

Thanks.

Alpay · Posted 06-15-2012 12:45 PM

It returns the unique rows for the variables specified in the select clause. It is similar to 'group by', but no summary statistics are computed.

select distinct race, subno, sex from have

The result of this query:

1 black male
2 black female
3 latin male
4 black male
5 latin female
6 oriental male

Using the result of this sub-query, you can get the counts by race.
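As a rough sketch of that last step, the distinct rows can also be saved to a table first and then summarized. The table name unique_subjects is made up for illustration. Because distinct applies to the whole combination of selected variables, writing subno, race, sex in a different order than race, subno, sex changes only the column order of the result, not which rows are kept.

```sas
proc sql;
  /* Step 1: one row per unique subno/race/sex combination.            */
  /* unique_subjects is a hypothetical table name.                     */
  create table unique_subjects as
    select distinct subno, race, sex
    from have;

  /* Step 2: summarize the de-duplicated rows by race. */
  select race,
         sum(sex='male') as male,
         sum(sex='female') as female,
         count(*) as total
  from unique_subjects
  group by race;
quit;
```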
