Help using Base SAS procedures

Proc Sql - Summarize data by two variables

Accepted Solution Solved
Reply
Contributor
Posts: 26
Accepted Solution

Proc Sql - Summarize data by two variables

I want to figure out how to add up the office_yes indicator variable in two ways, then tack it onto the end of each line.

     1. Per Visit

     2. Per ID (only summing one 'yes' indicator per visit)

I've tried a several combinations of CASE/WHEN, SUM, COUNT and COUNT(DISTINCT(id), as well as other things like sum(distinct(count(id)))," which gives the message "ERROR: Summary functions nested in this way are not supported." 

Otherwise, this is as far as I've gotten:

proc sql;   create table want as

  select a.* ,

  sum(office_yes_by_enc) as total_visits_per_id

  from (select

   visit_number,

   CASE WHEN sum(office_yes)>=1 then 1 ELSE 0 end as office_yes_by_enc,

   id,

   office_yes

  from have  group id, visit_number)  as a

   group id

;quit;

I know there is a straightforward solution with PROC SORT and merge, but there are millions of lines and I'd like to figure out how to do it with SQL.

And is this possible to do one SQL procedure, or does it need to be broken up?

Thanks!

Have:

idvisit_numberoffice_yes
8591
859.
810.
819.
2511
2701
245.
2451
2451
235.
2621
5871
5661

Want:

idvisit_numberoffice_yesoffice_yes_by_visittotal_office_visits_by_id
859111
859.11
810.01
819.01
251114
270114
245.14
245114
24511

4

235.04
262114
587112
566112

Accepted Solutions
Solution
‎11-04-2014 06:22 AM
Super User
Posts: 9,671

Re: Proc Sql - Summarize data by two variables

So you want count office_yes  for unique combination of id  and  visit_number  ?

data have;
input id     visit_number     office_yes ;
cards;
8     59     1
8     59     .
8     10     .
8     19     .
2     51     1
2     70     1
2     45     .
2     45     1
2     45     1
2     35     .
2     62     1
5     87     1
5     66     1
;
run;
proc sql;
create table want as 
select *,coalesce((select distinct office_yes from have where id=a.id and visit_number=a.visit_number and office_yes is not missing),0) as office_yes_by_visit,
        (select sum(office_yes) from (select distinct * from have where id=a.id)) as total_office_visits_by_id
 from have as a;
quit;


Xia Keshan

View solution in original post


All Replies
Trusted Advisor
Posts: 1,204

Re: Proc Sql - Summarize data by two variables

Try this.

proc sql;

create table want as

select a.*,coalesce(c.office_yes_by_visit,0) as office_yes_by_visit,b.total_office_visits_by_id from have a

left join (select id,count(distinct visit_number) as total_office_visits_by_id from have

where office_yes=1

group by id) b

on a.id=b.id

left join

(select id,visit_number,count(distinct visit_number) as office_yes_by_visit from have

where office_yes=1

group by id,visit_number) c

on a.id=c.id

and a.visit_number=c.visit_number;

quit;

Super User
Posts: 9,671

Re: Proc Sql - Summarize data by two variables

Sure.

data have;
input id     visit_number     office_yes ;
cards;
8     59     1
8     59     .
8     10     .
8     19     .
2     51     1
2     70     1
2     45     .
2     45     1
2     45     1
2     35     .
2     62     1
5     87     1
5     66     1
;
run;
proc sql;
create table want as 
select *,coalesce((select distinct office_yes from have where id=a.id and visit_number=a.visit_number and office_yes is not missing),0) as office_yes_by_visit,sum(office_yes) as total_office_visits_by_id
 from have as a
  group by id ;
quit;

Xia Keshan

Contributor
Posts: 26

Re: Proc Sql - Summarize data by two variables

Xia,

Thanks for the reply!  It looks like this code will sum up all of the 'yes' indicators per visit, instead of just one, which was the same problem I kept having too.

So, the output for ID #2 should be 4 for total_office_visits_by_id, but with the code above, it comes out to be 5.

M

Solution
‎11-04-2014 06:22 AM
Super User
Posts: 9,671

Re: Proc Sql - Summarize data by two variables

So you want count office_yes  for unique combination of id  and  visit_number  ?

data have;
input id     visit_number     office_yes ;
cards;
8     59     1
8     59     .
8     10     .
8     19     .
2     51     1
2     70     1
2     45     .
2     45     1
2     45     1
2     35     .
2     62     1
5     87     1
5     66     1
;
run;
proc sql;
create table want as 
select *,coalesce((select distinct office_yes from have where id=a.id and visit_number=a.visit_number and office_yes is not missing),0) as office_yes_by_visit,
        (select sum(office_yes) from (select distinct * from have where id=a.id)) as total_office_visits_by_id
 from have as a;
quit;


Xia Keshan

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 4 replies
  • 622 views
  • 4 likes
  • 3 in conversation