Solved: Cross-tab frequency table based on number of multiple responses

yellowyellowred · Posted 02-17-2022 08:07 PM

I have a data set

data have;
input group_id survey_id;
datalines;
111 1
111 4
111 2
222 3
333 4
333 5
333 2
333 1
444 2
;

Some group_id variables have multiple survey_id responses. I want the following table:

Survey_id	1	2	3	4	5
1	1	2	0	2	1
2		2	0	2	1
3			0	0	0
4				2	1
5					1

I didn't fill the bottom half, since it'll be the same as the upper half.
Basically, the table should be only for those with multiple group_id's (so I ignore the group_id 222 and 444). Then, it counts the number of group_id's in each cell (e.g. the cell with Survey_id = 1x4 counts the number of group_id (where there are duplicates of the group_id's) that have both survey_id = 1 and 4).
Can someone show me how to do this easily? Thanks

Ksharp · Posted 02-18-2022 07:12 AM

data have;
input group_id survey_id;
datalines;
111 1
111 4
111 2
222 3
333 4
333 5
333 2
333 1
444 2
;

proc sql;
create table temp as
select a.survey_id as a_survey_id,b.survey_id as b_survey_id
 from (select * from have group by group_id having count(*)>1) as a,
      (select * from have group by group_id having count(*)>1) as b
  where a.group_id=b.group_id and a.survey_id>=b.survey_id ;

create table want as
select a_survey_id,b_survey_id,sum(count) as sum
from
(
select c.*,coalesce(d.count,0) as count
from
(
select a.survey_id as a_survey_id,b.survey_id as b_survey_id
 from (select distinct survey_id from have) as a,(select distinct survey_id from have) as b
) as c natural left join (select *,1 as count from temp) as d
)
group by a_survey_id,b_survey_id
order by b_survey_id,a_survey_id
;
quit;

proc transpose data=want out=final_want ;
by b_survey_id;
id a_survey_id;
var sum;
run;

View solution in original post

ballardw · Posted 02-18-2022 04:33 AM

Now describe just how any of those cells in the "want" are derived.

Computer code uses rules and we need to know the rules. Examples help but the rules are needed.

If I provide the following example:

Input Output

1 3

2 3

3 5

4 4

can you tell me what the result for 13 or 444 much less infinity would be? (8 and 23 )

Much less infinity?

Ksharp · Posted 02-18-2022 07:12 AM

data have;
input group_id survey_id;
datalines;
111 1
111 4
111 2
222 3
333 4
333 5
333 2
333 1
444 2
;

proc sql;
create table temp as
select a.survey_id as a_survey_id,b.survey_id as b_survey_id
 from (select * from have group by group_id having count(*)>1) as a,
      (select * from have group by group_id having count(*)>1) as b
  where a.group_id=b.group_id and a.survey_id>=b.survey_id ;

create table want as
select a_survey_id,b_survey_id,sum(count) as sum
from
(
select c.*,coalesce(d.count,0) as count
from
(
select a.survey_id as a_survey_id,b.survey_id as b_survey_id
 from (select distinct survey_id from have) as a,(select distinct survey_id from have) as b
) as c natural left join (select *,1 as count from temp) as d
)
group by a_survey_id,b_survey_id
order by b_survey_id,a_survey_id
;
quit;

proc transpose data=want out=final_want ;
by b_survey_id;
id a_survey_id;
var sum;
run;

yellowyellowred · Posted 02-23-2022 05:05 PM

Hi,

Thanks. It works perfectly. I was wondering what the difference betwen a natural left join and a left join is? I searched online but can't figure it out.

Ksharp · Posted 02-24-2022 06:52 AM

They are the same. NATURAL will automatically find the same variables between left and right table and use these variables in common as the KEY variables to JOIN . NATURAL could save you some words ,
it is the same as " LEFT JOIN ON a.a_survey_id=b.a_survey_id and a.b_survey_id=b.b_survey_id "

Cross-tab frequency table based on number of multiple responses

Re: Cross-tab frequency table based on number of multiple responses

Re: Cross-tab frequency table based on number of multiple responses

Re: Cross-tab frequency table based on number of multiple responses

Re: Cross-tab frequency table based on number of multiple responses

Re: Cross-tab frequency table based on number of multiple responses

Cross-tab frequency table based on number of multiple responses

Re: Cross-tab frequency table based on number of multiple responses

Re: Cross-tab frequency table based on number of multiple responses

Re: Cross-tab frequency table based on number of multiple responses

Re: Cross-tab frequency table based on number of multiple responses

Re: Cross-tab frequency table based on number of multiple responses

SAS Innovate 2025: Save the Date

SAS Training: Just a Click Away