I have a data set
data have;
input group_id survey_id;
datalines;
111 1
111 4
111 2
222 3
333 4
333 5
333 2
333 1
444 2
;
Some group_id variables have multiple survey_id responses. I want the following table:
Survey_id | 1 | 2 | 3 | 4 | 5 |
1 | 1 | 2 | 0 | 2 | 1 |
2 | 2 | 0 | 2 | 1 | |
3 | 0 | 0 | 0 | ||
4 | 2 | 1 | |||
5 | 1 |
I didn't fill the bottom half, since it'll be the same as the upper half.
Basically, the table should be only for those with multiple group_id's (so I ignore the group_id 222 and 444). Then, it counts the number of group_id's in each cell (e.g. the cell with Survey_id = 1x4 counts the number of group_id (where there are duplicates of the group_id's) that have both survey_id = 1 and 4).
Can someone show me how to do this easily? Thanks
data have;
input group_id survey_id;
datalines;
111 1
111 4
111 2
222 3
333 4
333 5
333 2
333 1
444 2
;
proc sql;
create table temp as
select a.survey_id as a_survey_id,b.survey_id as b_survey_id
from (select * from have group by group_id having count(*)>1) as a,
(select * from have group by group_id having count(*)>1) as b
where a.group_id=b.group_id and a.survey_id>=b.survey_id ;
create table want as
select a_survey_id,b_survey_id,sum(count) as sum
from
(
select c.*,coalesce(d.count,0) as count
from
(
select a.survey_id as a_survey_id,b.survey_id as b_survey_id
from (select distinct survey_id from have) as a,(select distinct survey_id from have) as b
) as c natural left join (select *,1 as count from temp) as d
)
group by a_survey_id,b_survey_id
order by b_survey_id,a_survey_id
;
quit;
proc transpose data=want out=final_want ;
by b_survey_id;
id a_survey_id;
var sum;
run;
Now describe just how any of those cells in the "want" are derived.
Computer code uses rules and we need to know the rules. Examples help but the rules are needed.
If I provide the following example:
Input Output
1 3
2 3
3 5
4 4
can you tell me what the result for 13 or 444 much less infinity would be? (8 and 23 )
Much less infinity?
data have;
input group_id survey_id;
datalines;
111 1
111 4
111 2
222 3
333 4
333 5
333 2
333 1
444 2
;
proc sql;
create table temp as
select a.survey_id as a_survey_id,b.survey_id as b_survey_id
from (select * from have group by group_id having count(*)>1) as a,
(select * from have group by group_id having count(*)>1) as b
where a.group_id=b.group_id and a.survey_id>=b.survey_id ;
create table want as
select a_survey_id,b_survey_id,sum(count) as sum
from
(
select c.*,coalesce(d.count,0) as count
from
(
select a.survey_id as a_survey_id,b.survey_id as b_survey_id
from (select distinct survey_id from have) as a,(select distinct survey_id from have) as b
) as c natural left join (select *,1 as count from temp) as d
)
group by a_survey_id,b_survey_id
order by b_survey_id,a_survey_id
;
quit;
proc transpose data=want out=final_want ;
by b_survey_id;
id a_survey_id;
var sum;
run;
Hi,
Thanks. It works perfectly. I was wondering what the difference betwen a natural left join and a left join is? I searched online but can't figure it out.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.