Hi All,
I am working with a dataset with the following multilevel layout.
agencyid | unitid | itemnumber | answer |
1 | 2 | 201 | YES |
1 | 2 | 202 | YES |
1 | 2 | 203 | YES |
1 | 2 | 204 | YES |
1 | 3 | 201 | YES |
1 | 3 | 202 | YES |
1 | 3 | 203 | YES |
1 | 3 | 204 | YES |
1 | 4 | 201 | YES |
1 | 4 | 202 | NO |
1 | 4 | 203 | YES |
1 | 4 | 204 | YES |
The agency id ranges from 1 to 42. There are several unitids within an agencyid. Each unitid will answer itemnumbers within a question. I need the number of units within an agency that have all 'yes' answers to all the itemnumbers 201 to 204.
For eg here the count of unique unitids that have a 'yes' response to all the itemnumbers is 2.
How can I improvise my code below to account for 'yes' responses to all the itemnumbers 201 to 204 ? The below code only gives me counts of unitids for any one of the itemnumbers with a yes answer.
proc sql;
create table c as
select agencyid,
count(distinct unitid)
as numerator
from test /* name of dataset*/
where answer='Yes' and itemnumber in (201,202,203,204)
group by agencyid;
quit;
Please advice. Thanks !
You need a subquery :
proc sql;
create table want as
select agencyId, count(unitId) as nbUnits
from
(select agencyId, unitId
from test
where answer = "YES" and itemNumber in (201, 202, 203, 204)
group by agencyId, unitId
having count(distinct itemNumber)=4)
group by agencyId;
quit;
PG
You need a subquery :
proc sql;
create table want as
select agencyId, count(unitId) as nbUnits
from
(select agencyId, unitId
from test
where answer = "YES" and itemNumber in (201, 202, 203, 204)
group by agencyId, unitId
having count(distinct itemNumber)=4)
group by agencyId;
quit;
PG
proc sql;
create table c as
select agencyid,unitid from have
group by agencyid,unitid
having sum(answer='YES')=4;
quit;
Hi PGStats and stat@sas,
Thanks for your help, but I am still not getting the right answer...
The results table based on my example has to look like below
agencyid | Unitid count |
1 | 2 |
2 | 1 |
3 | 6 |
I used both of your syntax and came up with this improvisation to my code. However, I get 0 rows and 2 columns . I not sure whats wrong ? Please let me know.
proc sql;
create table c as
select agencyid,
count(distinct unitid)
as numerator
from test
where itemid in (201,202,203,204)
group by agencyid
having sum(answer='Yes')=4;
quit;
When I use my code with your example data (which covers agencyId=1 only) I get :
agencyid nbUnits
------------------
1 2
In your new query, you group on agencyId and ask for the group condition sum(answer='Yes')=4 . You are thus selecting agencies (not units) with 4 YES answers. That's why you need a subquery. The lowest (subquery) level groups on agency and units to extract the units with 4 YES answers. The higher level counts the selected units within each agency.
PG
Thanks PG for the explanation. That makes sense. Let me give it a try and I shall get back.
Thanks PG ! It worked. I had an error in my syntax ! I broke your code down and ran it step wise.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.