Hi All,
I am working with a dataset with the following multilevel layout.
agencyid | unitid | itemnumber | answer |
1 | 2 | 201 | YES |
1 | 2 | 202 | YES |
1 | 2 | 203 | YES |
1 | 2 | 204 | YES |
1 | 3 | 201 | YES |
1 | 3 | 202 | YES |
1 | 3 | 203 | YES |
1 | 3 | 204 | YES |
1 | 4 | 201 | YES |
1 | 4 | 202 | NO |
1 | 4 | 203 | YES |
1 | 4 | 204 | YES |
The agency id ranges from 1 to 42. There are several unitids within an agencyid. Each unitid will answer itemnumbers within a question. I need the number of units within an agency that have all 'yes' answers to all the itemnumbers 201 to 204.
For eg here the count of unique unitids that have a 'yes' response to all the itemnumbers is 2.
How can I improvise my code below to account for 'yes' responses to all the itemnumbers 201 to 204 ? The below code only gives me counts of unitids for any one of the itemnumbers with a yes answer.
proc sql;
create table c as
select agencyid,
count(distinct unitid)
as numerator
from test /* name of dataset*/
where answer='Yes' and itemnumber in (201,202,203,204)
group by agencyid;
quit;
Please advice. Thanks !
You need a subquery :
proc sql;
create table want as
select agencyId, count(unitId) as nbUnits
from
(select agencyId, unitId
from test
where answer = "YES" and itemNumber in (201, 202, 203, 204)
group by agencyId, unitId
having count(distinct itemNumber)=4)
group by agencyId;
quit;
PG
You need a subquery :
proc sql;
create table want as
select agencyId, count(unitId) as nbUnits
from
(select agencyId, unitId
from test
where answer = "YES" and itemNumber in (201, 202, 203, 204)
group by agencyId, unitId
having count(distinct itemNumber)=4)
group by agencyId;
quit;
PG
proc sql;
create table c as
select agencyid,unitid from have
group by agencyid,unitid
having sum(answer='YES')=4;
quit;
Hi PGStats and stat@sas,
Thanks for your help, but I am still not getting the right answer...
The results table based on my example has to look like below
agencyid | Unitid count |
1 | 2 |
2 | 1 |
3 | 6 |
I used both of your syntax and came up with this improvisation to my code. However, I get 0 rows and 2 columns . I not sure whats wrong ? Please let me know.
proc sql;
create table c as
select agencyid,
count(distinct unitid)
as numerator
from test
where itemid in (201,202,203,204)
group by agencyid
having sum(answer='Yes')=4;
quit;
When I use my code with your example data (which covers agencyId=1 only) I get :
agencyid nbUnits
------------------
1 2
In your new query, you group on agencyId and ask for the group condition sum(answer='Yes')=4 . You are thus selecting agencies (not units) with 4 YES answers. That's why you need a subquery. The lowest (subquery) level groups on agency and units to extract the units with 4 YES answers. The higher level counts the selected units within each agency.
PG
Thanks PG for the explanation. That makes sense. Let me give it a try and I shall get back.
Thanks PG ! It worked. I had an error in my syntax ! I broke your code down and ran it step wise.
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.