Hi all,
I have a data set with ID_NO , TRANSACTION_DATE , PRODUCT, NAME and STATE the By GROUP variables are ID_NO & TRANS_DATE
Although I would expect NAME and STATE to be the same across each BY GROUP, this is not the case.
ID_NO | TRANS_DATE | PRODUCTS | NAME | STATE |
255 | 3/2/2012 | BED | JOHN | NY |
255 | 3/2/2012 | PILLOW | JOHN | NY |
255 | 3/2/2012 | MATTRES | JOHN | NY |
468 | 9/12/2012 | BED | MIKE | California |
468 | 9/12/2012 | PILLOW | MIKE | MARYLAND |
468 | 9/12/2012 | MATTRES | MIKE | California |
137 | 6/25/2012 | BED | JIM | Michigan |
137 | 6/25/2012 | PILLOW | JIM | MARYLAND |
137 | 6/25/2012 | MATTRES | TOM | Michigan |
137 | 6/25/2012 | MATTRES | JIM | Michigan |
I would like to run a PROC SQL that would identify these BY GROUP that deviate and at the same time to provide percentages of how many of these BY GROUP are diffferent
a) in NAME
b) in STATE
Any suggestions would be much appreciated.
Kind regBY GROUP rards
Nikos
Since you didn't post what output you need.
So it is hard to make sure what your intention is .
data have; input ID_NO TRANS_DATE :mmddyy10. PRODUCTS $ NAME $ STATE $; datalines; 255 3/2/2012 BED JOHN NY 255 3/2/2012 PILLOW JOHN NY 255 3/2/2012 MATTRES JOHN NY 468 9/12/2012 BED MIKE California 468 9/12/2012 PILLOW MIKE MARYLAND 468 9/12/2012 MATTRES MIKE California 137 6/25/2012 BED JIM Michigan 137 6/25/2012 PILLOW JIM MARYLAND 137 6/25/2012 MATTRES TOM Michigan 137 6/25/2012 MATTRES JIM Michigan ; run; proc sql; title "Pct deviate name"; select b.*, n_name/n as pctName format=percent8.2 from (select id_no,trans_date,count(*) as n from have group by id_no,trans_date) as a, (select id_no,trans_date,name,count(*) as n_name from have group by id_no, trans_date,name) as b where a.id_no=b.id_no and a.trans_date=b.trans_date ; title "Pct deviate state"; select b.*, n_STATE/n as pctSTATE format=percent8.2 from (select id_no,trans_date,count(*) as n from have group by id_no,trans_date) as a, (select id_no,trans_date,STATE,count(*) as n_STATE from have group by id_no, trans_date,STATE) as b where a.id_no=b.id_no and a.trans_date=b.trans_date ; quit;
Ksharp
I suggest :
data have;
input ID_NO TRANS_DATE :mmddyy10. PRODUCTS $ NAME $ STATE $;
datalines;
255 3/2/2012 BED JOHN NY
255 3/2/2012 PILLOW JOHN NY
255 3/2/2012 MATTRES JOHN NY
468 9/12/2012 BED MIKE California
468 9/12/2012 PILLOW MIKE MARYLAND
468 9/12/2012 MATTRES MIKE California
137 6/25/2012 BED JIM Michigan
137 6/25/2012 PILLOW JIM MARYLAND
137 6/25/2012 MATTRES TOM Michigan
137 6/25/2012 MATTRES JIM Michigan
;
proc sql;
title "Pct deviate name";
select sum(n>1)/count(n) as pctName format=percent8.2 from
(select count(distinct name) as n from have group by id_no, trans_date);
title "Pct deviate state";
select sum(n>1)/count(n) as pctState format=percent8.2 from
(select count(distinct state) as n from have group by id_no, trans_date);
quit;
Add UPCASE() function to name or state if you do not care about differences in case.
PG
proc sql;
select a.id_no,
a.trans_date,
case
when count(distinct a.name) > 1 then 1
else 0
end as name_deviate,
1 - max(b.name_count) / count(*) as name_deviate_percent format=percent.,
case
when count(distinct a.state) > 1 then 1
else 0
end as state_deviate,
1 - max(c.state_count) / count(*) as state_deviate_percent format=percent.
from foo a
left outer join ( select id_no,trans_date,name,count(*) as name_count from foo group by id_no,trans_date,name ) b on a.id_no=b.id_no and a.trans_date=b.trans_date and a.name=b.name
left outer join ( select id_no,trans_date,state,count(*) as state_count from foo group by id_no,trans_date,state ) c on a.id_no=c.id_no and a.trans_date=c.trans_date and a.state=c.state
group by a.id_no,a.trans_date;
quit;
id_no | trans_date | name_deviate | name_deviate_percent | state_deviate | state_deviate_percent |
137 | 25JUN2012 | 1 | 25% | 1 | 25% |
255 | 02MAR2012 | 0 | 0% | 0 | 0% |
468 | 12SEP2012 | 0 | 0% | 1 | 33% |
Since you didn't post what output you need.
So it is hard to make sure what your intention is .
data have; input ID_NO TRANS_DATE :mmddyy10. PRODUCTS $ NAME $ STATE $; datalines; 255 3/2/2012 BED JOHN NY 255 3/2/2012 PILLOW JOHN NY 255 3/2/2012 MATTRES JOHN NY 468 9/12/2012 BED MIKE California 468 9/12/2012 PILLOW MIKE MARYLAND 468 9/12/2012 MATTRES MIKE California 137 6/25/2012 BED JIM Michigan 137 6/25/2012 PILLOW JIM MARYLAND 137 6/25/2012 MATTRES TOM Michigan 137 6/25/2012 MATTRES JIM Michigan ; run; proc sql; title "Pct deviate name"; select b.*, n_name/n as pctName format=percent8.2 from (select id_no,trans_date,count(*) as n from have group by id_no,trans_date) as a, (select id_no,trans_date,name,count(*) as n_name from have group by id_no, trans_date,name) as b where a.id_no=b.id_no and a.trans_date=b.trans_date ; title "Pct deviate state"; select b.*, n_STATE/n as pctSTATE format=percent8.2 from (select id_no,trans_date,count(*) as n from have group by id_no,trans_date) as a, (select id_no,trans_date,STATE,count(*) as n_STATE from have group by id_no, trans_date,STATE) as b where a.id_no=b.id_no and a.trans_date=b.trans_date ; quit;
Ksharp
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.