selecting categories of a variable for which count of category of another variables is greater than

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 5
Accepted Solution

selecting categories of a variable for which count of category of another variables is greater than

Data:

smoke

zipCODE

N

0

1

60

1

1

40

0

5

7

0

7

11

1

7

77

 

Hello,

 I want to identify zipcodes for which smoke 1 count N > smoke 0 count N. So from above: for zip code 7, smoke 1=77 which is greater than smoke 0=11. Therefore I'll select zip code 7 in output  but not other zip codes.

I want to know SAS code to get output like this from the above data:

Zipcode

7

note: I have to do this for hundred of observations.

 

Thanking you.

 

runn.


Accepted Solutions
Solution
‎04-23-2018 04:50 PM
PROC Star
Posts: 1,784

Re: selecting categories of a variable for which count of category of another variables is greater t

data have;
input smoke	zipCODE	N;
datalines;
0	1	60
1	1	40
0	5	7
0	7	11
1	7	77
;
proc sql;
create table want as
select zipcode,max(case when smoke = 1 then n end)/max(case when smoke = 0 then n end )  as pct
from have
group by zipcode;
quit;

View solution in original post


All Replies
Super User
Super User
Posts: 9,599

Re: selecting categories of a variable for which count of category of another variables is greater t

[ Edited ]

What happens in the other case, or if there isn't one?  Anyways:

proc sql;
  create table want as
  select  a.*
  from    (select * from have where smoke=1) a
  left join (select * from have where smoke=0) b
  on        a.zipcode=b.zipcode
  having a.n > b.n;
quit;

Or you could proc transpose them:

proc sort data=have;
  by zip_code smoke;
run;
proc transpose data=have out=want prefix=smoke;
  by zip_code;
  var n;
  id smoke;
run;
data want;
  set want;
  where smoke1 > smoke0;
run;

Or even a diff;

data want;
  set have;
  if lag(zip_code)=zip_code and diff(smoke)>0;
run;

 Note, not tested, provide test data in the form of a datastep.

Super User
Posts: 23,700

Re: selecting categories of a variable for which count of category of another variables is greater t

 

Some basic algebra comes in handy here. I'll leave the math to you, but I think that if the mean value of smoke is greater than 50% it means that the Sum(1)>Sum(0).   

 

proc sql;
create table want as
select zipcode, mean(smoke) as pct_smoke
from have
group by zipcode
having pct_smoke > 0.5;
quit;
Super User
Posts: 6,762

Re: selecting categories of a variable for which count of category of another variables is greater t

Assuming your data is in order by ZIPCODE, a DATA step can handle this:

 

data want;

set have;

by zipcode;

if first.zipcode then total=0;

total + N * (smoke - (smoke=0));

if last.zipcode and total > 0;

run;

PROC Star
Posts: 1,784

Re: selecting categories of a variable for which count of category of another variables is greater t

data have;
input smoke	zipCODE	N;
datalines;
0	1	60
1	1	40
0	5	7
0	7	11
1	7	77
;

proc sql;
create table want as
select *
from have
group by zipcode
having max(n)=n and smoke=1;
quit;


Super User
Posts: 10,770

Re: selecting categories of a variable for which count of category of another variables is greater t

data have;
input smoke	zipCODE	N;
datalines;
0	1	60
1	1	40
0	5	7
0	7	11
1	7	77
;
data want;
 set have;
 if zipCODE=lag(zipCODE) and n>lag(n);
 keep zipCODE;
run;
Occasional Contributor
Posts: 5

Re: selecting categories of a variable for which count of category of another variables is greater t

what if I want to calculate % of smoker =1 for each zip code How should I code for the same data:

Data:

smoke

zipCODE

N

0

1

60

1

1

40

0

5

7

0

7

11

1

7

77

 

want:

Zipcode %smoker

1              (40/60)*100=66.67%

5               0

7               7%

and so on for the rest of the observations in the data.

how should I code?

thanking you 

 

Runn

Solution
‎04-23-2018 04:50 PM
PROC Star
Posts: 1,784

Re: selecting categories of a variable for which count of category of another variables is greater t

data have;
input smoke	zipCODE	N;
datalines;
0	1	60
1	1	40
0	5	7
0	7	11
1	7	77
;
proc sql;
create table want as
select zipcode,max(case when smoke = 1 then n end)/max(case when smoke = 0 then n end )  as pct
from have
group by zipcode;
quit;
Super User
Posts: 23,700

Re: selecting categories of a variable for which count of category of another variables is greater t

That's what my code did. And if it's over 50% that's the area's you wanted.

 


@runn wrote:

what if I want to calculate % of smoker =1 for each zip code How should I code for the same data:

Data:

smoke

zipCODE

N

0

1

60

1

1

40

0

5

7

0

7

11

1

7

77

 

want:

Zipcode %smoker

1              (40/60)*100=66.67%

5               0

7               7%

and so on for the rest of the observations in the data.

how should I code?

thanking you 

 

Runn


 

Occasional Contributor
Posts: 5

Re: selecting categories of a variable for which count of category of another variables is greater t

I think this is close to solution but It's not giving me the desired outcome. The newly created want have zero observations even on imputing correct have

Super User
Posts: 23,700

Re: selecting categories of a variable for which count of category of another variables is greater t


@runn wrote:

I think this is close to solution but It's not giving me the desired outcome. The newly created want have zero observations even on imputing correct have


No idea what that means. You want the percent of 1, in mathematical terms that's the SUM(N=1)/ Sum (N=1) + SUM(N=0) => using some math that works out to the same average value of 0/1. Make sure to remove the HAVING clause which filters out the results to include only percentages greater than 50, which was your original question. 

 

You could also use a plain proc means and the average would be the percentage as well. 

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 10 replies
  • 234 views
  • 2 likes
  • 6 in conversation