Solved: Re: selecting categories of a variable for which count of category of ...

runn · Posted 04-16-2018 11:23 AM

Data:

smoke	zipCODE	N
0	1	60
1	1	40
0	5	7
0	7	11
1	7	77

Hello,

I want to identify zipcodes for which smoke 1 count N > smoke 0 count N. So from above: for zip code 7, smoke 1=77 which is greater than smoke 0=11. Therefore I'll select zip code 7 in output but not other zip codes.

I want to know SAS code to get output like this from the above data:

Zipcode

7

note: I have to do this for hundred of observations.

Thanking you.

runn.

novinosrin · Posted 04-18-2018 03:18 PM

data have;
input smoke	zipCODE	N;
datalines;
0	1	60
1	1	40
0	5	7
0	7	11
1	7	77
;
proc sql;
create table want as
select zipcode,max(case when smoke = 1 then n end)/max(case when smoke = 0 then n end )  as pct
from have
group by zipcode;
quit;

View solution in original post

RW9 · Posted 04-16-2018 11:32 AM

What happens in the other case, or if there isn't one? Anyways:

proc sql;
  create table want as
  select  a.*
  from    (select * from have where smoke=1) a
  left join (select * from have where smoke=0) b
  on        a.zipcode=b.zipcode
  having a.n > b.n;
quit;

Or you could proc transpose them:

proc sort data=have;
  by zip_code smoke;
run;
proc transpose data=have out=want prefix=smoke;
  by zip_code;
  var n;
  id smoke;
run;
data want;
  set want;
  where smoke1 > smoke0;
run;

Or even a diff;

data want;
  set have;
  if lag(zip_code)=zip_code and diff(smoke)>0;
run;

Note, not tested, provide test data in the form of a datastep.

Reeza · Posted 04-16-2018 11:36 AM

Some basic algebra comes in handy here. I'll leave the math to you, but I think that if the mean value of smoke is greater than 50% it means that the Sum(1)>Sum(0).

proc sql;
create table want as
select zipcode, mean(smoke) as pct_smoke
from have
group by zipcode
having pct_smoke > 0.5;
quit;

Astounding · Posted 04-16-2018 11:52 AM

Assuming your data is in order by ZIPCODE, a DATA step can handle this:

data want;

set have;

by zipcode;

if first.zipcode then total=0;

total + N * (smoke - (smoke=0));

if last.zipcode and total > 0;

run;

novinosrin · Posted 04-16-2018 12:04 PM

data have;
input smoke	zipCODE	N;
datalines;
0	1	60
1	1	40
0	5	7
0	7	11
1	7	77
;

proc sql;
create table want as
select *
from have
group by zipcode
having max(n)=n and smoke=1;
quit;

Ksharp · Posted 04-17-2018 09:30 AM

data have;
input smoke	zipCODE	N;
datalines;
0	1	60
1	1	40
0	5	7
0	7	11
1	7	77
;
data want;
 set have;
 if zipCODE=lag(zipCODE) and n>lag(n);
 keep zipCODE;
run;

runn · Posted 04-18-2018 02:29 PM

what if I want to calculate % of smoker =1 for each zip code How should I code for the same data:

Data:

smoke	zipCODE	N
0	1	60
1	1	40
0	5	7
0	7	11
1	7	77

want:

Zipcode %smoker

1 (40/60)*100=66.67%

5 0

7 7%

and so on for the rest of the observations in the data.

how should I code?

thanking you

Runn

novinosrin · Posted 04-18-2018 03:18 PM

data have;
input smoke	zipCODE	N;
datalines;
0	1	60
1	1	40
0	5	7
0	7	11
1	7	77
;
proc sql;
create table want as
select zipcode,max(case when smoke = 1 then n end)/max(case when smoke = 0 then n end )  as pct
from have
group by zipcode;
quit;

Reeza · Posted 04-18-2018 04:02 PM

That's what my code did. And if it's over 50% that's the area's you wanted.

@runn wrote:

what if I want to calculate % of smoker =1 for each zip code How should I code for the same data:

Data:

smoke

zipCODE

N

0

1

60

1

1

40

0

5

7

0

7

11

1

7

77

want:

Zipcode %smoker

1 (40/60)*100=66.67%

5 0

7 7%

and so on for the rest of the observations in the data.

how should I code?

thanking you

Runn

runn · Posted 04-23-2018 04:50 PM

I think this is close to solution but It's not giving me the desired outcome. The newly created want have zero observations even on imputing correct have

Reeza · Posted 04-23-2018 04:53 PM

@runn wrote:

I think this is close to solution but It's not giving me the desired outcome. The newly created want have zero observations even on imputing correct have

No idea what that means. You want the percent of 1, in mathematical terms that's the SUM(N=1)/ Sum (N=1) + SUM(N=0) => using some math that works out to the same average value of 0/1. Make sure to remove the HAVING clause which filters out the results to include only percentages greater than 50, which was your original question.

You could also use a plain proc means and the average would be the percentage as well.

selecting categories of a variable for which count of category of another variables is greater than

Re: selecting categories of a variable for which count of category of another variables is greater t

Re: selecting categories of a variable for which count of category of another variables is greater t

Re: selecting categories of a variable for which count of category of another variables is greater t

Re: selecting categories of a variable for which count of category of another variables is greater t

Re: selecting categories of a variable for which count of category of another variables is greater t

Re: selecting categories of a variable for which count of category of another variables is greater t

Re: selecting categories of a variable for which count of category of another variables is greater t

Re: selecting categories of a variable for which count of category of another variables is greater t

Re: selecting categories of a variable for which count of category of another variables is greater t

Re: selecting categories of a variable for which count of category of another variables is greater t

Re: selecting categories of a variable for which count of category of another variables is greater t

Registration is open

SAS Training: Just a Click Away