BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
runn
Calcite | Level 5

Data:

smoke

zipCODE

N

0

1

60

1

1

40

0

5

7

0

7

11

1

7

77

 

Hello,

 I want to identify zipcodes for which smoke 1 count N > smoke 0 count N. So from above: for zip code 7, smoke 1=77 which is greater than smoke 0=11. Therefore I'll select zip code 7 in output  but not other zip codes.

I want to know SAS code to get output like this from the above data:

Zipcode

7

note: I have to do this for hundred of observations.

 

Thanking you.

 

runn.

1 ACCEPTED SOLUTION

Accepted Solutions
novinosrin
Tourmaline | Level 20
data have;
input smoke	zipCODE	N;
datalines;
0	1	60
1	1	40
0	5	7
0	7	11
1	7	77
;
proc sql;
create table want as
select zipcode,max(case when smoke = 1 then n end)/max(case when smoke = 0 then n end )  as pct
from have
group by zipcode;
quit;

View solution in original post

10 REPLIES 10
RW9
Diamond | Level 26 RW9
Diamond | Level 26

What happens in the other case, or if there isn't one?  Anyways:

proc sql;
  create table want as
  select  a.*
  from    (select * from have where smoke=1) a
  left join (select * from have where smoke=0) b
  on        a.zipcode=b.zipcode
  having a.n > b.n;
quit;

Or you could proc transpose them:

proc sort data=have;
  by zip_code smoke;
run;
proc transpose data=have out=want prefix=smoke;
  by zip_code;
  var n;
  id smoke;
run;
data want;
  set want;
  where smoke1 > smoke0;
run;

Or even a diff;

data want;
  set have;
  if lag(zip_code)=zip_code and diff(smoke)>0;
run;

 Note, not tested, provide test data in the form of a datastep.

Reeza
Super User

 

Some basic algebra comes in handy here. I'll leave the math to you, but I think that if the mean value of smoke is greater than 50% it means that the Sum(1)>Sum(0).   

 

proc sql;
create table want as
select zipcode, mean(smoke) as pct_smoke
from have
group by zipcode
having pct_smoke > 0.5;
quit;
Astounding
PROC Star

Assuming your data is in order by ZIPCODE, a DATA step can handle this:

 

data want;

set have;

by zipcode;

if first.zipcode then total=0;

total + N * (smoke - (smoke=0));

if last.zipcode and total > 0;

run;

novinosrin
Tourmaline | Level 20
data have;
input smoke	zipCODE	N;
datalines;
0	1	60
1	1	40
0	5	7
0	7	11
1	7	77
;

proc sql;
create table want as
select *
from have
group by zipcode
having max(n)=n and smoke=1;
quit;


Ksharp
Super User
data have;
input smoke	zipCODE	N;
datalines;
0	1	60
1	1	40
0	5	7
0	7	11
1	7	77
;
data want;
 set have;
 if zipCODE=lag(zipCODE) and n>lag(n);
 keep zipCODE;
run;
runn
Calcite | Level 5

what if I want to calculate % of smoker =1 for each zip code How should I code for the same data:

Data:

smoke

zipCODE

N

0

1

60

1

1

40

0

5

7

0

7

11

1

7

77

 

want:

Zipcode %smoker

1              (40/60)*100=66.67%

5               0

7               7%

and so on for the rest of the observations in the data.

how should I code?

thanking you 

 

Runn

novinosrin
Tourmaline | Level 20
data have;
input smoke	zipCODE	N;
datalines;
0	1	60
1	1	40
0	5	7
0	7	11
1	7	77
;
proc sql;
create table want as
select zipcode,max(case when smoke = 1 then n end)/max(case when smoke = 0 then n end )  as pct
from have
group by zipcode;
quit;
Reeza
Super User

That's what my code did. And if it's over 50% that's the area's you wanted.

 


@runn wrote:

what if I want to calculate % of smoker =1 for each zip code How should I code for the same data:

Data:

smoke

zipCODE

N

0

1

60

1

1

40

0

5

7

0

7

11

1

7

77

 

want:

Zipcode %smoker

1              (40/60)*100=66.67%

5               0

7               7%

and so on for the rest of the observations in the data.

how should I code?

thanking you 

 

Runn


 

runn
Calcite | Level 5

I think this is close to solution but It's not giving me the desired outcome. The newly created want have zero observations even on imputing correct have

Reeza
Super User

@runn wrote:

I think this is close to solution but It's not giving me the desired outcome. The newly created want have zero observations even on imputing correct have


No idea what that means. You want the percent of 1, in mathematical terms that's the SUM(N=1)/ Sum (N=1) + SUM(N=0) => using some math that works out to the same average value of 0/1. Make sure to remove the HAVING clause which filters out the results to include only percentages greater than 50, which was your original question. 

 

You could also use a plain proc means and the average would be the percentage as well. 

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 10 replies
  • 3846 views
  • 2 likes
  • 6 in conversation