Data:
smoke | zipCODE | N |
0 | 1 | 60 |
1 | 1 | 40 |
0 | 5 | 7 |
0 | 7 | 11 |
1 | 7 | 77 |
Hello,
I want to identify zipcodes for which smoke 1 count N > smoke 0 count N. So from above: for zip code 7, smoke 1=77 which is greater than smoke 0=11. Therefore I'll select zip code 7 in output but not other zip codes.
I want to know SAS code to get output like this from the above data:
Zipcode |
7 |
note: I have to do this for hundred of observations.
Thanking you.
runn.
data have;
input smoke zipCODE N;
datalines;
0 1 60
1 1 40
0 5 7
0 7 11
1 7 77
;
proc sql;
create table want as
select zipcode,max(case when smoke = 1 then n end)/max(case when smoke = 0 then n end ) as pct
from have
group by zipcode;
quit;
What happens in the other case, or if there isn't one? Anyways:
proc sql; create table want as select a.* from (select * from have where smoke=1) a left join (select * from have where smoke=0) b on a.zipcode=b.zipcode having a.n > b.n; quit;
Or you could proc transpose them:
proc sort data=have; by zip_code smoke; run; proc transpose data=have out=want prefix=smoke; by zip_code; var n; id smoke; run; data want; set want; where smoke1 > smoke0; run;
Or even a diff;
data want; set have; if lag(zip_code)=zip_code and diff(smoke)>0; run;
Note, not tested, provide test data in the form of a datastep.
Some basic algebra comes in handy here. I'll leave the math to you, but I think that if the mean value of smoke is greater than 50% it means that the Sum(1)>Sum(0).
proc sql;
create table want as
select zipcode, mean(smoke) as pct_smoke
from have
group by zipcode
having pct_smoke > 0.5;
quit;
Assuming your data is in order by ZIPCODE, a DATA step can handle this:
data want;
set have;
by zipcode;
if first.zipcode then total=0;
total + N * (smoke - (smoke=0));
if last.zipcode and total > 0;
run;
data have;
input smoke zipCODE N;
datalines;
0 1 60
1 1 40
0 5 7
0 7 11
1 7 77
;
proc sql;
create table want as
select *
from have
group by zipcode
having max(n)=n and smoke=1;
quit;
data have;
input smoke zipCODE N;
datalines;
0 1 60
1 1 40
0 5 7
0 7 11
1 7 77
;
data want;
set have;
if zipCODE=lag(zipCODE) and n>lag(n);
keep zipCODE;
run;
what if I want to calculate % of smoker =1 for each zip code How should I code for the same data:
Data:
smoke | zipCODE | N |
0 | 1 | 60 |
1 | 1 | 40 |
0 | 5 | 7 |
0 | 7 | 11 |
1 | 7 | 77 |
want:
Zipcode %smoker
1 (40/60)*100=66.67%
5 0
7 7%
and so on for the rest of the observations in the data.
how should I code?
thanking you
Runn
data have;
input smoke zipCODE N;
datalines;
0 1 60
1 1 40
0 5 7
0 7 11
1 7 77
;
proc sql;
create table want as
select zipcode,max(case when smoke = 1 then n end)/max(case when smoke = 0 then n end ) as pct
from have
group by zipcode;
quit;
That's what my code did. And if it's over 50% that's the area's you wanted.
@runn wrote:
what if I want to calculate % of smoker =1 for each zip code How should I code for the same data:
Data:
smoke
zipCODE
N
0
1
60
1
1
40
0
5
7
0
7
11
1
7
77
want:
Zipcode %smoker
1 (40/60)*100=66.67%
5 0
7 7%
and so on for the rest of the observations in the data.
how should I code?
thanking you
Runn
I think this is close to solution but It's not giving me the desired outcome. The newly created want have zero observations even on imputing correct have
@runn wrote:
I think this is close to solution but It's not giving me the desired outcome. The newly created want have zero observations even on imputing correct have
No idea what that means. You want the percent of 1, in mathematical terms that's the SUM(N=1)/ Sum (N=1) + SUM(N=0) => using some math that works out to the same average value of 0/1. Make sure to remove the HAVING clause which filters out the results to include only percentages greater than 50, which was your original question.
You could also use a plain proc means and the average would be the percentage as well.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.