I have a table which is like this:
Geo_Key Var1-Var50(where Var1-Var50 represents 50 columns having value 1/0)
I want to select count of distinct Geo_Key for each column, when its value is=1.
So Results would be like:
Var1 50
Var2 60
....
...
Var50 10
I only know the exhaustive method, where maybe i can run a macro for 50 variables, to select count for each one. But that would require too much coding. I am sure there must be a better way.
I have a shortened version but this should help:
data have ;
input geo_key var1 var2 var3 var4 var5 var6;
cards;
1 1 0 0 0 0 0
2 0 0 0 0 0 0
3 1 1 1 1 1 1
4 0 0 1 0 1 0
5 1 0 1 0 1 0
;
proc transpose data=have out=prep;by geo_key;
proc sql;
create table want as
select distinct geo_key,sum(col1) as result
from prep
group by geo_key;
This gives the count by geo_key. I need count of geo_key by variables var1-var50.
Slight variation, this should do it:
data have ;
input geo_key var1 var2 var3 var4 var5 var6;
cards;
1 1 0 0 0 0 0
2 0 0 0 0 0 0
3 1 1 1 1 1 1
4 0 0 1 0 1 0
5 1 0 1 0 1 0
;
proc transpose data=have out=prep;by geo_key;
proc sql;
create table want as
select distinct _name_,sum(col1) as result
from prep
group by _name_
order by _name_;
The problem is that geo_key might be repeated(so I am having multiple columns), so i want to count the distinct value of it, when indicator is 1 for that variable. So scenario will change for each variable. So basically I want the count to match to below query:
select count(distinct geo_key) from table where var1=1;
select count(distinct geo_key) from table where var2=1;
.....
select count(distinct geo_key) from table where var50=1;
Here you go, I changed a little.
data have ;
input geo_key $ var1 var2 var3 var4 var5 var6;
cards;
001 1 1 1 1 1 0
002 1 1 1 1 0 0
003 1 1 1 0 0 0
004 1 1 0 0 0 0
005 1 0 0 0 0 0
;
proc transpose data=have out=prep;by geo_key;
proc sql;
create table want as
select distinct geo_key,sum(col1) as result
from prep
group by geo_key
order by geo_key;
Thanks for your help. Since there can be duplicates for the geo_key. Its giving me multiple columns around col1-col50. So just summing across col1 wont solve my problem. So i summed across all rows and set a new indicator to 1 if value is >=1. and then summed on that variables, which seems to be doing the job.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.