I have a table which is like this:
Geo_Key Var1-Var50(where Var1-Var50 represents 50 columns having value 1/0)
I want to select count of distinct Geo_Key for each column, when its value is=1.
So Results would be like:
Var1 50
Var2 60
....
...
Var50 10
I only know the exhaustive method, where maybe i can run a macro for 50 variables, to select count for each one. But that would require too much coding. I am sure there must be a better way.
I have a shortened version but this should help:
data have ;
input geo_key var1 var2 var3 var4 var5 var6;
cards;
1 1 0 0 0 0 0
2 0 0 0 0 0 0
3 1 1 1 1 1 1
4 0 0 1 0 1 0
5 1 0 1 0 1 0
;
proc transpose data=have out=prep;by geo_key;
proc sql;
create table want as
select distinct geo_key,sum(col1) as result
from prep
group by geo_key;
This gives the count by geo_key. I need count of geo_key by variables var1-var50.
Slight variation, this should do it:
data have ;
input geo_key var1 var2 var3 var4 var5 var6;
cards;
1 1 0 0 0 0 0
2 0 0 0 0 0 0
3 1 1 1 1 1 1
4 0 0 1 0 1 0
5 1 0 1 0 1 0
;
proc transpose data=have out=prep;by geo_key;
proc sql;
create table want as
select distinct _name_,sum(col1) as result
from prep
group by _name_
order by _name_;
The problem is that geo_key might be repeated(so I am having multiple columns), so i want to count the distinct value of it, when indicator is 1 for that variable. So scenario will change for each variable. So basically I want the count to match to below query:
select count(distinct geo_key) from table where var1=1;
select count(distinct geo_key) from table where var2=1;
.....
select count(distinct geo_key) from table where var50=1;
Here you go, I changed a little.
data have ;
input geo_key $ var1 var2 var3 var4 var5 var6;
cards;
001 1 1 1 1 1 0
002 1 1 1 1 0 0
003 1 1 1 0 0 0
004 1 1 0 0 0 0
005 1 0 0 0 0 0
;
proc transpose data=have out=prep;by geo_key;
proc sql;
create table want as
select distinct geo_key,sum(col1) as result
from prep
group by geo_key
order by geo_key;
Thanks for your help. Since there can be duplicates for the geo_key. Its giving me multiple columns around col1-col50. So just summing across col1 wont solve my problem. So i summed across all rows and set a new indicator to 1 if value is >=1. and then summed on that variables, which seems to be doing the job.
Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.
Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.