I have a table which is like this:
Geo_Key Var1-Var50(where Var1-Var50 represents 50 columns having value 1/0)
I want to select count of distinct Geo_Key for each column, when its value is=1.
So Results would be like:
Var1 50
Var2 60
....
...
Var50 10
I only know the exhaustive method, where maybe i can run a macro for 50 variables, to select count for each one. But that would require too much coding. I am sure there must be a better way.
I have a shortened version but this should help:
data have ;
input geo_key var1 var2 var3 var4 var5 var6;
cards;
1 1 0 0 0 0 0
2 0 0 0 0 0 0
3 1 1 1 1 1 1
4 0 0 1 0 1 0
5 1 0 1 0 1 0
;
proc transpose data=have out=prep;by geo_key;
proc sql;
create table want as
select distinct geo_key,sum(col1) as result
from prep
group by geo_key;
This gives the count by geo_key. I need count of geo_key by variables var1-var50.
Slight variation, this should do it:
data have ;
input geo_key var1 var2 var3 var4 var5 var6;
cards;
1 1 0 0 0 0 0
2 0 0 0 0 0 0
3 1 1 1 1 1 1
4 0 0 1 0 1 0
5 1 0 1 0 1 0
;
proc transpose data=have out=prep;by geo_key;
proc sql;
create table want as
select distinct _name_,sum(col1) as result
from prep
group by _name_
order by _name_;
The problem is that geo_key might be repeated(so I am having multiple columns), so i want to count the distinct value of it, when indicator is 1 for that variable. So scenario will change for each variable. So basically I want the count to match to below query:
select count(distinct geo_key) from table where var1=1;
select count(distinct geo_key) from table where var2=1;
.....
select count(distinct geo_key) from table where var50=1;
Here you go, I changed a little.
data have ;
input geo_key $ var1 var2 var3 var4 var5 var6;
cards;
001 1 1 1 1 1 0
002 1 1 1 1 0 0
003 1 1 1 0 0 0
004 1 1 0 0 0 0
005 1 0 0 0 0 0
;
proc transpose data=have out=prep;by geo_key;
proc sql;
create table want as
select distinct geo_key,sum(col1) as result
from prep
group by geo_key
order by geo_key;
Thanks for your help. Since there can be duplicates for the geo_key. Its giving me multiple columns around col1-col50. So just summing across col1 wont solve my problem. So i summed across all rows and set a new indicator to 1 if value is >=1. and then summed on that variables, which seems to be doing the job.
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.