DATA Step, Macro, Functions and more

Count different columns

Reply
Super Contributor
Posts: 673

Count different columns

How to get the distinct count of id and acct no as 3 and 5 in a single query? Id                 Acct no
114601087158160940
114601077269250216
424801947268006730
441301599154430103
441301588726577735
Trusted Advisor
Posts: 1,137

Re: Count different columns

Please try the below code

data have;

    input id Acct_no;

cards;

1146010    87158160940

1146010    77269250216

4248019    47268006730

4413015    99154430103

4413015    88726577735

;

run;

proc sql;

    create table want as select count(distinct id) as id_count, count(distinct Acct_no) as Acct_no_count from have;

quit;

Thanks,

Jagadish

Thanks,
Jag
Regular Contributor
Posts: 151

Re: Count different columns

ods select nlevels;

proc freq data=have nlevels;

run;

Contributor
Posts: 70

Re: Count different columns

Is it possible using DATA STEP.. same result as proc sql and proc freq???

Regular Contributor
Posts: 151

Re: Count different columns

Posted in reply to yaswanthj

Technically yes, but it will be messy to code.  Why do you ask when there are 2 much simpler solutions on offer?

Respected Advisor
Posts: 3,156

Re: Count different columns

Posted in reply to yaswanthj

Here is a Hash() approach that is technically producing the same outcome as Proc SQL in ONE step/query:

data want;

keep id_ct acct_ct;

if 0 then set have;

  declare hash acct(dataset:'have');

  acct.definekey('acct_no');

  acct.definedone();

  declare hash i(dataset:'have');

  i.definekey('id');

  i.definedone();

  id_ct=i.num_items;

  acct_ct=acct.num_items;

  output;

  stop;

  run;

Haikuo

SAS Employee
Posts: 122

Re: Count different columns

In current SAS High Performance (HP), one common foundation HP PROC is HPDMDB where you can code like this


proc hpdmdb data=&outdsn. classout=outdsn varout=v  maxlevel=15000000; /*if you set a small #, it wraps the rest into OTHERS*/

class _all_ ;  /*list all the variables in the data set including interval, numeric as well as categorical*/

run ;


The result will show % of each unique value and their levels. For very continuous variables, it does indeed appear very miscellaneous.

If you have Enterprise Miner license, you should be able to use proc DMDB. The difference is when the data set gets big, or when the table does not have too many observations, but you have many columns to count, then it may take some time. Proc HPDMDB runs much faster. It leverages multi-threading capabilities on your computers. If you are set up to run on parallel nodes with in-memory, it will be much, much faster.

Ask a Question
Discussion stats
  • 6 replies
  • 403 views
  • 2 likes
  • 6 in conversation