Help using Base SAS procedures

Counting Distinct Entries by Variables

Reply
Occasional Contributor
Posts: 19

Counting Distinct Entries by Variables

I am having problems counting things in SAS.  I have a data set with 3 vars: Level1, Level2 and FeatureNbr.  I want to be able to count the number of DISTINCT FeatureNbrs for each possible value of Level1 and Level2.

SAMPLE DATA
Level1Level2FeatureNbr
OpsSVCS23654
OpsForms23654
OpsSupplies23654
OpsOther23654
PeopleSalaries23654
PeopleFleet27589
PeopleBenefits27589
PeopleOther27589
ITEquipment27589
ITProgramming27589
ITPhone36258
InfrastructureEquipment36258
InfrastructureReal Estate36258
InfrastructureOther36258

My normal way to count is with proc summary but that counts each row so I found some proc sql code that does a select count(distinct(featureNbr)) on my data but it just returns one number.

ANy help would be greatly appreciated.

Trusted Advisor
Posts: 1,300

Re: Counting Distinct Entries by Variables

proc sql;

  select level1, level2, count(distinct(featurenbr)) as count

     from sample_data

    group by level1, level2

quit;

PROC Star
Posts: 7,363

Re: Counting Distinct Entries by Variables

But I am concerned why you stated that you couldn't use proc summary.  In your sample data, all of the counts will equal 1 by definition.  In the following, I added an extra data line so that one of the combinations would result in a value of 2:

data have;

  informat Level1 $15.;

  informat Level2 $15.;

  input Level1 Level2 & FeatureNbr;

  cards;

Ops          SVCS  23654

Ops          Forms  23654

Ops          Supplies  23654

Ops          Other  23654

People          Salaries  23654

People          Fleet  27589

People          Benefits  27589

People          Other  27589

IT          Equipment  27589

IT          Programming  27589

IT          Programming  27589

IT          Phone  36258

Infrastructure          Equipment  36258

Infrastructure          Real Estate  36258

Infrastructure          Other  36258

;

proc summary data=have nway;

  var FeatureNbr;

  class level1 level2;

  output out=want (drop=_Smiley Happy n=;

run;

Occasional Contributor
Posts: 19

Counting Distinct Entries by Variables

When I did a proc summary to count it was double counting because it would count each row as a new feature but as you can see there are only 3 distinct features in my sample.  I had to re login to the sas server and re load my data so I haven't tired anything yet, thanks for the input.

Super User
Super User
Posts: 6,500

Counting Distinct Entries by Variables

What do you want as the output?  You were concerned about getting only one number, but then you say that you only want one number for the variable FEATURE.  If you look for distinct values of the variable FEATURE nested within the other two variables (see code above from FriedEgg) then only one combination has more than one distinct value of feature.

If you want the number of distinct levels for each variable then try PROC FREQ with the NLEVELS option.  That should tell you that the variable FEATURE has three distinct values.

PROC Star
Posts: 7,363

Counting Distinct Entries by Variables

I misunderstood what you were seeking and may still not understand your request.

I would go with FriedEgg's suggested proc sql code, but make sure that you add a semicolon before the quit statement:

proc sql;

  select level1, level2,

         count(distinct(featurenbr)) as count

     from have

       group by level1, level2

;

quit;

However, that will give you a count of 1 for each unique level1*level2*featurenbr combination in your data.

Occasional Contributor
Posts: 19

Counting Distinct Entries by Variables

I want to be able to report out on the number of unique features that occured for any given variable in my data.  I am begienning to realize that this is much more difficult that I thought.  I have 41 variables and zillions of rows and my boss would like me to get the data to a point where we can see how many unique features roll up to any variable in my data.  So even though there is 7gb worth of rows there are only a few thousand possible places that a feature could roll up to.  It looks like I will be doing 41 different proc sql like mentioned above so I can get counts for all my variables. 

PROC Star
Posts: 7,363

Counting Distinct Entries by Variables

I think we are all miscommunicating.  What output would you expect to see from your sample data?

Occasional Contributor
Posts: 19

Re: Counting Distinct Entries by Variables

Something along the lines of

Level2Count of Distinct FeaturesLevel1Count of Distinct Features
OPS1Benefits1
People2Equipment2
IT2Fleet1
Infrastructure1Forms1
Other3
Phone1
Programming1
Real Estate1
Salaries1
Supplies1
SVCS1
PROC Star
Posts: 7,363

Re: Counting Distinct Entries by Variables

Still not sure, but possibly something like?:

proc sql;

  select level1,

         count(distinct(featurenbr)) as count

     from have

       group by level1

  ;

  select level2,

         count(distinct(featurenbr)) as count

     from have

       group by level2

  ;

quit;

Trusted Advisor
Posts: 1,300

Counting Distinct Entries by Variables

data have;

infile cards dsd dlm='09'x;

input (level1-level2) (:$20.) featurenbr;

cards;

Ops          SVCS          23654

Ops          Forms          23654

Ops          Supplies          23654

Ops          Other          23654

People          Salaries          23654

People          Fleet          27589

People          Benefits          27589

People          Other          27589

IT          Equipment          27589

IT          Programming          27589

IT          Phone          36258

Infrastructure          Equipment          36258

Infrastructure          Real Estate          36258

Infrastructure          Other          36258

Ops          SVCS          23654

Ops          Forms          23654

Ops          Supplies          23654

Ops          Other          23654

People          Salaries          23654

People          Fleet          27589

People          Benefits          27589

People          Other          27589

IT          Equipment          27589

IT          Programming          27589

IT          Phone          36258

Infrastructure          Equipment          36258

Infrastructure          Real Estate          36258

Infrastructure          Other          36258

Ops          SVCS          23654

Ops          Forms          23654

Ops          Supplies          23654

Ops          Other          23654

People          Salaries          23654

People          Fleet          27589

People          Benefits          27589

People          Other          27589

IT          Equipment          27589

IT          Programming          27589

IT          Phone          36258

Infrastructure          Equipment          36258

Infrastructure          Real Estate          36258

Infrastructure          Other          36258

Ops          SVCS          23654

Ops          Forms          23654

Ops          Supplies          23654

Ops          Other          23654

People          Salaries          23654

People          Fleet          27589

People          Benefits          27589

People          Other          27589

IT          Equipment          27589

IT          Programming          27589

IT          Phone          36258

Infrastructure          Equipment          36258

Infrastructure          Real Estate          36258

Infrastructure          Other          36258

Infrastructure          Equipment          27589

Infrastructure          Real Estate          27589

Infrastructure          Other          27589

;

run;

proc sql;

create view vhave as

select distinct featurenbr, level1, level2

   from have;

quit;

proc summary data=vhave;

var featurenbr;

class level2 level1;

output out=want(rename=(_type_=level) where=(level>0) drop=_freq_) n=count;

run;

data level1(drop=level2) level2(drop=level1) cross;

set want;

select(level);

  when(1) output level1;

  when(2) output level2;

  otherwise output cross;

end;

drop level;

run;

Frequent Contributor
Posts: 104

Re: Counting Distinct Entries by Variables

Summarize twice:

proc summary data=have missing nway;

     class level1 feature;

     output out=temp(drop=_Smiley Happy n=;

run;

proc freq data=temp;

     table level1;

run;

Repeat again for level2.  Clumsy but works and generates additional data that may be useful later.

Super User
Super User
Posts: 6,500

Re: Counting Distinct Entries by Variables

You seem to want to count the number of distinct values for a single variable (&COUNTVAR) when crossed with each of a number of other variables (&VARLIST). You can use the TYPES statement in PROC SUMMARY to generate a dataset that will make it easy to get those counts using PROC FREQ. The only problem I see is when your variables have missing values.

%let varlist=level1 level2 ;

%let countvar=featurenbr ;

proc summary data=have nway ;

  class &varlist &countvar;

  types (&varlist)*&countvar ;

  output out=want1 ;

run;

proc freq data=want1 ;

  tables &varlist ;

run;

Regular Contributor
Posts: 184

Counting Distinct Entries by Variables

Note that NWAY contradicts the specifications in the TYPES statement. Apparently SAS ignores NWAY in this situation.

Ask a Question
Discussion stats
  • 13 replies
  • 6260 views
  • 0 likes
  • 6 in conversation