BookmarkSubscribeRSS Feed
1800bigk
Fluorite | Level 6

I am having problems counting things in SAS.  I have a data set with 3 vars: Level1, Level2 and FeatureNbr.  I want to be able to count the number of DISTINCT FeatureNbrs for each possible value of Level1 and Level2.

SAMPLE DATA
Level1Level2FeatureNbr
OpsSVCS23654
OpsForms23654
OpsSupplies23654
OpsOther23654
PeopleSalaries23654
PeopleFleet27589
PeopleBenefits27589
PeopleOther27589
ITEquipment27589
ITProgramming27589
ITPhone36258
InfrastructureEquipment36258
InfrastructureReal Estate36258
InfrastructureOther36258

My normal way to count is with proc summary but that counts each row so I found some proc sql code that does a select count(distinct(featureNbr)) on my data but it just returns one number.

ANy help would be greatly appreciated.

13 REPLIES 13
FriedEgg
SAS Employee

proc sql;

  select level1, level2, count(distinct(featurenbr)) as count

     from sample_data

    group by level1, level2

quit;

art297
Opal | Level 21

But I am concerned why you stated that you couldn't use proc summary.  In your sample data, all of the counts will equal 1 by definition.  In the following, I added an extra data line so that one of the combinations would result in a value of 2:

data have;

  informat Level1 $15.;

  informat Level2 $15.;

  input Level1 Level2 & FeatureNbr;

  cards;

Ops          SVCS  23654

Ops          Forms  23654

Ops          Supplies  23654

Ops          Other  23654

People          Salaries  23654

People          Fleet  27589

People          Benefits  27589

People          Other  27589

IT          Equipment  27589

IT          Programming  27589

IT          Programming  27589

IT          Phone  36258

Infrastructure          Equipment  36258

Infrastructure          Real Estate  36258

Infrastructure          Other  36258

;

proc summary data=have nway;

  var FeatureNbr;

  class level1 level2;

  output out=want (drop=_:) n=;

run;

1800bigk
Fluorite | Level 6

When I did a proc summary to count it was double counting because it would count each row as a new feature but as you can see there are only 3 distinct features in my sample.  I had to re login to the sas server and re load my data so I haven't tired anything yet, thanks for the input.

Tom
Super User Tom
Super User

What do you want as the output?  You were concerned about getting only one number, but then you say that you only want one number for the variable FEATURE.  If you look for distinct values of the variable FEATURE nested within the other two variables (see code above from FriedEgg) then only one combination has more than one distinct value of feature.

If you want the number of distinct levels for each variable then try PROC FREQ with the NLEVELS option.  That should tell you that the variable FEATURE has three distinct values.

art297
Opal | Level 21

I misunderstood what you were seeking and may still not understand your request.

I would go with FriedEgg's suggested proc sql code, but make sure that you add a semicolon before the quit statement:

proc sql;

  select level1, level2,

         count(distinct(featurenbr)) as count

     from have

       group by level1, level2

;

quit;

However, that will give you a count of 1 for each unique level1*level2*featurenbr combination in your data.

1800bigk
Fluorite | Level 6

I want to be able to report out on the number of unique features that occured for any given variable in my data.  I am begienning to realize that this is much more difficult that I thought.  I have 41 variables and zillions of rows and my boss would like me to get the data to a point where we can see how many unique features roll up to any variable in my data.  So even though there is 7gb worth of rows there are only a few thousand possible places that a feature could roll up to.  It looks like I will be doing 41 different proc sql like mentioned above so I can get counts for all my variables. 

art297
Opal | Level 21

I think we are all miscommunicating.  What output would you expect to see from your sample data?

1800bigk
Fluorite | Level 6

Something along the lines of

Level2Count of Distinct FeaturesLevel1Count of Distinct Features
OPS1Benefits1
People2Equipment2
IT2Fleet1
Infrastructure1Forms1
Other3
Phone1
Programming1
Real Estate1
Salaries1
Supplies1
SVCS1
art297
Opal | Level 21

Still not sure, but possibly something like?:

proc sql;

  select level1,

         count(distinct(featurenbr)) as count

     from have

       group by level1

  ;

  select level2,

         count(distinct(featurenbr)) as count

     from have

       group by level2

  ;

quit;

FriedEgg
SAS Employee

data have;

infile cards dsd dlm='09'x;

input (level1-level2) (:$20.) featurenbr;

cards;

Ops          SVCS          23654

Ops          Forms          23654

Ops          Supplies          23654

Ops          Other          23654

People          Salaries          23654

People          Fleet          27589

People          Benefits          27589

People          Other          27589

IT          Equipment          27589

IT          Programming          27589

IT          Phone          36258

Infrastructure          Equipment          36258

Infrastructure          Real Estate          36258

Infrastructure          Other          36258

Ops          SVCS          23654

Ops          Forms          23654

Ops          Supplies          23654

Ops          Other          23654

People          Salaries          23654

People          Fleet          27589

People          Benefits          27589

People          Other          27589

IT          Equipment          27589

IT          Programming          27589

IT          Phone          36258

Infrastructure          Equipment          36258

Infrastructure          Real Estate          36258

Infrastructure          Other          36258

Ops          SVCS          23654

Ops          Forms          23654

Ops          Supplies          23654

Ops          Other          23654

People          Salaries          23654

People          Fleet          27589

People          Benefits          27589

People          Other          27589

IT          Equipment          27589

IT          Programming          27589

IT          Phone          36258

Infrastructure          Equipment          36258

Infrastructure          Real Estate          36258

Infrastructure          Other          36258

Ops          SVCS          23654

Ops          Forms          23654

Ops          Supplies          23654

Ops          Other          23654

People          Salaries          23654

People          Fleet          27589

People          Benefits          27589

People          Other          27589

IT          Equipment          27589

IT          Programming          27589

IT          Phone          36258

Infrastructure          Equipment          36258

Infrastructure          Real Estate          36258

Infrastructure          Other          36258

Infrastructure          Equipment          27589

Infrastructure          Real Estate          27589

Infrastructure          Other          27589

;

run;

proc sql;

create view vhave as

select distinct featurenbr, level1, level2

   from have;

quit;

proc summary data=vhave;

var featurenbr;

class level2 level1;

output out=want(rename=(_type_=level) where=(level>0) drop=_freq_) n=count;

run;

data level1(drop=level2) level2(drop=level1) cross;

set want;

select(level);

  when(1) output level1;

  when(2) output level2;

  otherwise output cross;

end;

drop level;

run;

DLing
Obsidian | Level 7

Summarize twice:

proc summary data=have missing nway;

     class level1 feature;

     output out=temp(drop=_:) n=;

run;

proc freq data=temp;

     table level1;

run;

Repeat again for level2.  Clumsy but works and generates additional data that may be useful later.

Tom
Super User Tom
Super User

You seem to want to count the number of distinct values for a single variable (&COUNTVAR) when crossed with each of a number of other variables (&VARLIST). You can use the TYPES statement in PROC SUMMARY to generate a dataset that will make it easy to get those counts using PROC FREQ. The only problem I see is when your variables have missing values.

%let varlist=level1 level2 ;

%let countvar=featurenbr ;

proc summary data=have nway ;

  class &varlist &countvar;

  types (&varlist)*&countvar ;

  output out=want1 ;

run;

proc freq data=want1 ;

  tables &varlist ;

run;

Howles
Quartz | Level 8

Note that NWAY contradicts the specifications in the TYPES statement. Apparently SAS ignores NWAY in this situation.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 13 replies
  • 15355 views
  • 0 likes
  • 6 in conversation