SAS Programming

DATA Step, Macro, Functions and more
BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
SarahDew
Obsidian | Level 7

From my not-so-clean data I want to count as correctly as possible the number of unique companies in the data. Each record has a company identifier number and a company name, however, the identifier can be missing. The identifier is assumed to be the most correct, but when missing I need to resort to the company name. I have managed to get the sum using many steps, but I am wondering if it can be done in less steps using "case when" or something else? 

 

Mock data:

data test;
input record comp_ID Name $15.;
datalines;
1 101 SPORTSSHOP
2 101 SportsShop
3 101 Sports Shop
4 102 FoodShop
5 . ClothesShop
6 . CLOTHESSHOP 
7 . Electronics Co.
;
run;

 

The correct count would be 4 (2 based on comp_ID and 2 based on name).

 

 

My short attempt gives the wrong count:

proc sql;
create table count as
select distinct 
(case when comp_ID is not missing then count(distinct(comp_ID))
when comp_ID is missing then count(distinct(upcase(Name))) end) as n_comp_temp
from test;
quit;

proc sql;
create table sum as
select sum(n_comp_temp) as n_comp
from count;
quit;

 

 

The long way that works fine, but can most likely be reduced to less steps:

proc sql;
create table test2 as 
	select *,
	(case when comp_ID is not missing then 1
	when comp_ID is missing then 0 
	end) as ID_present
from test;
quit;

proc sql;
create table test3 as
select distinct ID_present
, count(distinct(comp_ID)) as n_IDs
, count(distinct(upcase(Name))) as n_names
from test2
group by ID_present;
quit;

proc sql;
create table n_IDs as
select n_IDs as n_comp
from test3
where ID_present = 1;
quit;

proc sql;
create table n_names as
select n_names as n_comp
from test3
where ID_present = 0;
quit;

data count;
set n_IDs n_names;
run;

proc sql;
create table count2 as
select sum(n_comp) as n_comp
from count;
quit;

 

1 ACCEPTED SOLUTION

Accepted Solutions
PGStats
Opal | Level 21

You could try:

 


proc sql;
select
	count(distinct case when missing(comp_id) then upcase(compress(name," ,.-")) else put(comp_id, best.) end) as n
from test;
quit;
PG

View solution in original post

6 REPLIES 6
novinosrin
Tourmaline | Level 20

HI @SarahDew  Would this come close to meet your expectation?

 


data test;
input record comp_ID Name $15.;
datalines;
1 101 SPORTSSHOP
2 101 SportsShop
3 101 Sports Shop
4 102 FoodShop
5 . ClothesShop
6 . CLOTHESSHOP 
7 . Electronics Co.
;
run;
/*List report*/
proc sql;
create table want_list as
select distinct comp_id,compress(upcase(name)) as name,ifc(missing(comp_ID),'name','comp_id') as based_on
from test
group by based_on;
quit;
/*Count report*/
proc sql;
create table want_count as
select ifc(missing(comp_ID),'name','comp_id') as based_on, count( distinct compress(upcase(name))) as count
from test
group by based_on;
quit;
Tom
Super User Tom
Super User

You should nest the COUNT() and CASE in the other order.

For example you could count distinct COMP_ID (which will ignore the missing values) and then count the distinct NAMEs only for records that have missing COMP_ID.

 

select
  count(distinct comp_id) as n_comp_ids
, count(distinct case when (missing(comp_id)) then name else ' ' end) as n_other_names
from test
;
mkeintz
PROC Star

What if your sample data had an 8th record, with no ID, but a name matching another record that did have a ID, such as below:

 

data test;
  input record comp_ID Name $15.;
datalines;
1 101 SPORTSSHOP
2 101 SportsShop
3 101 Sports Shop
4 102 FoodShop
5 . ClothesShop
6 . CLOTHESSHOP 
7 . Electronics Co.
8 . foodshop
run;

Then you can no longer do a simple count of distinct names for the records without id's.  You would have to reduce it be the number of distinct names also NOT found in records with COMP_ID, as here:

 

data want (keep=N_:);
  set test (where=(comp_id^=.))
      test (where=(comp_id=.))
    end=end_of_testdata;

  if _n_=1 then do;
     declare hash hid ();  /*count unique company id's */
       hid.definekey('comp_id');
       hid.definedata('comp_id');
       hid.definedone();
     declare hash hname(); /*track unique names for records having company id's*/
       hname.definekey('name');
       hname.definedata('name');
       hname.definedone();
     declare hash noid (); /*track unique names not found in above */
       noid.definekey('name');
       noid.definedata('name');
       noid.definedone();
  end;
  name=upcase(name);
  if comp_id^=. then do;
     hid.replace();
     hname.replace();
  end;
  else if hname.check()^=0 then noid.replace();
  if end_of_testdata ;
  n_compid=hid.num_items;
  n_other=noid.num_items;
run;
 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
SarahDew
Obsidian | Level 7

Thanks for pointing out this possible issue. I had a look at the data and indeed I can see some issues there too. I will look into it to see if I can further improve the count with your method.

PGStats
Opal | Level 21

You could try:

 


proc sql;
select
	count(distinct case when missing(comp_id) then upcase(compress(name," ,.-")) else put(comp_id, best.) end) as n
from test;
quit;
PG
SarahDew
Obsidian | Level 7
This seems to do exactly as asked and in the most concise way. I like it. Thanks

sas-innovate-white.png

Special offer for SAS Communities members

Save $250 on SAS Innovate and get a free advance copy of the new SAS For Dummies book! Use the code "SASforDummies" to register. Don't miss out, May 6-9, in Orlando, Florida.

 

View the full agenda.

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 7118 views
  • 3 likes
  • 5 in conversation