BookmarkSubscribeRSS Feed
Tom
Super User Tom
Super User

Why are you using CATX()?

 

That is for making delimited lists.  Because it trims and skips missing values you can have different values of the inputs result in the same output.  For example compare when COL2=. and COL3=10  with COL2=10 and COL3=. when using CATX('-',col2,col3) both will result with '10' as the result.

 

Also if you are using CATX() it then why are you use COL1 as the delimiter instead of a constant string, like '/' or '-'?

Kurt_Bremser
Super User

Checking for duplicates always involves sorting, and doing it implicitly with proc sql will often be the least efficient way.

Try this instead:

proc sort
  data=have (keep=col1 col2 col3)
  out=test
  nodupkey
;
by col1 col2 col3;
run;

and look at the log. It will tell you how many duplicates were deleted.

Astounding
PROC Star

If your data set is already sorted by ANY of the three variables, that should help PROC FREQ run faster.  For example, if your data set is sorted by COL2, you could use:

 

proc freq data=have noprint;

   by col2;

   tables col1*col3 / out=counts;

run;

 

The number of observations in COUNTS will be the right total, and PROC FREQ can take advantage of the sorted order.

mkeintz
PROC Star

A point I overlooked.   You are using

   count(distinct(catx(col1,col2,col3)))

 

It would probably be better practice to use 

  count(distinct(catx(' ',col1,col2,col3)))

It would certainly be better once you attempt to go beyond 3 variables.

 

And finally, if you only want the NUMBER of 3-way combinations, and not the DISTRIBUTION, then a hash object allows you to write out the count to the log in one step, as in:

 

data _null_;
  if 0 then set sashelp.cars (keep=model type origin);
  declare hash h (dataset:'sashelp.cars (keep=model type origin)');
   h.definekey(all:'Y');
   h.definedone();
 n_combos=h.num_items;
 put n_combos=;
run;

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 18 replies
  • 3649 views
  • 6 likes
  • 9 in conversation