DATA Step, Macro, Functions and more

Procedure to find variable set that makes dataset unique

Reply
New Contributor
Posts: 3

Procedure to find variable set that makes dataset unique

I have yet to find an automatic way to run something that will tell me the unique key to a dataset.  Often times I've been asked to provide the "key" to a dataset and find myself running procedures utilizing nodupkey or uniquekey but need to provide the variable list.

 

Is there something that anyone has used to be able to find the key without having to use manual methods to drive down to the variable set that makes the data unique?  I have wondered this for a while, reaching out to the SAS community for answers.

 

Thank you!

PROC Star
Posts: 63

Re: Procedure to find variable set that makes dataset unique

A quick and dirty way is to do it with SQL, as shown here with SASHELP.CLASS:

proc sql noprint;
  select catx(' ','count(distinct',name,') as ',name) into :counts separated by ',' from dictionary.columns
  where libname='SASHELP' and memname='CLASS';
quit;


proc sql;
  create table counts as
  select count(*) as _total_,&counts from sashelp.class;
quit;

A quick look at the output table will then show you that the only potential (single) key variable is NAME.

If you want to use it programmatically, you can use TRANSPOSE and SQL to find the potential keys, e.g.:

proc transpose data=counts out=counts2;
  var _numeric_;
run;

proc sql;
  select * from counts2 
  where COL1=(select COL1 from counts2 where _NAME_='_TOTAL_')
    and _NAME_ ne '_TOTAL_'
  ;
quit;

This method is relatively slow when you have a lot of data in your tables, however.

 

PROC Star
Posts: 63

Re: Procedure to find variable set that makes dataset unique

Another possibility is to look through the variables in order of appearance, using a macro:

/* macro to find possible keys on table */
%macro FindKeys(data);
%if not %sysfunc(exist(&data)) %then %do;
  %put FindKeys: Table &data not found;
  %return;
  %end;
/* parse dataset name */
%local libname memname;
%let data=%upcase(&data);
%let memname=%scan(&data,-1);
%let libname=%scan(WORK.%upcase(%sysfunc(getoption(USER))).&data,-2);
/* Find column names */
%local names sqlobs;
proc sql noprint;
  select name into :names separated by ' '
  from dictionary.columns 
  where libname="&libname"
    and memname="&memname"
  ;
quit;
%put _local_;
%if &sqlobs=0 %then %do;
  %put FindKeys: Table &data has no columns;
  %return;
  %end;
proc sort data=&data out=_temp_ nodupkey;
  by &names;
run;
%local i j keycount;
%let keycount=0;
data _null_;
  if 0 then set &data nobs=_all_obs_;
  set _temp_ nobs=_unique_obs_ end=done;
  if _unique_obs_<_all_obs_ then do;
    put "FindKeys: no unique keys found for table &data";
    stop;
    end;
  retain keycount 1;
  by &names;
  select(keycount);
    %do i=1 %to &sqlobs;
      when(&i) do;
        %do j=&i %to &sqlObs;
          if last.%scan(&names,&j) then
            goto KeyOK;
          keycount+1;          
          %end;
        end;
      %end;
    end;
KeyOK:
  if done then call symputx('keycount',keycount);
run;
%if &keycount>0 %then %do;
  %put Findkeys: The following variable(s) may be used as keys:;
  %do i=1 %to &keycount;
    %put %scan(&names,&i);
    %end;
  %end;   
%mend;

This will work OK with e.g. SASHELP.CLASS, where the key variable (NAME) is the first column on the table:

 115        options nonotes;
 116        %FindKeys(sashelp.class);
 Findkeys: The following variable(s) may be used as keys:
 Name

But consider this:

 117        data gylle;
 118          set sashelp.class sashelp.class;
 119          i+1;
 120        run;
 121        %findkeys(gylle);
 Findkeys: The following variable(s) may be used as keys:
 Name
 Sex
 Age
 Height
 Weight
 i

- in this case the best key (if any) is actually the "i" variable, but as it appears as the last column, all the previous columns will be included.

There really is not sure way to find an arbitrary set of key columns, you will have to go back to the original data source and analyse that if you want to be sure. 

But I hope you find some of my ideas helpful.

 

Ask a Question
Discussion stats
  • 2 replies
  • 134 views
  • 0 likes
  • 2 in conversation