03-22-2017 11:06 AM
I have yet to find an automatic way to run something that will tell me the unique key to a dataset. Often times I've been asked to provide the "key" to a dataset and find myself running procedures utilizing nodupkey or uniquekey but need to provide the variable list.
Is there something that anyone has used to be able to find the key without having to use manual methods to drive down to the variable set that makes the data unique? I have wondered this for a while, reaching out to the SAS community for answers.
03-23-2017 10:13 AM
A quick and dirty way is to do it with SQL, as shown here with SASHELP.CLASS:
proc sql noprint; select catx(' ','count(distinct',name,') as ',name) into :counts separated by ',' from dictionary.columns where libname='SASHELP' and memname='CLASS'; quit; proc sql; create table counts as select count(*) as _total_,&counts from sashelp.class; quit;
A quick look at the output table will then show you that the only potential (single) key variable is NAME.
If you want to use it programmatically, you can use TRANSPOSE and SQL to find the potential keys, e.g.:
proc transpose data=counts out=counts2; var _numeric_; run; proc sql; select * from counts2 where COL1=(select COL1 from counts2 where _NAME_='_TOTAL_') and _NAME_ ne '_TOTAL_' ; quit;
This method is relatively slow when you have a lot of data in your tables, however.
03-23-2017 11:26 AM
Another possibility is to look through the variables in order of appearance, using a macro:
/* macro to find possible keys on table */ %macro FindKeys(data); %if not %sysfunc(exist(&data)) %then %do; %put FindKeys: Table &data not found; %return; %end; /* parse dataset name */ %local libname memname; %let data=%upcase(&data); %let memname=%scan(&data,-1); %let libname=%scan(WORK.%upcase(%sysfunc(getoption(USER))).&data,-2); /* Find column names */ %local names sqlobs; proc sql noprint; select name into :names separated by ' ' from dictionary.columns where libname="&libname" and memname="&memname" ; quit; %put _local_; %if &sqlobs=0 %then %do; %put FindKeys: Table &data has no columns; %return; %end; proc sort data=&data out=_temp_ nodupkey; by &names; run; %local i j keycount; %let keycount=0; data _null_; if 0 then set &data nobs=_all_obs_; set _temp_ nobs=_unique_obs_ end=done; if _unique_obs_<_all_obs_ then do; put "FindKeys: no unique keys found for table &data"; stop; end; retain keycount 1; by &names; select(keycount); %do i=1 %to &sqlobs; when(&i) do; %do j=&i %to &sqlObs; if last.%scan(&names,&j) then goto KeyOK; keycount+1; %end; end; %end; end; KeyOK: if done then call symputx('keycount',keycount); run; %if &keycount>0 %then %do; %put Findkeys: The following variable(s) may be used as keys:; %do i=1 %to &keycount; %put %scan(&names,&i); %end; %end; %mend;
This will work OK with e.g. SASHELP.CLASS, where the key variable (NAME) is the first column on the table:
115 options nonotes; 116 %FindKeys(sashelp.class); Findkeys: The following variable(s) may be used as keys: Name
But consider this:
117 data gylle; 118 set sashelp.class sashelp.class; 119 i+1; 120 run; 121 %findkeys(gylle); Findkeys: The following variable(s) may be used as keys: Name Sex Age Height Weight i
- in this case the best key (if any) is actually the "i" variable, but as it appears as the last column, all the previous columns will be included.
There really is not sure way to find an arbitrary set of key columns, you will have to go back to the original data source and analyse that if you want to be sure.
But I hope you find some of my ideas helpful.