I wrote this to do the following:
1. Find all fields in a dataset with a length over 8 bytes.
2. For each of them, count the distinct values in the dataset.
3. If there are at least 10 records per value, create a surrogate key table for the field and replace the field with the key in a copy of the dataset.
There's also a 3rd copy of the database that has the field and the key in it.
If you're comfortable with the results, you can then delete the original dataset as well as the 3rd version. I didn't delete them because I wanted to show the difference in size.
For example, one dataset started out at 3,491,692,544 bytes with 6 fields converted to keys resulting in a normalzed dataset at 2,379,612,160 with 6 key tables (23,461,888) for a saving of 31%.
%macro normalize(lib,dsn); PROC SQL; SELECT name into :name1 - FROM dictionary.columns WHERE compress(libname||'.'||memname) = "&lib..&dsn" AND length > 8; quit; %if &SQLOBS = 0 %then %return; %let reccnt = &sqlobs; PROC SQL; CREATE TABLE &lib..&dsn.2 AS SELECT * FROM &lib..&dsn; QUIT; %do i = 1 %to &reccnt; %put &&name&i; proc sql; select count(*),COUNT(DISTINCT(&&name&i)),int(count(*)/COUNT(DISTINCT(&&name&i))) into :count,:countdist,:pct from &lib..&dsn; quit; %put &=count &=countdist &=pct; %if &pct > 10 %then %do; proc sql; CREATE TABLE work.distinct AS SELECT DISTINCT &&name&i FROM &lib..&dsn; quit; data &lib..sk_&&name&i; set work.distinct; sk_&&name&i =_n_; run; PROC SQL; CREATE TABLE &lib..&dsn.3 AS SELECT t2.*, t1.sk_&&name&i FROM &lib..sk_&&name&i t1 INNER JOIN &lib..&dsn.2 t2 ON (t1.&&name&i = t2.&&name&i); QUIT; data &lib..&dsn.2; set &lib..&dsn.3; drop &&name&i; run; %end; %end; %mend normalize;
Registration is open! SAS is returning to Vegas for an AI and analytics experience like no other! Whether you're an executive, manager, end user or SAS partner, SAS Innovate is designed for everyone on your team. Register for just $495 by 12/31/2023.
If you are interested in speaking, there is still time to submit a session idea. More details are posted on the website.
Data Literacy is for all, even absolute beginners. Jump on board with this free e-learning and boost your career prospects.