06-25-2017 11:12 PM
Does anybody know of or has found a link for creating a comprehensive Data Dictionary. which also includes a list of all attributes a variable may have.
I know how to do a simple Data Dictionary, but I need something that goes a step further.
Say we have for example a variable called: Gender, then I do not only what the specifics that this is a character variable, but I also want to know what characters are in there: female, male. It the variable is height, then maybe only the min and max, which I get from proc means.
Has anybody come accross a good guide tat shows how it is done?
Any ideas and recommendations are welcome
06-25-2017 11:42 PM - edited 06-25-2017 11:49 PM
This can be done by using labels and this is quick idea I can think.
for example if you have label for dataset, you can see the same in data dictionary
and when you use select * from dictionary.tables you see the below.
|SASHELP||HEART||DATA||Framingham Heart Study|
for example, when I create a new dataset and mention label. It stores those values in data dictionary
create table abcd as
select name, sex label='this variable contains Male and female values'
, age, height, weight from sashelp.class;
and after that if I write a query
select name, label from dictionary.columns
where memname ='ABCD'
and libname ='WORK'
and name ='Sex';
|Column Name||Column Label|
|Sex||this variable contains Male and female values|
This process can be automated by using macro. This is one way I can think of. I am pretty sure, someone might have written some paper on this.
quick search in lexjansen.com gave a paper which I feel might be interesting to you.
06-26-2017 03:22 AM
I believe there is no one procedure to get what you are expecting. You may have to use different procedures to get the expected output.
example if you are trying to determine teh different groups present in a character variable like gender then you have to use the proc freq table statement to get the counts and also the different groups present in that gender variable.
if you want to know the min and max of the variable then you could use the proc means or to go further if you wnat to get the 5 min and 5 max values use proc univariate. Both these procedure work on numeric variables in the var statement.
proc freq data=sashelp.class; table gender; run; proc means data=sashelp.class; var height; run; or proc univariate data=sashelp.class; var height; run;
06-27-2017 06:35 AM - edited 06-27-2017 08:28 AM
A data dictionary is "data about data". That's what the SAS dictionary tables give you.
If you write a data dictionary/define a data model then you can also define the allowed values for a column (the domain to use a data modeling term). SAS doesn't provide the capability to actually implement such constraints on columns.
What you're asking for is what's actually stored in columns as values. That's no more metadata and is not a column attribute. This is not "data about data" but real data; i.e. the domain ("data about data") could allow for values A, B or C but you've got only stored A and C as values ("the real data").
Because it's real data you'll principally only get such information by implementing and executing some sort of report using your data. This is true for any data storage format and any database.
06-27-2017 08:35 AM
I believe that Patrick is on the right path to helping you. If you were able to run a profile on your data, the report results would give you the information you are asking.
Running a profile on data is a common first step in understanding your data and monintoring the data's quality.
06-27-2017 08:19 AM
No easy answers. I would agree with earlier suggestion that you spend some time searching user group papers at lexjansen.com. I haven't looked in several years.
One that I liked was http://www.lexjansen.com/nesug/nesug11/ph/ph12.pdf.