I'm looking for recommendations on a quick way to verify that a file has the variables and appropriate variable types (character/numeric) that I expect.
Any ideas?
Hi,
You can make a shell dataset that has the variables you expect.
Then run proc contents on your shell and what you have and compare the output.
Pseudo code:
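/* List the variable names in the shell (expected) dataset */
proc contents data=shell out=__list1 (keep=name) noprint;
run;
/* List the variable names in the dataset being checked */
proc contents data=have out=__list2 (keep=name) noprint;
run;
/* ERROR writes an error to the log if any difference is found */
proc compare base=__list1 compare=__list2 error;
id name;
run;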
That makes it pretty easy to control what you want to consider a difference. So you can decide to ignore case in names, or you can add type and label and other attributes to the output dataset from proc contents, or whatever. With the error option, PROC COMPARE will throw an error if it finds any difference, which is a great option I only discovered in the past couple years.
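As a rough sketch of that flexibility, the version below also keeps TYPE and upper-cases the names so that case differences are ignored. It assumes the same SHELL and HAVE datasets as above. LISTALL is added only to make the report list variables found in just one of the two lists.

```sas
/* Sketch: compare variable names and types, ignoring case in names. */
proc contents data=shell out=__list1 (keep=name type) noprint;
run;
proc contents data=have out=__list2 (keep=name type) noprint;
run;

/* Upper-case the names so that Age and AGE are treated as the same variable. */
data __list1;
  set __list1;
  name = upcase(name);
run;
data __list2;
  set __list2;
  name = upcase(name);
run;

/* The ID statement expects both datasets to be sorted by NAME. */
proc sort data=__list1; by name; run;
proc sort data=__list2; by name; run;

/* In the PROC CONTENTS OUT= dataset, TYPE is 1 for numeric and 2 for character. */
proc compare base=__list1 compare=__list2 error listall;
  id name;
run;
```

Adding LENGTH, LABEL or FORMAT to the KEEP= list would make those attributes part of the comparison too.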
Instead of proc contents you could use dictionary tables, but often proc contents turns out to be faster if you have a lot of libraries defined and dictionary.columns is huge.
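For reference, a minimal sketch of the dictionary-table route, assuming the dataset being checked is WORK.HAVE. Note that TYPE in DICTIONARY.COLUMNS is a character value ('num' or 'char'), not the 1/2 code that PROC CONTENTS writes, so both lists should come from the same source before they are compared.

```sas
/* Sketch: build the variable list from DICTIONARY.COLUMNS instead of PROC CONTENTS. */
/* LIBNAME and MEMNAME are stored in upper case in the dictionary tables.            */
proc sql;
  create table __list2 as
  select upcase(name) as name, type
  from dictionary.columns
  where libname = 'WORK' and memname = 'HAVE'
  order by 1;
quit;
```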
HTH,
--Q.
Use PROC DATASETS or PROC CONTENTS to retrieve the information about a dataset, then compare it with a predefined dataset that describes what you expect.
If you need to retrieve that information from the SAS metadata server, some interfaces exist for that.
It is like working with research data, except that the data is now metadata (data that describes the data).
Hi Reeza,
This is what I do usually
proc contents data=have out=want;
run;
Then I can explore the dataset want to see the data structure: variable types, lengths, etc.
How do you capture the output from the proc compare to verify file structure? I want this to run automatically with no intervention from me.
So if the variable is missing or a variable is numeric when it should be character I want to print an error to that effect.
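One way to check the result without reading any output is the automatic macro variable SYSINFO, which PROC COMPARE sets to a return code (0 means no differences were found). The sketch below assumes the __list1 and __list2 datasets from the earlier post already exist. The macro variable name rc is just a placeholder.

```sas
/* Sketch: run the comparison silently and test the return code. */
proc compare base=__list1 compare=__list2 noprint;
  id name;
run;

/* Copy SYSINFO immediately, because later steps reset it. */
%let rc = &sysinfo;

data _null_;
  if &rc ne 0 then
    put "ERROR: structure of HAVE differs from SHELL (PROC COMPARE return code &rc).";
  else
    put "NOTE: structure of HAVE matches SHELL.";
run;
```

Any nonzero value signals some kind of difference. The individual bits of the code identify which kind, so it may be worth testing specific bits if some differences (dataset labels, for example) should be ignored.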
I am not sure if this answers your question, but the code below compares the structures of two datasets: have (before processing) and want (after processing).
proc contents data=have out=one;
run;
proc contents data=want out=two;
run;
data one;
set one;
flag=1;
run;
data two;
set two;
flag=2;
run;
proc sql;
create table all as
select * from one
union all
select * from two;
quit;
proc tabulate data=all;
class name type flag;
table name*type,flag;
run;
I'd still have to read the tabulate output myself.
I ended up using a SQL Full Join.
If the name was missing in one file then I print an error to the log using a data _null_ step.
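A rough sketch of that kind of approach, not the exact code that was used. It assumes an expected-structure dataset called SHELL and the dataset being checked called HAVE. The names __exp, __act and __check are placeholders.

```sas
/* Sketch: full join of expected and actual variable lists, then report to the log. */
proc contents data=shell out=__exp (keep=name type) noprint;
run;
proc contents data=have out=__act (keep=name type) noprint;
run;

proc sql;
  create table __check as
  select coalesce(upcase(e.name), upcase(a.name)) as name length=32,
         e.type as expected_type,
         a.type as actual_type
  from __exp as e
  full join __act as a
    on upcase(e.name) = upcase(a.name);
quit;

/* TYPE is 1 for numeric and 2 for character; a missing value means the */
/* variable was found on one side of the join only.                     */
data _null_;
  set __check;
  if missing(expected_type) then
    put "ERROR: unexpected variable " name "in HAVE.";
  else if missing(actual_type) then
    put "ERROR: expected variable " name "is missing from HAVE.";
  else if expected_type ne actual_type then
    put "ERROR: variable " name "has the wrong type.";
run;
```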
Thanks Quentin, the error is what I was looking for.
There is little to add. All the approaches have been mentioned except a SAS data step merge for the comparison.
You now have two options for getting the information and three for comparing it.
Choose according to your additional requirements.
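For completeness, a minimal sketch of the data step merge variant, assuming the same SHELL and HAVE datasets used earlier in the thread. The names __exp and __act are placeholders, and the match on NAME is case-sensitive here.

```sas
/* Sketch: compare expected and actual variable lists with a data step merge. */
proc contents data=shell out=__exp (keep=name type rename=(type=expected_type)) noprint;
run;
proc contents data=have out=__act (keep=name type rename=(type=actual_type)) noprint;
run;

proc sort data=__exp; by name; run;
proc sort data=__act; by name; run;

/* The IN= flags show which list each variable name came from. */
data _null_;
  merge __exp (in=in_exp) __act (in=in_act);
  by name;
  if not in_act then
    put "ERROR: expected variable " name "is missing from HAVE.";
  else if not in_exp then
    put "ERROR: unexpected variable " name "in HAVE.";
  else if expected_type ne actual_type then
    put "ERROR: variable " name "has the wrong type.";
run;
```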