Hello everyone!
I have a large dataset and I would like to run a couple of different types of analysis on its variables. I have researched proc compare and proc freq, but I am not sure they are the most efficient solution for my needs. I would like to run this code each night after the dataset has been updated, to check for data issues and fix any that occurred.
I would like to:
Write code that summarizes the distinct values of every variable. The variables are a mix of char, numeric, and date. I know about proc freq, or simply running a distinct query for each variable, but is there a function or script that could do it all at once and return a single table that includes all of the variables?
Compare values between variables in the same dataset. With proc compare, I get each observation where the two variables don't match. The problem is that there could be thousands of records that don't match, and proc compare limits how much it prints. I don't need to review each observation; I just need a summary of the values that don't match from one variable to the other.
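For reference, the kind of within-dataset comparison described above can be sketched roughly as follows. This is only an illustration and not the actual nightly code: the dataset name `work.have` and the variable names `var_a` and `var_b` are hypothetical placeholders.

```sas
/* Hypothetical names: work.have, var_a, var_b are placeholders. */
/* With only BASE= specified, the VAR and WITH statements compare */
/* two variables within the same dataset, observation by observation. */
proc compare base=work.have;
    var  var_a;   /* first variable */
    with var_b;   /* variable it is compared against */
run;
```

The report from a call like this lists differences per observation, which is exactly the level of detail that becomes unmanageable when thousands of records differ.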
I feel like I am close with proc compare and proc freq, but I am not quite where I want to be. I need a summarized report that I can dig into, so I can write further queries to investigate any observations that look out of place. I am hoping you guys have some suggestions for me.
Thank you!