Hello SAS Experts: I am working on a Loan Data set in SAS EG 7.15. I am conducting a data quality check on the Dataset.
For doing that I need to summarize the data quality based on "count of Loans which have Null values in each of the Loan Attribute" as well as the corresponding Total Principal Balance for each Loan Attribute with Null values.
I am attaching an Excel with the Sample Data.
Table A in the excel Contains Loan Numbers (Dummy Values) and their corresponding PrincipalBalances and also there are 3 fields which indicate a 1 which means a Null value was found in the Original Loan table or indicate a 0 which means a valid non-null value was found in the Original Loan Table.
Table B is the actual table which summarizes and quantifies the Missing Values by Rowcounts and the Unpaid Principal Balance. I would like to know what is the best way to derive the Table B based on Table A, in SAS. Thanks in Advance, Manoj
I suggest that the first thing you do is use the "Summary Statistics" task to summarize your data and get the numbers you need.
On your TableA, I'm going to use A, B, C, etc. to refer to the columns.
1. Open the Summary Statistics task
2. Use columns C, D, E as Classification Variables
3. Use columns F, G, H as Analysis Variables
4. On the "Statistics" tab, untick everything, and tick Sum and Number of observations
5. On the Results tab,
a Turn N-way only into All ways
b Untick Show statistics
c Tick Save statistics to data set
6. Run the task
The output dataset will be a cross-classification of C, D, and E, which will include the numbers you want. Taking C as an example:
The line with C=1 and _TYPE_=100 will be the results for column C; F_N will be the count of matching rows, and F_Sum will be the principal balance. Same for D and E.
You can extract and rename these rows using the query builder, or by writing a quick data step.
Tom
I suggest that the first thing you do is use the "Summary Statistics" task to summarize your data and get the numbers you need.
On your TableA, I'm going to use A, B, C, etc. to refer to the columns.
1. Open the Summary Statistics task
2. Use columns C, D, E as Classification Variables
3. Use columns F, G, H as Analysis Variables
4. On the "Statistics" tab, untick everything, and tick Sum and Number of observations
5. On the Results tab,
a Turn N-way only into All ways
b Untick Show statistics
c Tick Save statistics to data set
6. Run the task
The output dataset will be a cross-classification of C, D, and E, which will include the numbers you want. Taking C as an example:
The line with C=1 and _TYPE_=100 will be the results for column C; F_N will be the count of matching rows, and F_Sum will be the principal balance. Same for D and E.
You can extract and rename these rows using the query builder, or by writing a quick data step.
Tom
Thank You for the quick reply Tom!
It was exactly what I was looking for.
I'm glad it was useful. The "Summary Statistics" task, and the underlying PROC MEANS, is INCREDIBLY powerful for summation and aggregation, particularly by multiple dimensions. Well worth studying!
Tom
Thanks for the Tip Tom!
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.
Find more tutorials on the SAS Users YouTube channel.
