BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
manojsanthi
Calcite | Level 5

Hello SAS Experts: I am working on a Loan Data set in SAS EG 7.15. I am conducting a data quality check on the Dataset.

For doing that I need to summarize the data quality based on "count of Loans which have Null values in each of the Loan Attribute" as well as the corresponding Total Principal Balance for each Loan Attribute with Null values.

I am attaching an Excel with the Sample Data.

Table A in the excel Contains Loan Numbers (Dummy Values) and their corresponding PrincipalBalances and also there are 3 fields which indicate a 1 which means a Null value was found in the Original Loan table or indicate a 0 which means a valid non-null value was found in the Original Loan Table.

Table B is the actual table which summarizes and quantifies the Missing Values by Rowcounts and the Unpaid Principal Balance. I would like to know what is the best way to derive the Table B based on Table A, in SAS. Thanks in Advance, Manoj

1 ACCEPTED SOLUTION

Accepted Solutions
TomKari
Onyx | Level 15

I suggest that the first thing you do is use the "Summary Statistics" task to summarize your data and get the numbers you need.

 

On your TableA, I'm going to use A, B, C, etc. to refer to the columns.

 

1. Open the Summary Statistics task
2. Use columns C, D, E as Classification Variables
3. Use columns F, G, H as Analysis Variables
4. On the "Statistics" tab, untick everything, and tick Sum and Number of observations
5. On the Results tab,
a Turn N-way only into All ways
b Untick Show statistics
c Tick Save statistics to data set
6. Run the task

 

The output dataset will be a cross-classification of C, D, and E, which will include the numbers you want. Taking C as an example:
The line with C=1 and _TYPE_=100 will be the results for column C; F_N will be the count of matching rows, and F_Sum will be the principal balance. Same for D and E.

 

You can extract and rename these rows using the query builder, or by writing a quick data step.

   Tom

View solution in original post

4 REPLIES 4
TomKari
Onyx | Level 15

I suggest that the first thing you do is use the "Summary Statistics" task to summarize your data and get the numbers you need.

 

On your TableA, I'm going to use A, B, C, etc. to refer to the columns.

 

1. Open the Summary Statistics task
2. Use columns C, D, E as Classification Variables
3. Use columns F, G, H as Analysis Variables
4. On the "Statistics" tab, untick everything, and tick Sum and Number of observations
5. On the Results tab,
a Turn N-way only into All ways
b Untick Show statistics
c Tick Save statistics to data set
6. Run the task

 

The output dataset will be a cross-classification of C, D, and E, which will include the numbers you want. Taking C as an example:
The line with C=1 and _TYPE_=100 will be the results for column C; F_N will be the count of matching rows, and F_Sum will be the principal balance. Same for D and E.

 

You can extract and rename these rows using the query builder, or by writing a quick data step.

   Tom

manojsanthi
Calcite | Level 5

Thank You for the quick reply Tom!

It was exactly what I was looking for.

TomKari
Onyx | Level 15

I'm glad it was useful. The "Summary Statistics" task, and the underlying PROC MEANS, is INCREDIBLY powerful for summation and aggregation, particularly by multiple dimensions. Well worth studying!

 

Tom

manojsanthi
Calcite | Level 5

Thanks for the Tip Tom!

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 4 replies
  • 1849 views
  • 0 likes
  • 2 in conversation