Data quality indicators report

06-29-2017 06:33 AM - last edited on 06-29-2017 06:59 AM by RW9

Hi Guys,

I hope to get a data quality report which includes the following indicators:

var | mean | std | min | max | N | Q1 | median | Q3 | IQ_Range | n_low | n_low_percent | n_high | n_high_percent | n_far_low | n_far_low_percent | n_far_high | n_far_high_percent | null_rate | missing | missing_percent |

Could anyone show me how to produce this in SAS with proc print, proc freq, or another suitable procedure?

Much appreciated.

Posted in reply to JinboZhao

06-29-2017 06:58 AM - edited 06-29-2017 07:00 AM

Most of what you want can be obtained with proc means.

06-29-2017 07:06 AM

Thank you. I got most of them, but could not get the percentage ones. Could you help me?

Posted in reply to JinboZhao

06-29-2017 07:20 AM

Without some test data (in the form of a data step) I can only give general advice. Percentages are just count / N, so you can calculate them in a data step. You may need to run proc freq on your data to get the counts, then merge those onto the statistics.
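As a minimal sketch of the "count / N" idea for the missing percentage, the steps below run proc means and then add the percentage in a data step. The dataset name `table` and the sample rows come from the original poster's messages in this thread; the output names `num_stats` and `num_report` are made up for illustration. Two assumptions to check: the denominator is taken to be all rows (N + NMiss), and the ODS output columns are assumed to be named `N` and `NMiss` (run proc contents on `num_stats` if your release names them differently).

```sas
/* Sample data from the thread, read in with a data step */
data table;
  input ID OPENTIME CLOSETIME GENDER $ GRADE $ LOANS FLAG $;
  datalines;
1 98 121 F A 1200 Y
2 95 115 M B 1300 Y
3 96 114 M C 1500 N
4 99 120 F D 1600 Y
5 98 107 F E 1700 N
;
run;

/* One row per numeric variable; STACKODSOUTPUT needs SAS 9.3 or later */
proc means data=table stackodsoutput
           n nmiss mean std min max q1 median q3 qrange;
  var _numeric_;
  ods output summary=num_stats;
run;

/* Assumed denominator: all rows, i.e. non-missing plus missing */
data num_report;
  set num_stats;
  if n + nmiss > 0 then missing_percent = 100 * nmiss / (n + nmiss);
run;

proc print data=num_report noobs;
run;
```

Because `var _numeric_` picks up every numeric column, ID appears in the report as well; list the variables explicitly if that is not wanted.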

06-29-2017 07:32 AM

Hi, my dataset is like this:

| ID | OPENTIME | CLOSETIME | GENDER | GRADE | LOANS | FLAG |
|----|----------|-----------|--------|-------|-------|------|
| 1 | 98 | 121 | F | A | 1200 | Y |
| 2 | 95 | 115 | M | B | 1300 | Y |
| 3 | 96 | 114 | M | C | 1500 | N |
| 4 | 99 | 120 | F | D | 1600 | Y |
| 5 | 98 | 107 | F | E | 1700 | N |

The following is the code I use:

proc means data=table n mean min max std q1 q3 qrange median nmiss;
  var _numeric_;
run;

proc freq data=table;
  tables _character_;
run;

But the results for the numeric variables do not include the percentages. Could you help me? I want to get all of those results in one report.

Posted in reply to JinboZhao

06-29-2017 08:06 AM

Sorry, I don't have time to write a whole report for you. Use those procedures, merge the required data together, and then use a data step to calculate any further numbers you need.

Posted in reply to JinboZhao

06-29-2017 10:18 AM

One thing to consider for the percentages is which numerator and denominator to use. I don't believe you have specified that clearly enough. The likely approach is to create the appropriate sums in proc means or proc summary, and then calculate the percentages in a data step.

Alternatively, proc report or proc tabulate may be able to do the percentage calculations directly when building the report.
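Since the thread never defines the outlier indicators, the sketch below has to pick a numerator and denominator itself, and every one of those choices is an assumption: n_low and n_high count values beyond Q1 - 1.5 * IQR and Q3 + 1.5 * IQR, n_far_low and n_far_high use 3 * IQR instead (so far outliers are also counted in n_low and n_high), and the denominator is the non-missing count N. It handles a single variable, LOANS, from the poster's sample data; the dataset names `q` and `loans_outliers` are hypothetical.

```sas
/* Sample data from the thread */
data table;
  input ID OPENTIME CLOSETIME GENDER $ GRADE $ LOANS FLAG $;
  datalines;
1 98 121 F A 1200 Y
2 95 115 M B 1300 Y
3 96 114 M C 1500 N
4 99 120 F D 1600 Y
5 98 107 F E 1700 N
;
run;

/* Non-missing count and quartiles for LOANS */
proc means data=table noprint;
  var LOANS;
  output out=q(drop=_type_ _freq_) n=n q1=q1 q3=q3 qrange=iqr;
run;

/* Count values outside the assumed fences, then divide by N */
data loans_outliers(keep=n q1 q3 iqr
                         n_low n_high n_far_low n_far_high
                         n_low_percent n_high_percent
                         n_far_low_percent n_far_high_percent);
  if _n_ = 1 then set q;        /* statistics are retained for every row */
  set table end=last;
  if not missing(LOANS) then do;
    /* Sum statements: each counter starts at 0 and is retained */
    n_low      + (LOANS < q1 - 1.5 * iqr);
    n_high     + (LOANS > q3 + 1.5 * iqr);
    n_far_low  + (LOANS < q1 - 3 * iqr);
    n_far_high + (LOANS > q3 + 3 * iqr);
  end;
  if last then do;
    if n > 0 then do;
      n_low_percent      = 100 * n_low / n;
      n_high_percent     = 100 * n_high / n;
      n_far_low_percent  = 100 * n_far_low / n;
      n_far_high_percent = 100 * n_far_high / n;
    end;
    output;                     /* one summary row for the variable */
  end;
run;

proc print data=loans_outliers noobs;
run;
```

To cover several variables, the same two steps could be wrapped in a macro and the one-row results appended, or merged by variable name onto the proc means statistics.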