bmoon
Calcite | Level 5

We aren't trying to reduce our audits.  We're trying to select the institutions with the highest risk since we have a limited number of audits that can be completed in a year.  We aren't planning on using the fact that there was an audit as an indicator - we're going to use the results of the audit as an indicator.

jakarman
Barite | Level 11

Reeza, see the audit as a measurement with a result that can be positive or negative. It is a type of measurement or probe, as is normal when working with predictions.

The poster has described that he wants to optimize which institutions should be audited, so that the audits reach the ones with the most trouble.

He has also mentioned having a lot of other information: events not related to the audit report itself. Those are the predictors.

(predictors) > optimize auditing toward the group with the highest probability of having the most issues (next year?).

If the audit report were the only information, you would be right. In that case you would have no independent predictors.

---->-- ja karman --<-----
PatrickHall
Obsidian | Level 7

/* I would head in a direction like this ... */

/* This is a simplified example ... */

/* Create audited entities set */
data audited_entities;
      input DUNS_ID $9. AUDITYEAR;
      datalines;
000323667 2001
000323667 2002
000323667 2003
067211318 2005
067211318 2006
067211318 2007
;
run;

/* Sort for counting and later merge */
proc sort data=audited_entities; by DUNS_ID; run;

/* Create a summary set by counting audits */
data audited_entities_summary;
      set audited_entities;
      by DUNS_ID;
      retain count 0; /* count will eventually be your new interval/ordinal target */
      if first.DUNS_ID then count= 0;
      count + 1;
      audit_flag= 1; /* audit flag will eventually be your new binary target */
      if last.DUNS_ID then output;
      drop AUDITYEAR;
run;

/* Create a set for unaudited entities */
data unaudited_entities;
      input DUNS_ID $9.;
      datalines;
000353667
009567493
000874567
;
run;

/* These entities have not been audited, so indicate that in the data */
data unaudited_entities;
      set unaudited_entities;
      count= 0;
      audit_flag= 0;
run;

/* Sort for later merge */
proc sort data=unaudited_entities; by DUNS_ID; run;

/* Let's add in some other characteristics - just for example */
data entity_characteristics;
      input DUNS_ID $9. descriptive_var1  descriptive_var2  descriptive_var3;
      datalines;
000323667 1 2 3
067211318 4 5 6
000353667 7 8 9
009567493 10 11 12
000874567 13 14 15
;
run;

/* Sort for later merge */
proc sort data=entity_characteristics; by DUNS_ID; run;

/* Create set suitable for predictive modeling by merging the */
/* audited entities data set, the unaudited entities data set and their characteristics */
data model_set (rename= (count= ordinal_target audit_flag= binary_target));
      merge audited_entities_summary unaudited_entities entity_characteristics;
      by DUNS_ID;
run;

/* Now ... I would try either logistic regression or a decision tree on the binary target */

/* An alternative approach would be a Poisson regression on the ordinal target */

/* All these models are available in SAS/STAT or Enterprise Miner */
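
As a rough sketch of those two modeling steps, assuming the model_set built above and a real data set with far more rows (the five toy rows will trigger separation and convergence warnings), something like the following could be a starting point. The output data set names and the p_audit and expected_count variables are made up for illustration.

```sas
/* Sketch only: assumes model_set from the merge above, */
/* with many more entities than the toy example         */

/* Logistic regression on the binary target */
proc logistic data=model_set;
      model binary_target(event='1') = descriptive_var1 descriptive_var2 descriptive_var3;
      output out=scored_binary p=p_audit; /* p_audit: predicted probability (hypothetical name) */
run;

/* Poisson regression on the count target */
proc genmod data=model_set;
      model ordinal_target = descriptive_var1 descriptive_var2 descriptive_var3
            / dist=poisson link=log;
      output out=scored_count pred=expected_count; /* expected_count: hypothetical name */
run;

/* Rank entities so the highest predicted probabilities come first */
proc sort data=scored_binary; by descending p_audit; run;
```

The sorted scored_binary set then gives one possible ordering for choosing which entities to look at first.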

