giant_wolf00
Fluorite | Level 6

Hi,

 

Apologies, I'm REALLY new to SAS, so forgive me if this is a basic error. I can't figure out where I've gone wrong, and neither can anyone else in my organisation.

 

I have created a stepwise logistic regression model to predict someone's likelihood of getting into debt. The model has some grouped and imputed columns in the final selection. The data was also partitioned into training, validation, and test partitions.

 

I have successfully scored the test partition data without issue, and now I want to score a brand new set of data. This is where I have the issue.

 

The new data has been imported, and the roles and measurement levels (nominal, interval...) have all been changed to the appropriate type so that they match the data set the model was originally built on. The role of the new data is set to "score" and the subsequent Score node appears to have run without issue (it has a green tick!). I then have a code node attached to this to sort and rank (decile) the scored column, but it says the scored variable P_column_nameY (my output is Y/N) doesn't exist.

 

Just to double-check that I didn't get the variable name wrong, I then attached a code node to the Score node to "print" the data: "proc print data= tablename;  run;"

 

The output of the "print" doesn't include any scored columns at all, so no wonder the sort and rank wasn't working. Does anyone know why, and where I've gone wrong?

 

I have attached a screen grab of my Miner project. The bit within the yellow box is the process from start to finish. Anything outside of this is either a different variation of the model that I haven't taken forward or the (working) scoring of the modelled test data.

 

miner screengrab.JPG

 

Thanks in advance,

 

GW.

 

 

 

1 ACCEPTED SOLUTION

WendyCzika
SAS Employee

Instead of "Browse" in the Exported Data window, choose "Properties...", click on the Variables tab, and tick the "Label" checkbox. You should then be able to see both the variable names and the labels for your new columns. What you are seeing as "Predicted: ..." is the label; the variable name is what you want to use in your code. I think the name you have above is correct, but you can double-check it here.

 

Then in your SAS Code node, try using &em_import_score as your data set name. This is a macro variable that references the incoming score data set, so:

proc sort data=&em_import_score  out=/*use a new name here like: work.something*/;

...
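
A minimal sketch of how the complete step might look, assuming the scored variable really is named P_DEBT_90_DAYS_ANYY (as in the question) and that the code runs in a SAS Code node placed after the Score node, which is where &em_import_score resolves. The output names work.scored_sorted and work.scored_deciles and the rank variable decile are placeholders for illustration, not names from the original post:

```sas
/* Sort the scored data by predicted probability, highest first.    */
/* work.scored_sorted is a placeholder output name.                 */
proc sort data=&em_import_score out=work.scored_sorted;
  by descending P_DEBT_90_DAYS_ANYY;
run;

/* Assign deciles. GROUPS=10 yields group numbers 0 to 9, and       */
/* DESCENDING puts the highest probabilities in group 0.            */
proc rank data=work.scored_sorted out=work.scored_deciles
          groups=10 descending;
  var P_DEBT_90_DAYS_ANYY;
  ranks decile;  /* placeholder name for the new decile column */
run;
```

Writing to a new data set with out= leaves the data set exported by the Score node untouched. PROC RANK does not need sorted input, so the sort is only there if you also want the rows in order.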

 


4 REPLIES
WendyCzika
SAS Employee

Your process all sounds/looks correct...can you select the Score node and click on the "..." for Exported Data in the properties panel, and browse your SCORE data from there to see if it has the correct columns?
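
The same check can also be done in code. A minimal sketch, assuming it runs in a SAS Code node connected directly after the Score node, where Enterprise Miner defines the macro variable &em_import_score for the incoming score data set:

```sas
/* List every variable in the scored data with its name and label, */
/* in column order, to confirm the exact names of the new columns.  */
proc contents data=&em_import_score varnum;
run;

/* Print the first few rows, restricted to variables whose names    */
/* start with P_ (the prefix used in the question for the           */
/* predicted-probability columns).                                  */
proc print data=&em_import_score (obs=5);
  var P_: ;
run;
```

If the predicted columns show up here but not in your own table, the code node is reading the original input table rather than the data set exported by the Score node.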

giant_wolf00
Fluorite | Level 6

Wendy,

 

That's great, thank you. I really appreciate the reply and help. I have "browsed" the data (I learnt something new!) and there are indeed scored columns in there. However, the column names aren't in the format I was expecting, which is possibly where it is going wrong in the code node when I come to sort. I want to rank (and decile) the predicted outcome = 'Y', which presumably is the column "predicted: DEBT_90_DAYS_ANY=Y".

 

miner data screen grab.JPG

 

My code for the sort references "P_DEBT_90_DAYS_ANYY", as I thought the format was p_mycolumnnameY (or 1/0 if using 1 or 0 instead of Y/N). However, it says the variable doesn't exist, so I'm guessing I have the naming convention wrong or I've missed a step somewhere.

 

proc sort
data= mydata.my_original_table
out= mydata.my_original_table;
by descending P_DEBT_90_DAYS_ANYY;
run;

 

The table referenced in the sort code is the same table that I imported with the new data I wanted to score. Presumably this is the correct way?

 

Thanks in advance for any further help,

 

GW

giant_wolf00
Fluorite | Level 6

Wendy,

 

Thank you... it's fixed! It wouldn't let me do it exactly how you described (not sure why, unless I wrote it incorrectly), so I created an additional code node in between using:

 

data mydata.mytable;
Set &em_import_score;
run;

 

I then carried on as normal using the proc sort data code, and it appears to have worked, so I'm very grateful to you.

 

Thank you.
