Select the best model from multiple models using SAS Array

PRAVIN_JAIN · Posted 11-21-2017 04:28 AM

I am stuck with below problem.

Each observation in my data set has 3 different valuation and there is a preferred sequence to select the best valuation out of these 3. I have to output the first non missing valuation based on the preferred sequence to the final data set.

I tried using array by transposing each observation into 3 separate observation and then used self join to find the selected valuation.

This is working fine on sample data set, but my data set is having 90 million rows and self join is taking long time. Please suggest better approach.

Sample data set:

ID	Val 1	Val2	val3	m1	m2	m3
11	101	104	105	val1	val2	val3
12	201	210	211	val2	val3	val1
13	155	150		val3	val1	val2
14			201	val2	val1	val3

Output:

ID	Valuation	Model
11	101	val1
12	210	val2
13	155	val1
14	201	val3

gamotte · Posted 11-21-2017 05:03 AM

Hello,

data want;
    set have;
    array m(*) m:;
    array val(*) val:;
    format model $4. valuation 3.;
    keep Valuation model;
    do i=1 to dim(m);
        if vvaluex(m(i)) ne . then do;
            Valuation=vvaluex(m(i));
            model=m(i);
            leave;
        end;
    end;
run;

PeterClemmensen · Posted 11-21-2017 05:08 AM

Like this?

data have;
input ID$ val1-val3  m1$ m2$ m3$;
datalines;
11 101 104 105 val1 val2 val3 
12 201 210 211 val2 val3 val1 
13 155 150 .   val3 val1 val2 
14 .   .   201 val2 val1 val3 
;

data want;
	set have;
	array m{*} m1-m3;

	do i=1 to dim(m);
		if vvaluex(m[i])<>. then do;
			Valuation=vvaluex(m[i]);
			Model=m[i];	
			output;return;
		end;
	end;

	keep ID Valuation Model;
run;

The DATA to DATA Step Macro
Blog: SASnrd

PRAVIN_JAIN · Posted 11-21-2017 05:38 AM

Thanks for the response.

My actual data set have few more columns associated to each valuation. Also the model and valuation column name is not exactly same. I am not able to use VVALUEX here. Modified data as below:

Please suggest.

Input :

ID	ABVal	ABScore	CDVal	CDScore	DEVal	DEScore	m1	m2	m3
11	101	8	104	6	105	8	AB	CD	DE
12	201	9	210	9	211	5	CD	DE	AB
13	155	10	150	7			AB	DE	CD
14					201	9	CD	AB	DE

Output:

ID	Valuation	Score	Model
11	101	8	AB
12	210	9	CD
13	155	10	AB
14	201	9	DE

PeterClemmensen · Posted 11-21-2017 05:49 AM

Always a good idea to provide sample data that represents your actual data closely 🙂

How many variables does your data set contain? Do you need a dynamic way of putting them into an array or are you able to type the variable names manually?

The DATA to DATA Step Macro
Blog: SASnrd

PRAVIN_JAIN · Posted 11-21-2017 05:58 AM

ID	ABVal	ABScore	CDVal	CDScore	DEVal	DEScore	m1	m2	m3
11	101	8	104	6	105	8	AB	CD	DE
12	201	9	210	9	211	5	CD	DE	AB
13	155	10	150	7			AB	DE	CD
14					201	9	CD	AB	DE

This table represents the actual data set closely. For each valuation I have associated score also. (so 2 columns for each valuation)

Models are : AB,CD,DE

Variables are : Val, Score

PeterClemmensen · Posted 11-21-2017 06:09 AM

If your posted data resembles your actual data closely, you are still able to use the VVALUEX function like this

data have;
input ID ABVal ABScore CDVal CDScore DEVal DEScore m1$ m2$ m3$;
datalines; 
11 101 8  104 6 105 8 AB CD DE 
12 201 9  210 9 211 5 CD DE AB 
13 155 10 150 7 .   . AB DE CD 
14 .   .  .   . 201 9 CD AB DE
;

data want;
	set have;
	array m{*} m1-m3;

	do i=1 to dim(m);
		if vvaluex(cats(m[i], "Val")) ne . then do;
			Valuation=input(strip(vvaluex(cats(m[i], "Val"))), 8.);
			Score=input(strip(vvaluex(cats(m[i], "Score"))), 8.);
			Model=m[i];
			output;return;
		end;
	end;

	keep ID Valuation Score Model;
run;

The DATA to DATA Step Macro
Blog: SASnrd

PRAVIN_JAIN · Posted 11-22-2017 02:23 AM

Thanks for your help.

This solution works for me.

PeterClemmensen · Posted 11-22-2017 02:25 AM

Anytime, glad to help 🙂

The DATA to DATA Step Macro
Blog: SASnrd

PRAVIN_JAIN · Posted 11-23-2017 02:46 AM

There are different scenarios in data as in below table, out of which how can we handle 4 th scenario using same code.

Current code doesn't output 4th scenario.

ID	ABVal	ABScore	CDVal	CDScore	DEVal	DEScore	m1	m2	m3	Comments
11	.	.	.	.	.	.	AB	CD	DE	If all valuation missing then output row with all missing values
12	201	9	210	9	211	5	CD	DE	AB	Output first non missing based on model sequence
13	155	10	150	7	.	.	AB	DE		Output first non missing based on model sequence
14	.	.	.	.	201	9	CD	AB		Model sequence given(CD,AB) doesn’t have valuation present, but have value for model(DE) which is not present in model sequence - for this scenario need to output row with having missing values

Select the best model from multiple models using SAS Array

Re: Select the best model from multiple models using SAS Array

Re: Select the best model from multiple models using SAS Array

Re: Select the best model from multiple models using SAS Array

Re: Select the best model from multiple models using SAS Array

Re: Select the best model from multiple models using SAS Array

Re: Select the best model from multiple models using SAS Array

Re: Select the best model from multiple models using SAS Array

Re: Select the best model from multiple models using SAS Array

Re: Select the best model from multiple models using SAS Array

SAS Innovate 2025: Call for Content

Classroom Training Available!