BookmarkSubscribeRSS Feed
PRAVIN_JAIN
Calcite | Level 5

I am stuck with below problem.

 

Each observation in my data set has 3 different valuation and there is a preferred sequence to select the best valuation out of these 3. I have to output the first non missing valuation based on the preferred sequence to the final data set.

 

I tried using array by transposing each observation into 3 separate observation and then used self join to find the selected valuation.

This is working fine on sample data set, but my data set is having 90 million rows and self join is taking long time. Please suggest better approach.

 

Sample data set:

 

ID Val 1Val2 val3 m1m2 m3
11101104105val1val2val3
12201210211val2val3val1
13155150 val3val1val2
14  201val2val1val3

 

Output:

 

IDValuationModel
11101val1
12210val2
13155val1
14201val3
9 REPLIES 9
gamotte
Rhodochrosite | Level 12

Hello,

 

data want;
    set have;
    array m(*) m:;
    array val(*) val:;
    format model $4. valuation 3.;
    keep Valuation model;
    do i=1 to dim(m);
        if vvaluex(m(i)) ne . then do;
            Valuation=vvaluex(m(i));
            model=m(i);
            leave;
        end;
    end;
run;
PeterClemmensen
Tourmaline | Level 20

Like this?

 

data have;
input ID$ val1-val3  m1$ m2$ m3$;
datalines;
11 101 104 105 val1 val2 val3 
12 201 210 211 val2 val3 val1 
13 155 150 .   val3 val1 val2 
14 .   .   201 val2 val1 val3 
;

data want;
	set have;
	array m{*} m1-m3;

	do i=1 to dim(m);
		if vvaluex(m[i])<>. then do;
			Valuation=vvaluex(m[i]);
			Model=m[i];	
			output;return;
		end;
	end;

	keep ID Valuation Model;
run;
PRAVIN_JAIN
Calcite | Level 5

Thanks for the response.

 

My actual data set have few more columns associated to each valuation. Also the model and valuation column name is not exactly same. I am not able to use VVALUEX here. Modified data as below:

 

 

Please suggest.

 

Input :

 

ID ABVal ABScoreCDVal CDScoreDEVal DEScorem1m2 m3
11101810461058ABCDDE
12201921092115CDDEAB
13155101507  ABDECD
14    2019CD ABDE

 

Output:

 

IDValuationScoreModel
111018AB
122109CD
1315510AB
142019DE

 

 

PeterClemmensen
Tourmaline | Level 20

Always a good idea to provide sample data that represents your actual data closely 🙂

 

How many variables does your data set contain? Do you need a dynamic way of putting them into an array or are you able to type the variable names manually?

PRAVIN_JAIN
Calcite | Level 5
ID ABVal ABScoreCDVal CDScoreDEVal DEScorem1m2 m3
11101810461058ABCDDE
12201921092115CDDEAB
13155101507  ABDECD
14    2019CD ABDE

 

This table represents the actual data set closely. For each valuation I have associated score also. (so 2 columns for each valuation)

 

Models are : AB,CD,DE

Variables are : Val, Score

PeterClemmensen
Tourmaline | Level 20

If your posted data resembles your actual data closely, you are still able to use the VVALUEX function like this

 

data have;
input ID ABVal ABScore CDVal CDScore DEVal DEScore m1$ m2$ m3$;
datalines; 
11 101 8  104 6 105 8 AB CD DE 
12 201 9  210 9 211 5 CD DE AB 
13 155 10 150 7 .   . AB DE CD 
14 .   .  .   . 201 9 CD AB DE
;

data want;
	set have;
	array m{*} m1-m3;

	do i=1 to dim(m);
		if vvaluex(cats(m[i], "Val")) ne . then do;
			Valuation=input(strip(vvaluex(cats(m[i], "Val"))), 8.);
			Score=input(strip(vvaluex(cats(m[i], "Score"))), 8.);
			Model=m[i];
			output;return;
		end;
	end;

	keep ID Valuation Score Model;
run;

 

PRAVIN_JAIN
Calcite | Level 5

Thanks for your help. 

This solution works for me.

PRAVIN_JAIN
Calcite | Level 5

There are different scenarios in data as in below table, out of which how can we handle 4 th scenario using same code. 

Current code doesn't output 4th scenario. 

 

ID ABVal ABScoreCDVal CDScoreDEVal DEScorem1m2 m3Comments
11......ABCDDEIf all valuation missing then output row with all missing values
12201921092115CDDEABOutput first non missing based on model sequence
13155101507..ABDE Output first non missing based on model sequence
14....2019CDAB Model sequence given(CD,AB) doesn’t have valuation present,
but have value for model(DE) which is not present in model sequence - for this scenario need to output row with having missing values

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 969 views
  • 0 likes
  • 3 in conversation