I am stuck with below problem.
Each observation in my data set has 3 different valuation and there is a preferred sequence to select the best valuation out of these 3. I have to output the first non missing valuation based on the preferred sequence to the final data set.
I tried using array by transposing each observation into 3 separate observation and then used self join to find the selected valuation.
This is working fine on sample data set, but my data set is having 90 million rows and self join is taking long time. Please suggest better approach.
Sample data set:
ID | Val 1 | Val2 | val3 | m1 | m2 | m3 |
11 | 101 | 104 | 105 | val1 | val2 | val3 |
12 | 201 | 210 | 211 | val2 | val3 | val1 |
13 | 155 | 150 | val3 | val1 | val2 | |
14 | 201 | val2 | val1 | val3 |
Output:
ID | Valuation | Model |
11 | 101 | val1 |
12 | 210 | val2 |
13 | 155 | val1 |
14 | 201 | val3 |
Hello,
data want;
set have;
array m(*) m:;
array val(*) val:;
format model $4. valuation 3.;
keep Valuation model;
do i=1 to dim(m);
if vvaluex(m(i)) ne . then do;
Valuation=vvaluex(m(i));
model=m(i);
leave;
end;
end;
run;
Like this?
data have;
input ID$ val1-val3 m1$ m2$ m3$;
datalines;
11 101 104 105 val1 val2 val3
12 201 210 211 val2 val3 val1
13 155 150 . val3 val1 val2
14 . . 201 val2 val1 val3
;
data want;
set have;
array m{*} m1-m3;
do i=1 to dim(m);
if vvaluex(m[i])<>. then do;
Valuation=vvaluex(m[i]);
Model=m[i];
output;return;
end;
end;
keep ID Valuation Model;
run;
Thanks for the response.
My actual data set have few more columns associated to each valuation. Also the model and valuation column name is not exactly same. I am not able to use VVALUEX here. Modified data as below:
Please suggest.
Input :
ID | ABVal | ABScore | CDVal | CDScore | DEVal | DEScore | m1 | m2 | m3 |
11 | 101 | 8 | 104 | 6 | 105 | 8 | AB | CD | DE |
12 | 201 | 9 | 210 | 9 | 211 | 5 | CD | DE | AB |
13 | 155 | 10 | 150 | 7 | AB | DE | CD | ||
14 | 201 | 9 | CD | AB | DE |
Output:
ID | Valuation | Score | Model |
11 | 101 | 8 | AB |
12 | 210 | 9 | CD |
13 | 155 | 10 | AB |
14 | 201 | 9 | DE |
Always a good idea to provide sample data that represents your actual data closely 🙂
How many variables does your data set contain? Do you need a dynamic way of putting them into an array or are you able to type the variable names manually?
ID | ABVal | ABScore | CDVal | CDScore | DEVal | DEScore | m1 | m2 | m3 |
11 | 101 | 8 | 104 | 6 | 105 | 8 | AB | CD | DE |
12 | 201 | 9 | 210 | 9 | 211 | 5 | CD | DE | AB |
13 | 155 | 10 | 150 | 7 | AB | DE | CD | ||
14 | 201 | 9 | CD | AB | DE |
This table represents the actual data set closely. For each valuation I have associated score also. (so 2 columns for each valuation)
Models are : AB,CD,DE
Variables are : Val, Score
If your posted data resembles your actual data closely, you are still able to use the VVALUEX function like this
data have;
input ID ABVal ABScore CDVal CDScore DEVal DEScore m1$ m2$ m3$;
datalines;
11 101 8 104 6 105 8 AB CD DE
12 201 9 210 9 211 5 CD DE AB
13 155 10 150 7 . . AB DE CD
14 . . . . 201 9 CD AB DE
;
data want;
set have;
array m{*} m1-m3;
do i=1 to dim(m);
if vvaluex(cats(m[i], "Val")) ne . then do;
Valuation=input(strip(vvaluex(cats(m[i], "Val"))), 8.);
Score=input(strip(vvaluex(cats(m[i], "Score"))), 8.);
Model=m[i];
output;return;
end;
end;
keep ID Valuation Score Model;
run;
Thanks for your help.
This solution works for me.
Anytime, glad to help 🙂
There are different scenarios in data as in below table, out of which how can we handle 4 th scenario using same code.
Current code doesn't output 4th scenario.
ID | ABVal | ABScore | CDVal | CDScore | DEVal | DEScore | m1 | m2 | m3 | Comments |
11 | . | . | . | . | . | . | AB | CD | DE | If all valuation missing then output row with all missing values |
12 | 201 | 9 | 210 | 9 | 211 | 5 | CD | DE | AB | Output first non missing based on model sequence |
13 | 155 | 10 | 150 | 7 | . | . | AB | DE | Output first non missing based on model sequence | |
14 | . | . | . | . | 201 | 9 | CD | AB | Model sequence given(CD,AB) doesn’t have valuation present, but have value for model(DE) which is not present in model sequence - for this scenario need to output row with having missing values |
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.