I have 2 datasets which look like below:
First:
var1 var2 ... var1000 Y
n f 1111
c n f 111
d o g 222
b o g 4444
b h 7777
p g 222
c q g 4444
d r h 77
Second:
varName Levels new_level
var1 2nd
var1 b 3rd
var1 c 1st
var1 d 3rd
var2 3rd
var2 n 4th
var2 o 2nd
var2 p 4th
var2 q 2nd
var2 r 1st
...
var1000 f 2nd
var1000 g 2nd
var1000 h 1st
I want the first dataset to have new levels from the second dataset with the same name and type of the variables in the first dataset like below:
want:
var1 var2 ... var1000 Y
2nd 4th 2nd 1111
1st 4th 2nd 111
3rd 2nd 2nd 222
3rd 2nd 2nd 4444
3rd 3rd 2nd 7777
2nd 4th 2nd 222
1st 2nd 2nd 4444
3rd 1st 1st 77
Your help would be greatly appreciated!
One Way :
data first;
input var1 $ var2 $ var1000 $ Y;
cards;
a b c 111
c n f 111
d x g 222
;
run;
data second;
input varName $ Levels $ new_level $;
cards;
var1 a 2nd
var1 c 1st
var1 d 3rd
var2 b 1st
var2 n 3rd
var2 o 2nd
var1000 g 5th
;
run;
data want;
if _n_=1 then do;
if 0 then
set second;
declare hash h(dataset:'second');
h.definekey('varName','Levels');
h.definedata('new_level');
h.definedone();
end;
set first;
array v[*] var1 var2 var1000;
do i=1 to dim(v);
if h.find(key:vname(v[i]),key:v[i]) eq 0 then v[i]=new_level;
else call missing(of v[i]);
end;
drop i varName Levels new_level;
run;
If you mean to update in place then you may have at least one issue that won't allow that: the lengths of the variables involved. If your current value has a length of one, all that would be needed to hold a, b, c etc and try to fit in a value of 1st, 2nd or 3rd then you need to have a length of at least 3.
So before even starting you need to check the lengths of 1000 variables to make sure that the longest value of your "new level" would fit.
Your second data set is almost usable to create a custom format with Varname as a format name, levels as the Start and New_level as the label value. Look at Proc format and CNTLIN option. Then you could use the format with the existing "level" to display the text of the new level.
But there are some details about "blank" values that may need to be clarified as well such as do any of the variables with "blank" have the blank associated with two, or possibly more, new level values?
One Way :
data first;
input var1 $ var2 $ var1000 $ Y;
cards;
a b c 111
c n f 111
d x g 222
;
run;
data second;
input varName $ Levels $ new_level $;
cards;
var1 a 2nd
var1 c 1st
var1 d 3rd
var2 b 1st
var2 n 3rd
var2 o 2nd
var1000 g 5th
;
run;
data want;
if _n_=1 then do;
if 0 then
set second;
declare hash h(dataset:'second');
h.definekey('varName','Levels');
h.definedata('new_level');
h.definedone();
end;
set first;
array v[*] var1 var2 var1000;
do i=1 to dim(v);
if h.find(key:vname(v[i]),key:v[i]) eq 0 then v[i]=new_level;
else call missing(of v[i]);
end;
drop i varName Levels new_level;
run;
As Reeza said, proc format also is a proper tool.
data first;
input var1 $ var2 $ var1000 $ Y;
cards;
a b c 111
c n f 111
d x g 222
;
run;
data second;
input varName $ Levels $ new_level $;
cards;
var1 a 2nd
var1 c 1st
var1 d 3rd
var2 b 1st
var2 n 3rd
var2 o 2nd
var1000 g 5th
;
run;
data fmt;
set second;
type='C';
varName=cats(varName,'_');
rename varName=fmtname Levels=start new_level=label;
run;
proc format cntlin=fmt;
run;
data want;
set first;
array x{*} $ var1--var1000;
array _x{3} $40 _var1 _var2 _var1000;
do i=1 to dim(x);
_x{i}=putc(x{i},cats(vname(x{i}),'_'));
end;
drop i;
run;
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9.
Early bird rate extended! Save $200 when you sign up by March 31.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.