Hi all,
I have two datasets. Dataset 1 is at the person level, and each person between 1 and 15 variables containing values of a categorical variable. Basically, it's like this:
ID Var1 Var2............Var15
A 3 40 9
B 14 8 .
The other dataset is essentially a map connecting values of variables 1-15 to a description.
Value Desc
1 abc
2 def
3 ghi
...
100 zzz
How can I combine the datasets such that for each person's variables 1-15, there will be a description next to it of what that value means?
EG:
ID Var1 Var1_desc Var2 Var2_desc..........Var15 Var15_desc
A 3 ghi 40 xyz 9 jkl
...and so on for each person.
1. Create a format from your second data set.
2. Apply the format using put statements to get the new variables.
3. Consider just using the formatted values perhaps?
Would I have to do that by hand? The second dataset has over 1,000 variable descriptions.
No, you wouldn't have to create it by hand and, like Fareeza said, you may not have to or even want to create the extra 15 variables. The following is one way to create and apply a format:
data codes;
informat value 12.;
informat Desc $50.;
input Value Desc &;
cards;
1 abc
2 def
3 ghi
8 something or other
9 ijk
14 something else
40 something else altogether
100 zzz
;
data ctrl;
set codes (rename=(value=start
Desc=label)) end=last;
retain fmtname 'codes' type 'n';
output;
if last then do;
hlo='O';
label='***ERROR***';
output;
end;
run;
proc format library=work cntlin=ctrl;
run;
data have;
input ID $ Var1-Var15;
cards;
A 3 40 9 2 . . . . . . . . . . .
B 14 8 . 14 . . . . . . . . . . .
;
data want;
set have;
array vars(*) var1-var15;
array var_desc(15) $50.;
do _n_=1 to dim(vars);
if not missing(vars{_n_}) then
var_desc{_n_}=put(vars{_n_},codes.);
end;
run;
data want;
retain
var1 var_desc1
var2 var_desc2
var3 var_desc3
var4 var_desc4
var5 var_desc5
var6 var_desc6
var7 var_desc7
var8 var_desc8
var9 var_desc9
var10 var_desc10
var11 var_desc11
var12 var_desc12
var13 var_desc13
var14 var_desc14
var15 var_desc16
;
set want;
run;
or Hash Table . and reorder variables is also a problem.
data codes; informat value 12.; informat Desc $50.; input Value Desc &; cards; 1 abc 2 def 3 ghi 8 something or other 9 ijk 14 something else 40 something else altogether 100 zzz ; data have; input ID $ Var1-Var15; cards; A 3 40 9 2 . . . . . . . . . . . B 14 8 . 14 . . . . . . . . . . . ; run; data x; set have; if _n_ eq 1 then do; if 0 then set codes; declare hash h(dataset:'codes'); h.definekey('value'); h.definedata('desc'); h.definedone(); end; array v{*} var1-var15; n=0; do i=1 to dim(v); n+1;value=v{i};call missing(desc);rc=h.find();output; end; drop var: i rc; run; proc sql noprint; select distinct catt('x(where=(n=',put(n,best8. -l),') rename=(value=value_',put(n,best8. -l),' desc=desc_',put(n,best8. -l),'))') into : list separated by ' ' from x order by n; quit; %put &list ; data want; merge &list ; by id ; drop n; run;
Xia Keshan
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.