Dear Experts,
I need to match the value of a particular cell(in Variable1) to another cell(of Variable2) which placed in different Observation.
I'll share the example set to make it a clear picture.
data have;
input ID D1$ D2$ D3$ Dx1$ Dx2$ Dx3$;
cards;
123 Med1 Med4 Med3 abc . .
456 Med2 Med5 Med7 def . .
789 Med3 Med3 Med2 ghi . .
147 Med4 Med6 . jkl . .
258 Med5 Med1 Med8 mno . .
369 Med6 Med2 . pqr . .
;
run;
The D1 refers to Medicine name; Dx1 refers to Generic name. I updated the Dx1 value manually. All I need to update the matched values in all the consecutive variables (Dx2, Dx3,...). Once the Dx1 and Dx2 are updated. I turn the same step for Dx3 using the Combination match of Dx1 and Dx2 = Dx3. Kindly let me know for any clarification required in this query.
Expected Output
ID | D1 | D2 | D3 | Dx1 | Dx2 | Dx3 |
123 | Med1 | Med4 | Med3 | abc | jkl | ghi |
456 | Med2 | Med5 | Med7 | def | mno | stu |
789 | Med3 | Med7 | Med2 | ghi | stu | def |
147 | Med4 | Med6 | jkl | pqr | ||
258 | Med5 | Med1 | Med8 | mno | abc | |
369 | Med6 | Med2 | pqr | def |
First of all: bad data modeling. Transpose to long.
Then build a format, apply it, and drop empty values on the way.
data have;
infile datalines dsd dlm=' ' truncover;
input ID D1$ D2$ D3$ Dx1$ Dx2$ Dx3$;
cards;
123 Med1 Med4 Med3 abc . .
456 Med2 Med5 Med7 def . .
789 Med3 Med3 Med2 ghi . .
147 Med4 Med6 . jkl . .
258 Med5 Med1 Med8 mno . .
369 Med6 Med2 . pqr . .
;
proc transpose data=have out=trans;
var D:;
by id notsorted;
run;
data
cntlin (
drop=id _name_
where=(label ne '')
)
trans2 (
keep=id start
rename=(start=D)
where=(D ne '')
)
;
merge
trans (where=(length(_name_)=2) rename=(col1=start))
trans (where=(length(_name_)=3) rename=(col1=label))
end=done;
;
fmtname = 'medic';
type = 'C';
output cntlin;
output trans2;
if done
then do;
hlo = 'O';
label = '***';
output cntlin;
end;
run;
proc format cntlin=cntlin;
run;
data bad;
set have;
array D D1-D3;
array dx Dx1-Dx3;
do i = 1 to dim(D);
dx{i} = put(d{i},medic.);
end;
drop i;
run;
data good;
set trans2;
Dx = put(D,medic.);
run;
bad is the format applied to the wide format, good is a properly modeled dataset.
A format will be easier for you to manage:
data have;
infile datalines dlm=" " dsd missover;
input ID D1$ D2$ D3$ Dx1$;
cards;
123 Med1 Med4 Med3 abc
456 Med2 Med5 Med7 def
789 Med3 Med3 Med2 ghi
147 Med4 Med6 jkl
258 Med5 Med1 Med8 mno
369 Med6 Med2 pqr
;
run;
proc format;
value $ _Medfmt 'Med1' = 'abc'
'Med2' = 'def'
'Med3' = 'ghi'
'Med4' = 'jkl'
'Med5' = 'mno'
'Med6' = 'pqr'
'Med7' = 'pqr'
other = 'NOT YET DEFINED' /*define how you want to handle those cases*/;
run;
data want;
set have;
Dx2 = put(D2,_Medfmt.);
Dx3 = put(D3,_Medfmt.);
if strip(D2) = "" then call missing(Dx2);
if strip(D3) = "" then call missing(Dx3);
run;
I think this is a good situation to use a hash object. In this case the hash object would be a lookup table dynamically generated from two variables in the data: D1 and DX1, renamed to _dkey and _dx respectively.
data have;
infile datalines dlm=" " dsd missover;
input ID D1$ D2$ D3$ Dx1$;
cards;
123 Med1 Med4 Med3 abc
456 Med2 Med5 Med7 def
789 Med3 Med3 Med2 ghi
147 Med4 Med6 jkl
258 Med5 Med1 Med8 mno
369 Med6 Med2 pqr
;
data want (drop=_:);
set have;
if _n_=1 then do;
if 0 then set have (keep=d1 dx1 rename=(d1=_dkey dx1=_dx));
declare hash h (dataset:'have (keep=d1 dx1 rename=(d1=_dkey dx1=_dx)');
h.definekey('_dkey');
h.definedata(all:'Y');
h.definedone();
end;
array _d d2 d3;
array _x $4 dx2 dx3 ;
do over _d;
if not missing (_d) then do;
_rc=h.find(key:_d);
if _rc=0 then _x=_dx;
call missing(_dx);
end;
end;
run;
This assumes there is a one-to-one map between D1 and DX1. I.e., if there are multiple occurrences of a given D1, all the DX1's for those occurrences are assumed to be identical.
If that is not true, then the original data pair for D1 and DX1 are kept, but the lookup object defaults to keeping only whichever D1,DX1 (renamed to _DKEY,_DX) is encountered first. So any non one-to-many map in D1/DX1 will become one-to-one for D2/DX2 and D3/DX3.
I did this using ARRAY and DO OVER incase you end up with, say, D3, D4, … D20.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.