Match the cell value to another cell within the table

Sathish_jammy · Posted 02-12-2020 08:42 AM

Dear Experts,

I need to match the value of a particular cell(in Variable1) to another cell(of Variable2) which placed in different Observation.

I'll share the example set to make it a clear picture.

data have;
input ID D1$ D2$ D3$ Dx1$ Dx2$ Dx3$;
cards;
123 Med1 Med4 Med3 abc . .
456 Med2 Med5 Med7 def . .
789 Med3 Med3 Med2 ghi . .
147 Med4 Med6 .	jkl . .
258 Med5 Med1 Med8 mno . .
369 Med6 Med2 . pqr . .
;
run;

The D1 refers to Medicine name; Dx1 refers to Generic name. I updated the Dx1 value manually. All I need to update the matched values in all the consecutive variables (Dx2, Dx3,...). Once the Dx1 and Dx2 are updated. I turn the same step for Dx3 using the Combination match of Dx1 and Dx2 = Dx3. Kindly let me know for any clarification required in this query.

Expected Output

ID	D1	D2	D3	Dx1	Dx2	Dx3
123	Med1	Med4	Med3	abc	jkl	ghi
456	Med2	Med5	Med7	def	mno	stu
789	Med3	Med7	Med2	ghi	stu	def
147	Med4	Med6		jkl	pqr
258	Med5	Med1	Med8	mno	abc
369	Med6	Med2		pqr	def

Kurt_Bremser · Posted 02-12-2020 09:11 AM

First of all: bad data modeling. Transpose to long.

Then build a format, apply it, and drop empty values on the way.

data have;
infile datalines dsd dlm=' ' truncover;
input ID D1$ D2$ D3$ Dx1$ Dx2$ Dx3$;
cards;
123 Med1 Med4 Med3 abc . .
456 Med2 Med5 Med7 def . .
789 Med3 Med3 Med2 ghi . .
147 Med4 Med6 . jkl . .
258 Med5 Med1 Med8 mno . .
369 Med6 Med2 . pqr . .
;

proc transpose data=have out=trans;
var D:;
by id notsorted;
run;

data
  cntlin (
    drop=id _name_
    where=(label ne '')
  )
  trans2 (
    keep=id start
    rename=(start=D)
    where=(D ne '')
  )
;
merge
  trans (where=(length(_name_)=2) rename=(col1=start))
  trans (where=(length(_name_)=3) rename=(col1=label))
  end=done;
;
fmtname = 'medic';
type = 'C';
output cntlin;
output trans2;
if done
then do;
  hlo = 'O';
  label = '***';
  output cntlin;
end;
run;

proc format cntlin=cntlin;
run;

data bad;
set have;
array D D1-D3;
array dx Dx1-Dx3;
do i = 1 to dim(D);
  dx{i} = put(d{i},medic.);
end;
drop i;
run;

data good;
set trans2;
Dx = put(D,medic.);
run;

bad is the format applied to the wide format, good is a properly modeled dataset.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

ed_sas_member · Posted 02-12-2020 09:19 AM

Hi @Sathish_jammy

A format will be easier for you to manage:

data have;
infile datalines dlm=" " dsd missover;
input ID D1$ D2$ D3$ Dx1$;
cards;
123 Med1 Med4 Med3 abc
456 Med2 Med5 Med7 def
789 Med3 Med3 Med2 ghi
147 Med4 Med6  jkl
258 Med5 Med1 Med8 mno
369 Med6 Med2  pqr  
;
run;

proc format;
	value $ _Medfmt 'Med1' = 'abc'
					'Med2' = 'def'
					'Med3' = 'ghi'
					'Med4' = 'jkl'
					'Med5' = 'mno'
					'Med6' = 'pqr'
					'Med7' = 'pqr'
					other = 'NOT YET DEFINED' /*define how you want to handle those cases*/;
run;

data want;
	set have;
	Dx2 = put(D2,_Medfmt.);
	Dx3 = put(D3,_Medfmt.);
	if strip(D2) = "" then call missing(Dx2);
	if strip(D3) = "" then call missing(Dx3);
run;

mkeintz · Posted 02-12-2020 10:29 AM

I think this is a good situation to use a hash object. In this case the hash object would be a lookup table dynamically generated from two variables in the data: D1 and DX1, renamed to _dkey and _dx respectively.

data have;
infile datalines dlm=" " dsd missover;
input ID D1$ D2$ D3$ Dx1$;
cards;
123 Med1 Med4 Med3 abc
456 Med2 Med5 Med7 def
789 Med3 Med3 Med2 ghi
147 Med4 Med6  jkl
258 Med5 Med1 Med8 mno
369 Med6 Med2  pqr  
;

data want (drop=_:);
  set have;
  if _n_=1 then do;
    if 0 then set have (keep=d1 dx1 rename=(d1=_dkey dx1=_dx));
    declare hash h (dataset:'have (keep=d1 dx1 rename=(d1=_dkey dx1=_dx)');
      h.definekey('_dkey');
      h.definedata(all:'Y');
      h.definedone();
  end;

  array _d      d2 d3;
  array _x  $4 dx2 dx3 ;
  do over _d;
    if not missing (_d) then do;
      _rc=h.find(key:_d);
      if _rc=0 then _x=_dx;
      call missing(_dx);
    end;
  end;
run;

This assumes there is a one-to-one map between D1 and DX1. I.e., if there are multiple occurrences of a given D1, all the DX1's for those occurrences are assumed to be identical.

If that is not true, then the original data pair for D1 and DX1 are kept, but the lookup object defaults to keeping only whichever D1,DX1 (renamed to _DKEY,_DX) is encountered first. So any non one-to-many map in D1/DX1 will become one-to-one for D2/DX2 and D3/DX3.

I did this using ARRAY and DO OVER incase you end up with, say, D3, D4, … D20.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Match the cell value to another cell within the table

Re: Match the cell value to another cell within the table

Re: Match the cell value to another cell within the table

Re: Match the cell value to another cell within the table

Match the cell value to another cell within the table

Re: Match the cell value to another cell within the table

Re: Match the cell value to another cell within the table

Re: Match the cell value to another cell within the table

Click image to register for webinar

Classroom Training Available!