BookmarkSubscribeRSS Feed
Sathish_jammy
Lapis Lazuli | Level 10

Dear Experts,

 

I need to match the value of a particular cell(in Variable1) to another cell(of Variable2) which placed in different Observation.

I'll share the example set to make it a clear picture.

data have;
input ID D1$ D2$ D3$ Dx1$ Dx2$ Dx3$;
cards;
123 Med1 Med4 Med3 abc . .
456 Med2 Med5 Med7 def . .
789 Med3 Med3 Med2 ghi . .
147 Med4 Med6 .	jkl . .
258 Med5 Med1 Med8 mno . .
369 Med6 Med2 . pqr . .
;
run;

The D1 refers to Medicine name; Dx1 refers to Generic name. I updated the Dx1 value manually. All I need to update the matched values in all the consecutive variables (Dx2, Dx3,...). Once the Dx1 and Dx2 are updated. I turn the same step for Dx3 using the Combination match of Dx1 and Dx2 = Dx3. Kindly let me know for any clarification required in this query.

Expected Output

IDD1D2D3Dx1Dx2Dx3
123Med1Med4Med3abcjklghi
456Med2Med5Med7defmnostu
789Med3Med7Med2ghistudef
147Med4Med6 jklpqr 
258Med5Med1Med8mnoabc 
369Med6Med2 pqrdef 

 

3 REPLIES 3
Kurt_Bremser
Super User

First of all: bad data modeling. Transpose to long.

Then build a format, apply it, and drop empty values on the way.

data have;
infile datalines dsd dlm=' ' truncover;
input ID D1$ D2$ D3$ Dx1$ Dx2$ Dx3$;
cards;
123 Med1 Med4 Med3 abc . .
456 Med2 Med5 Med7 def . .
789 Med3 Med3 Med2 ghi . .
147 Med4 Med6 . jkl . .
258 Med5 Med1 Med8 mno . .
369 Med6 Med2 . pqr . .
;

proc transpose data=have out=trans;
var D:;
by id notsorted;
run;

data
  cntlin (
    drop=id _name_
    where=(label ne '')
  )
  trans2 (
    keep=id start
    rename=(start=D)
    where=(D ne '')
  )
;
merge
  trans (where=(length(_name_)=2) rename=(col1=start))
  trans (where=(length(_name_)=3) rename=(col1=label))
  end=done;
;
fmtname = 'medic';
type = 'C';
output cntlin;
output trans2;
if done
then do;
  hlo = 'O';
  label = '***';
  output cntlin;
end;
run;

proc format cntlin=cntlin;
run;

data bad;
set have;
array D D1-D3;
array dx Dx1-Dx3;
do i = 1 to dim(D);
  dx{i} = put(d{i},medic.);
end;
drop i;
run;

data good;
set trans2;
Dx = put(D,medic.);
run;

bad is the format applied to the wide format, good is a properly modeled dataset.

ed_sas_member
Meteorite | Level 14

Hi @Sathish_jammy 

 

A format will be easier for you to manage:

data have;
infile datalines dlm=" " dsd missover;
input ID D1$ D2$ D3$ Dx1$;
cards;
123 Med1 Med4 Med3 abc
456 Med2 Med5 Med7 def
789 Med3 Med3 Med2 ghi
147 Med4 Med6  jkl
258 Med5 Med1 Med8 mno
369 Med6 Med2  pqr  
;
run;

proc format;
	value $ _Medfmt 'Med1' = 'abc'
					'Med2' = 'def'
					'Med3' = 'ghi'
					'Med4' = 'jkl'
					'Med5' = 'mno'
					'Med6' = 'pqr'
					'Med7' = 'pqr'
					other = 'NOT YET DEFINED' /*define how you want to handle those cases*/;
run;

data want;
	set have;
	Dx2 = put(D2,_Medfmt.);
	Dx3 = put(D3,_Medfmt.);
	if strip(D2) = "" then call missing(Dx2);
	if strip(D3) = "" then call missing(Dx3);
run;
mkeintz
PROC Star

I think this is a good situation to use a hash object.  In this case the hash object would be a lookup table dynamically generated from two variables in the data:  D1 and DX1, renamed to _dkey and _dx respectively.

 

data have;
infile datalines dlm=" " dsd missover;
input ID D1$ D2$ D3$ Dx1$;
cards;
123 Med1 Med4 Med3 abc
456 Med2 Med5 Med7 def
789 Med3 Med3 Med2 ghi
147 Med4 Med6  jkl
258 Med5 Med1 Med8 mno
369 Med6 Med2  pqr  
;

data want (drop=_:);
  set have;
  if _n_=1 then do;
    if 0 then set have (keep=d1 dx1 rename=(d1=_dkey dx1=_dx));
    declare hash h (dataset:'have (keep=d1 dx1 rename=(d1=_dkey dx1=_dx)');
      h.definekey('_dkey');
      h.definedata(all:'Y');
      h.definedone();
  end;

  array _d      d2 d3;
  array _x  $4 dx2 dx3 ;
  do over _d;
    if not missing (_d) then do;
      _rc=h.find(key:_d);
      if _rc=0 then _x=_dx;
      call missing(_dx);
    end;
  end;
run;

This assumes there is a one-to-one map between D1 and DX1.   I.e., if there are multiple occurrences of a given D1, all the DX1's for those occurrences are assumed to be identical.

 

If that is not true, then the original data pair for D1 and DX1 are kept, but the lookup object defaults to keeping only whichever D1,DX1 (renamed to _DKEY,_DX) is encountered first.  So any non one-to-many map in D1/DX1  will become one-to-one for D2/DX2  and D3/DX3.

 

I did this using ARRAY and DO OVER incase you end up with, say, D3, D4, …   D20.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 417 views
  • 3 likes
  • 4 in conversation