BookmarkSubscribeRSS Feed
Anuz
Quartz | Level 8

Hi All, 

What is the best way to find and replace values in a field separated by , from values in a mapping table 

 

DATA HAVE;
	length R_TYP $250. ;
	infile datalines ;
	input R_TYP;
	datalines;
son,daughter
uncle
aunty
son,uncle
;
run;


DATA TEST_MAPPING;
	length K_TYP $100. S_TYP $100.;
	infile datalines delimiter=',' dsd truncover;
	input K_TYP  S_TYP;
	datalines;
son,family
daughter,family
uncle,extended family
aunty,extended family
;
run;

DATA WANT;
	length R_TYP $250. ;
	infile datalines ;
	input R_TYP;
	datalines;
family,family
extended family
extended family
family,extended family
;
run;

The output in WANT table is what I need as my final output by checking values in the MAPPING table - TEST_MAPPING

9 REPLIES 9
ballardw
Super User

Can you state some rules on what to find where? And the replacement rules?

 

An example without any description of the rules involves leads to lots of guessing and back-and-forth of "that isn't correct" and "why is it not correct".

 

Most of the time any variable that has multiple values in it for a single observation points to sub-optimal data structure which can cause lots of extra work to get anything done. Why are there multiple values?

Anuz
Quartz | Level 8
The HAVE table is what I receive.
WANT table is my desired output.

For example a value of son should be replaced by family from the TEST_MAPPING table.
If the column has multiple values like son, uncle. My output should say family, extended family as replaced values

ballardw
Super User

@Anuz wrote:
The HAVE table is what I receive.
WANT table is my desired output.

For example a value of son should be replaced by family from the TEST_MAPPING table.
If the column has multiple values like son, uncle. My output should say family, extended family as replaced values


Just because someone gives you poorly structured data is no reason to keep such a structure.

If you provide accurate analysis or reporting then what happens in between receipt of data and the report shouldn't matter.

Anuz
Quartz | Level 8

agree.. unfortunately that is the format it is required in to feed another process. I have to align to the requirement.  

Tom
Super User Tom
Super User

If the data is smallish just do the code I posted before as it is much clearer what is going on.

 

If the data is large you might get better performance by doing the transformation in one step.  Essentially something like:

data want;
  set have;
  length temp $250 ;
  do col=1 to countw(r_typ,',');
     temp=catx(',',temp,put(left(scan(r_type,col,',')),$group.));
  end;
  r_typ=temp;
  drop temp col;
run;

Where $GROUP is a format you generated from your LOOKUP table that would map 'son' to 'family' and 'uncle' to 'extended family'.

Tom
Super User Tom
Super User

It will be MUCH easier if you convert you delimited list of values into multiple observations instead.

So given your two input datasets (let's call the second one LOOKUP instead of TEST).

data have;
  infile datalines truncover;
  input R_TYP $250.;
datalines;
son,daughter
uncle
aunty
son,uncle
;

data lookup ;
  infile datalines dsd dsd truncover;
  input K_TYP :$100.  S_TYP :$100.;
datalines;
son,family
daughter,family
uncle,extended family
aunty,extended family
;

First fix the WANT dataset to have one observation per value.

data fixed;
  row+1;
  set have;
  do col=1 to max(1,countw(r_typ,','));
    length k_typ $100;
    k_typ = left(scan(r_typ,col,','));
    output;
  end;
  drop r_typ;
run;

Then you can combine on the common variable K_TYP.

proc sql noprint;
create table matched as
  select a.*,b.s_typ
  from fixed a 
    left join lookup b
    on a.k_typ = b.k_typ
  order by row,col
;
quit;

Resuls

Obs    row    col    k_typ       S_TYP

 1      1      1     son         family
 2      1      2     daughter    family
 3      2      1     uncle       extended family
 4      3      1     aunty       extended family
 5      4      1     son         family
 6      4      2     uncle       extended family

And if you did want to rebuild that delimited list it can be simply done with another data step.

data want;
  do until(last.row);
    set matched;
    by row col;
    length r_typ $250;
    r_typ=catx(',',r_typ,s_typ);
  end;
  keep row r_typ;
run;

Result

Obs    row    r_typ

 1      1     family,family
 2      2     extended family
 3      3     extended family
 4      4     family,extended family

 

Anuz
Quartz | Level 8

Thank you Tom. Let me try that/ 

Reeza
Super User

A bit more of a verbose solution but easy to follow and modify. 

 

data have;
    infile datalines truncover;
    input R_TYP $250.;
    datalines;
son,daughter
uncle
aunty
son,uncle
;

data lookup;
    infile datalines dsd dsd truncover;
    input K_TYP :$100. S_TYP :$100.;
    datalines;
son,family
daughter,family
uncle,extended family
aunty,extended family
;
    
    
    
*find number of values per row for array dimensions;
data _null_;
    set have end=eof;
    retain max_num 0;
    n_words=countc(r_typ, ',')+1;
    max_num=max(n_words, max_num);

    if eof then
        call symputx('num_words', n_words);
run;

%put &num_words;

*create format for mapping;
data lookup_fmt;
    set lookup;
    start=k_typ;
    label=s_typ;
    type='C';
    fmtname='family_fmt';
run;

proc format cntlin=lookup_fmt;
run;

*map codes to new values;
data mapped_summarized;
    set have;
    array _original(&num_words.) $100.;
    array _mapped(&num_words.) $100.;

    do i=1 to countc(r_typ, ",")+1;
        _original(i)=trim(scan(r_typ, i, ','));
        _mapped(i)=put(_original(i), $family_fmt.);
    end;
    
    *concatenate output as desired;
    r_typ_new=catx(", ", of _mapped(*));
run;

@Anuz wrote:

Hi All, 

What is the best way to find and replace values in a field separated by , from values in a mapping table 

 

DATA HAVE;
	length R_TYP $250. ;
	infile datalines ;
	input R_TYP;
	datalines;
son,daughter
uncle
aunty
son,uncle
;
run;


DATA TEST_MAPPING;
	length K_TYP $100. S_TYP $100.;
	infile datalines delimiter=',' dsd truncover;
	input K_TYP  S_TYP;
	datalines;
son,family
daughter,family
uncle,extended family
aunty,extended family
;
run;

DATA WANT;
	length R_TYP $250. ;
	infile datalines ;
	input R_TYP;
	datalines;
family,family
extended family
extended family
family,extended family
;
run;

The output in WANT table is what I need as my final output by checking values in the MAPPING table - TEST_MAPPING


 

Ksharp
Super User
DATA HAVE;
	length R_TYP $250. ;
	infile datalines ;
	input R_TYP;
	datalines;
son,daughter
uncle
aunty
son,uncle
;
run;


DATA TEST_MAPPING;
	length K_TYP $100. S_TYP $100.;
	infile datalines delimiter=',' dsd truncover;
	input K_TYP  S_TYP;
	datalines;
son,family
daughter,family
uncle,extended family
aunty,extended family
;
run;

data want;
 if _n_=1 then do;
   if 0 then set TEST_MAPPING;
   declare hash h(dataset:'TEST_MAPPING');
   h.definekey('K_TYP');
   h.definedata('S_TYP');
   h.definedone();
 end;
set have;
length want temp $ 200;
do i=1 to countw(R_TYP,',');
 temp=scan(R_TYP,i,',');
 if h.find(key:temp)=0 then want=catx(',',want,S_TYP);
  else want=catx(',',want,temp);
end;
keep want;
run;

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 529 views
  • 4 likes
  • 5 in conversation