DATA Step, Macro, Functions and more

Matching data

Accepted Solution Solved
Reply
Frequent Contributor
Posts: 92
Accepted Solution

Matching data

Hi there,

I tried to match 2 file to generate a new file.

 

see below example:

File 1:

ID code1 code2 code3 code4
11 A_B C_D A_C B_D
12 A_A D_D A_A B_B
14 B_B C_C C_C D_D

 

File 2:

Code Match1 Match2
code1 A B
code2 D C
code3 C A
code4 D B

 

look into code1 in file1: A_B, A_A_, B_B. I want to see if it matches with Code1 in File2 variable Match2, in this case Match2 =B

if code1 in file1 only match 1 letter then the result will be 1, if match 2 letter then the result will be 2, if No match then the result will be 0.

 

this is what I want.

ID code1 Result1 code2 Result2 code3 Result3 code4 Result4
11 A_B 1 C_D 1 A_C 1 B_D 1
12 A_A 0 D_D 0 A_A 2 B_B 2
14 B_B 2 C_C 2 C_C 0 D_D 0

 

is there a SAS code that can run for generating the results for all variables ?

 

I hope it's not confusing you. 

Please let me know if you need more information.

 

Thanks in advance!!

 


Accepted Solutions
Solution
‎12-25-2017 05:14 PM
Frequent Contributor
Posts: 92

Re: Matching data

[ Edited ]

Thank you so much! Ksharp!!

 

This is exactly what i want!! I really appreciate your time and patience.

 

This community is very helpful.

 

Happy Holidays, KSharp!

 

 

View solution in original post


All Replies
Frequent Contributor
Posts: 92

Matching data

 

Hi there,

I tried to match 2 file to generate a new file.

 

see below example:

File 1:

ID code1 code2 code3 code4
11 A_B C_D A_C B_D
12 A_A D_D A_A B_B
14 B_B C_C C_C D_D

 

File 2:

Code Match1 Match2
code1 A B
code2 D C
code3 C A
code4 D B

 

look into code1 in file1: A_B, A_A_, B_B. I want to see if it matches with Code1 in File2 variable Match2, in this case Match2 =B

if code1 in file1 only match 1 letter then the result will be 1, if match 2 letter then the result will be 2, if No match then the result will be 0.

 

this is what I want.

ID code1 Result1 code2 Result2 code3 Result3 code4 Result4
11 A_B 1 C_D 1 A_C 1 B_D 1
12 A_A 0 D_D 0 A_A 2 B_B 2
14 B_B 2 C_C 2 C_C 0 D_D 0

 

is there a SAS code that can run for generating the results for all variables ?

 

I hope it's not confusing you. 

Please let me know if you need more information.

 

Thanks in advance!!

Super User
Posts: 23,357

Re: Matching data

[ Edited ]

What's Match1 column for then?

 

Spoiler

ursula wrote:

Hi there,

I tried to match 2 file to generate a new file.

 

see below example:

File 1:

ID code1 code2 code3 code4
11 A_B C_D A_C B_D
12 A_A D_D A_A B_B
14 B_B C_C C_C D_D

 

File 2:

Code Match1 Match2
code1 A B
code2 D C
code3 C A
code4 D B

 

look into code1 in file1: A_B, A_A_, B_B. I want to see if it matches with Code1 in File2 variable Match2, in this case Match2 =B

if code1 in file1 only match 1 letter then the result will be 1, if match 2 letter then the result will be 2, if No match then the result will be 0.

 

this is what I want.

ID code1 Result1 code2 Result2 code3 Result3 code4 Result4
11 A_B 1 C_D 1 A_C 1 B_D 1
12 A_A 0 D_D 0 A_A 2 B_B 2
14 B_B 2 C_C 2 C_C 0 D_D 0

 

is there a SAS code that can run for generating the results for all variables ?

 

I hope it's not confusing you. 

Please let me know if you need more information.

 

Thanks in advance!!

 


Frequent Contributor
Posts: 92

Re: Matching data

Thanks for your response.

 

The Match2 is dominant, so we only look into Match2.

Codes values actually are generating from Match1 and Match2 values.

Super User
Posts: 23,357

Re: Matching data

[ Edited ]

So we can entirely ignore Match1 column?

 

If that's the case, I'd probably create an informat from the second table, 1 if it's in the table, 0 otherwise. 

PROC FORMAT CNTLIN can be used to create formats from data sets. 

 

http://www2.sas.com/proceedings/forum2007/068-2007.pdf

 

You can then use SCAN to parse the components of your match variable and using INPUT to convert it to a numeric variable that would then get added together. Here's an example below. 

 

match_value = input (scan(original_variable, 1, "_"), $lookup_fmt.) + 
 input (scan(original_variable, 2, "_"), $lookup_fmt.) ;

I would get it working for one case first and then use an array to do it for the several columns you have. 

 

If you have difficulty post your code, the log and any errors you're receiving. 

 


ursula wrote:

Thanks for your response.

 

The Match2 is dominant, so we only look into Match2.

Codes values actually are generating from Match1 and Match2 values.


 

Frequent Contributor
Posts: 92

Re: Matching data

if codes only consist of Match1, then the value=0 -- in this case, if code1= A_A (both are Match1 values) then result=0

Super User
Posts: 23,357

Re: Matching data


ursula wrote:

if codes only consist of Match1, then the value=0 -- in this case, if code1= A_A (both are Match1 values) then result=0


A is also in Match2 column, so wouldn't that be 2 ?

 

Code Match1 Match2
code1 A B
code2 D C
code3 C A
code4 D B
Super User
Posts: 13,358

Re: Matching data

If you want tested code you need to provide data in the form of a data step so we can run code against it.

Example:

data work.file1;
 input ID $ code1 $ code2 $ code3 $ code4 $;
datalines;
11 A_B C_D A_C B_D 
12 A_A D_D A_A B_B 
14 B_B C_C C_C D_D 
;
run;
 
data work.File2;
   input Code $ Match1 $ Match2 $;
datalines; 
code1 A B 
code2 D C 
code3 C A 
code4 D B 
;
run;

With that in mind, I think you need to go through some very specific details of what is compared to what for counting.

 

 

Are you only looking at "match2"? If so, then why is match1 included in the data? If so this gets part of the way there;

proc sort data=work.file1;
   by id;
run;
proc transpose data=work.file1 out=work.ftrans1
;
   by id;
   var code: ;
run;

proc sql;
   create table work.almost as
   select a.id,a._name_,a.col1, countc(a.col1,strip(b.match2)) as result
   from work.ftrans1 as a
        left join
        work.file2 as b
        on a._name_ = b.code
   order by a.id,a._name_,a.col1
   ;
quit;

But I have to leave now and will leave reshaping that last dataset to the desired form for someone else.

 

Frequent Contributor
Posts: 92

Re: Matching data

Thank you for the response.

I appreciate your time.

 

Super User
Posts: 10,699

Re: Matching data

data work.file1;
 input ID $ code1 $ code2 $ code3 $ code4 $;
datalines;
11 A_B C_D A_C B_D 
12 A_A D_D A_A B_B 
14 B_B C_C C_C D_D 
;
run;
 
data work.File2;
   input Code $ Match1 $ Match2 $;
datalines; 
code1 A B 
code2 D C 
code3 C A 
code4 D B 
;
run;

data want;
 if _n_=1 then do;
  if 0 then set file2;
  declare hash h(dataset:'file2');
  h.definekey('code','match2');
  h.definedone();
 end;
set file1;
array x{*} code1-code4;
array result{4};
call missing(of result{*});
do i=1 to dim(x);
  _code=vname(x{i}); 
  do j=1 to countw(x{i},'_');
    _match2=scan(x{i},j,'_');
	if h.check(key:_code,key:_match2)=0 then result{i}+1;
  end;
end;
drop _: i j Code  Match1  Match2;
run;
Frequent Contributor
Posts: 92

Re: Matching data

Hi Ksharp,

Thank you so much for the advance codes! 

 

after I run into my real data, I have a little question to make it perfect.

 

When I run using your codes, the ones that have two Match1-- see ID12, code1=A_A, the result should be "0" instead of Blank.

otherwise all are looks good.

But if ID12, code1 = F_F for example, then the result should be "Blank or missing", because F is not Matching with Match1 and Match2. Does it make sense?

 

could you help to fix the code?

 

Thanks again for your time.

 

 

this is what I want.

ID code1 Result1 code2 Result2 code3 Result3 code4 Result4
11 A_B 1 C_D 1 A_C 1 B_D 1
12 A_A 0 D_D 0 A_A 2 B_B 2
14 B_B 2 C_C 2 C_C 0 D_D 0
Super User
Posts: 10,699

Re: Matching data

"

 because F is not Matching with Match1 and Match2. Does it make sense?

 "

 

But you said you want match MATCH2.

 

 

data work.file1;
 input ID $ code1 $ code2 $ code3 $ code4 $;
datalines;
11 A_B C_D A_C B_D 
12 A_A D_D A_A B_B 
14 B_B C_C C_C D_D 
14 F_F C_C C_C D_D 
;
run;
 
data work.File2;
   input Code $ Match1 $ Match2 $;
datalines; 
code1 A B 
code2 D C 
code3 C A 
code4 D B 
;
run;

data want;
 if _n_=1 then do;
  if 0 then set file2;
  declare hash h(dataset:'file2');
  h.definekey('code','match2');
  h.definedone();
 end;
set file1;
array x{*} code1-code4;
array result{4};

do k=1 to dim(result);
 result{k}=0;
end;

do i=1 to dim(x);
  _code=vname(x{i}); 
  do j=1 to countw(x{i},'_');
    _match2=scan(x{i},j,'_');
	if h.check(key:_code,key:_match2)=0 then result{i}+1;
  end;
end;
drop _: i j k Code  Match1  Match2;
run;
Frequent Contributor
Posts: 92

Re: Matching data

Hi KShap,

 

Thanks so so much for your quick response!

I'm so happy almost there.

 

I'm sorry if I was confusing you Smiley Happy

 

I Added one more line into File1.

 

data work.file1;
input ID $ code1 $ code2 $ code3 $ code4 $;
datalines;
11 A_B C_D A_C B_D
12 A_A D_D A_A B_B
14 B_B C_C C_C D_D
14 F_F C_C C_C D_D
15 F_F F_C C_F F_F
;
run;

data work.File2;
input Code $ Match1 $ Match2 $;
datalines;
code1 A B
code2 D C
code3 C A
code4 D B
;
run;

 

I want the result look like this:

for the "F" issue:

"Blank" if F_F -- NoMatch1 and NoMatch2.

see ID15 code3 -- C_F, the result is "Blank" because C is not a Match2, and F is not in File2(see Code3 -- Match2=A) so the result=,(blank)

see ID15 code2 -- F_C, the result is "1", because C is a Match2 even though it has "F" on it.

 

ID code1 code2 code3 code4 result1 result2 result3 result4
11 A_B C_D A_C B_D 1 1 1 1
12 A_A D_D A_A B_B 0 0 2 2
14 B_B C_C C_C D_D 2 2 0 0
14 F_F C_C C_C D_D . 2 0 0
15 F_F F_C C_F F_F . 1 . .

 

Thank you again for your time.

Happy Holidays!

Super User
Posts: 10,699

Re: Matching data

OK. If I understood what you mean.

 

data work.file1;
input ID $ code1 $ code2 $ code3 $ code4 $;
datalines;
11 A_B C_D A_C B_D 
12 A_A D_D A_A B_B 
14 B_B C_C C_C D_D 
14 F_F C_C C_C D_D 
15 F_F F_C C_F F_F
;
run;

data work.File2;
input Code $ Match1 $ Match2 $;
datalines; 
code1 A B 
code2 D C 
code3 C A 
code4 D B 
;
run;

data want;
 if _n_=1 then do;
  if 0 then set file2;
  declare hash h(dataset:'file2');
  h.definekey('code','match2');
  h.definedone();

  declare hash h1(dataset:'file2');
  h1.definekey('match1');
  h1.definedone();
  declare hash h2(dataset:'file2');
  h2.definekey('match2');
  h2.definedone();
 end;
set file1;
array x{*} code1-code4;
array result{4};

do k=1 to dim(result);
 result{k}=0;
end;

do i=1 to dim(x);
  _code=vname(x{i}); 
  do j=1 to countw(x{i},'_');
    _match2=scan(x{i},j,'_');
	if h.check(key:_code,key:_match2)=0 then result{i}+1;
  end;
end;

do i=1 to dim(x);
  if result{i}=0 then do; 
  do j=1 to countw(x{i},'_');
    temp=scan(x{i},j,'_');
	if h1.check(key:temp) ne 0 and h2.check(key:temp) ne 0 then result{i}=.;
  end;
  end;
end;
drop _: i j k Code  Match1  Match2 temp;
run;
Solution
‎12-25-2017 05:14 PM
Frequent Contributor
Posts: 92

Re: Matching data

[ Edited ]

Thank you so much! Ksharp!!

 

This is exactly what i want!! I really appreciate your time and patience.

 

This community is very helpful.

 

Happy Holidays, KSharp!

 

 

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 14 replies
  • 470 views
  • 4 likes
  • 4 in conversation