BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
ursula
Pyrite | Level 9

Hi there,

I tried to match 2 file to generate a new file.

 

see below example:

File 1:

ID code1 code2 code3 code4
11 A_B C_D A_C B_D
12 A_A D_D A_A B_B
14 B_B C_C C_C D_D

 

File 2:

Code Match1 Match2
code1 A B
code2 D C
code3 C A
code4 D B

 

look into code1 in file1: A_B, A_A_, B_B. I want to see if it matches with Code1 in File2 variable Match2, in this case Match2 =B

if code1 in file1 only match 1 letter then the result will be 1, if match 2 letter then the result will be 2, if No match then the result will be 0.

 

this is what I want.

ID code1 Result1 code2 Result2 code3 Result3 code4 Result4
11 A_B 1 C_D 1 A_C 1 B_D 1
12 A_A 0 D_D 0 A_A 2 B_B 2
14 B_B 2 C_C 2 C_C 0 D_D 0

 

is there a SAS code that can run for generating the results for all variables ?

 

I hope it's not confusing you. 

Please let me know if you need more information.

 

Thanks in advance!!

 

1 ACCEPTED SOLUTION

Accepted Solutions
ursula
Pyrite | Level 9

Thank you so much! Ksharp!!

 

This is exactly what i want!! I really appreciate your time and patience.

 

This community is very helpful.

 

Happy Holidays, KSharp!

 

 

View solution in original post

14 REPLIES 14
ursula
Pyrite | Level 9

 

Hi there,

I tried to match 2 file to generate a new file.

 

see below example:

File 1:

ID code1 code2 code3 code4
11 A_B C_D A_C B_D
12 A_A D_D A_A B_B
14 B_B C_C C_C D_D

 

File 2:

Code Match1 Match2
code1 A B
code2 D C
code3 C A
code4 D B

 

look into code1 in file1: A_B, A_A_, B_B. I want to see if it matches with Code1 in File2 variable Match2, in this case Match2 =B

if code1 in file1 only match 1 letter then the result will be 1, if match 2 letter then the result will be 2, if No match then the result will be 0.

 

this is what I want.

ID code1 Result1 code2 Result2 code3 Result3 code4 Result4
11 A_B 1 C_D 1 A_C 1 B_D 1
12 A_A 0 D_D 0 A_A 2 B_B 2
14 B_B 2 C_C 2 C_C 0 D_D 0

 

is there a SAS code that can run for generating the results for all variables ?

 

I hope it's not confusing you. 

Please let me know if you need more information.

 

Thanks in advance!!

Reeza
Super User

What's Match1 column for then?

 

Spoiler

@ursula wrote:

Hi there,

I tried to match 2 file to generate a new file.

 

see below example:

File 1:

ID code1 code2 code3 code4
11 A_B C_D A_C B_D
12 A_A D_D A_A B_B
14 B_B C_C C_C D_D

 

File 2:

Code Match1 Match2
code1 A B
code2 D C
code3 C A
code4 D B

 

look into code1 in file1: A_B, A_A_, B_B. I want to see if it matches with Code1 in File2 variable Match2, in this case Match2 =B

if code1 in file1 only match 1 letter then the result will be 1, if match 2 letter then the result will be 2, if No match then the result will be 0.

 

this is what I want.

ID code1 Result1 code2 Result2 code3 Result3 code4 Result4
11 A_B 1 C_D 1 A_C 1 B_D 1
12 A_A 0 D_D 0 A_A 2 B_B 2
14 B_B 2 C_C 2 C_C 0 D_D 0

 

is there a SAS code that can run for generating the results for all variables ?

 

I hope it's not confusing you. 

Please let me know if you need more information.

 

Thanks in advance!!

 


ursula
Pyrite | Level 9

Thanks for your response.

 

The Match2 is dominant, so we only look into Match2.

Codes values actually are generating from Match1 and Match2 values.

Reeza
Super User

So we can entirely ignore Match1 column?

 

If that's the case, I'd probably create an informat from the second table, 1 if it's in the table, 0 otherwise. 

PROC FORMAT CNTLIN can be used to create formats from data sets. 

 

http://www2.sas.com/proceedings/forum2007/068-2007.pdf

 

You can then use SCAN to parse the components of your match variable and using INPUT to convert it to a numeric variable that would then get added together. Here's an example below. 

 

match_value = input (scan(original_variable, 1, "_"), $lookup_fmt.) + 
 input (scan(original_variable, 2, "_"), $lookup_fmt.) ;

I would get it working for one case first and then use an array to do it for the several columns you have. 

 

If you have difficulty post your code, the log and any errors you're receiving. 

 


@ursula wrote:

Thanks for your response.

 

The Match2 is dominant, so we only look into Match2.

Codes values actually are generating from Match1 and Match2 values.


 

ursula
Pyrite | Level 9

if codes only consist of Match1, then the value=0 -- in this case, if code1= A_A (both are Match1 values) then result=0

Reeza
Super User

@ursula wrote:

if codes only consist of Match1, then the value=0 -- in this case, if code1= A_A (both are Match1 values) then result=0


A is also in Match2 column, so wouldn't that be 2 ?

 

Code Match1 Match2
code1 A B
code2 D C
code3 C A
code4 D B
ballardw
Super User

If you want tested code you need to provide data in the form of a data step so we can run code against it.

Example:

data work.file1;
 input ID $ code1 $ code2 $ code3 $ code4 $;
datalines;
11 A_B C_D A_C B_D 
12 A_A D_D A_A B_B 
14 B_B C_C C_C D_D 
;
run;
 
data work.File2;
   input Code $ Match1 $ Match2 $;
datalines; 
code1 A B 
code2 D C 
code3 C A 
code4 D B 
;
run;

With that in mind, I think you need to go through some very specific details of what is compared to what for counting.

 

 

Are you only looking at "match2"? If so, then why is match1 included in the data? If so this gets part of the way there;

proc sort data=work.file1;
   by id;
run;
proc transpose data=work.file1 out=work.ftrans1
;
   by id;
   var code: ;
run;

proc sql;
   create table work.almost as
   select a.id,a._name_,a.col1, countc(a.col1,strip(b.match2)) as result
   from work.ftrans1 as a
        left join
        work.file2 as b
        on a._name_ = b.code
   order by a.id,a._name_,a.col1
   ;
quit;

But I have to leave now and will leave reshaping that last dataset to the desired form for someone else.

 

ursula
Pyrite | Level 9

Thank you for the response.

I appreciate your time.

 

Ksharp
Super User
data work.file1;
 input ID $ code1 $ code2 $ code3 $ code4 $;
datalines;
11 A_B C_D A_C B_D 
12 A_A D_D A_A B_B 
14 B_B C_C C_C D_D 
;
run;
 
data work.File2;
   input Code $ Match1 $ Match2 $;
datalines; 
code1 A B 
code2 D C 
code3 C A 
code4 D B 
;
run;

data want;
 if _n_=1 then do;
  if 0 then set file2;
  declare hash h(dataset:'file2');
  h.definekey('code','match2');
  h.definedone();
 end;
set file1;
array x{*} code1-code4;
array result{4};
call missing(of result{*});
do i=1 to dim(x);
  _code=vname(x{i}); 
  do j=1 to countw(x{i},'_');
    _match2=scan(x{i},j,'_');
	if h.check(key:_code,key:_match2)=0 then result{i}+1;
  end;
end;
drop _: i j Code  Match1  Match2;
run;
ursula
Pyrite | Level 9

Hi Ksharp,

Thank you so much for the advance codes! 

 

after I run into my real data, I have a little question to make it perfect.

 

When I run using your codes, the ones that have two Match1-- see ID12, code1=A_A, the result should be "0" instead of Blank.

otherwise all are looks good.

But if ID12, code1 = F_F for example, then the result should be "Blank or missing", because F is not Matching with Match1 and Match2. Does it make sense?

 

could you help to fix the code?

 

Thanks again for your time.

 

 

this is what I want.

ID code1 Result1 code2 Result2 code3 Result3 code4 Result4
11 A_B 1 C_D 1 A_C 1 B_D 1
12 A_A 0 D_D 0 A_A 2 B_B 2
14 B_B 2 C_C 2 C_C 0 D_D 0
Ksharp
Super User

"

 because F is not Matching with Match1 and Match2. Does it make sense?

 "

 

But you said you want match MATCH2.

 

 

data work.file1;
 input ID $ code1 $ code2 $ code3 $ code4 $;
datalines;
11 A_B C_D A_C B_D 
12 A_A D_D A_A B_B 
14 B_B C_C C_C D_D 
14 F_F C_C C_C D_D 
;
run;
 
data work.File2;
   input Code $ Match1 $ Match2 $;
datalines; 
code1 A B 
code2 D C 
code3 C A 
code4 D B 
;
run;

data want;
 if _n_=1 then do;
  if 0 then set file2;
  declare hash h(dataset:'file2');
  h.definekey('code','match2');
  h.definedone();
 end;
set file1;
array x{*} code1-code4;
array result{4};

do k=1 to dim(result);
 result{k}=0;
end;

do i=1 to dim(x);
  _code=vname(x{i}); 
  do j=1 to countw(x{i},'_');
    _match2=scan(x{i},j,'_');
	if h.check(key:_code,key:_match2)=0 then result{i}+1;
  end;
end;
drop _: i j k Code  Match1  Match2;
run;
ursula
Pyrite | Level 9

Hi KShap,

 

Thanks so so much for your quick response!

I'm so happy almost there.

 

I'm sorry if I was confusing you 🙂

 

I Added one more line into File1.

 

data work.file1;
input ID $ code1 $ code2 $ code3 $ code4 $;
datalines;
11 A_B C_D A_C B_D
12 A_A D_D A_A B_B
14 B_B C_C C_C D_D
14 F_F C_C C_C D_D
15 F_F F_C C_F F_F
;
run;

data work.File2;
input Code $ Match1 $ Match2 $;
datalines;
code1 A B
code2 D C
code3 C A
code4 D B
;
run;

 

I want the result look like this:

for the "F" issue:

"Blank" if F_F -- NoMatch1 and NoMatch2.

see ID15 code3 -- C_F, the result is "Blank" because C is not a Match2, and F is not in File2(see Code3 -- Match2=A) so the result=,(blank)

see ID15 code2 -- F_C, the result is "1", because C is a Match2 even though it has "F" on it.

 

ID code1 code2 code3 code4 result1 result2 result3 result4
11 A_B C_D A_C B_D 1 1 1 1
12 A_A D_D A_A B_B 0 0 2 2
14 B_B C_C C_C D_D 2 2 0 0
14 F_F C_C C_C D_D . 2 0 0
15 F_F F_C C_F F_F . 1 . .

 

Thank you again for your time.

Happy Holidays!

Ksharp
Super User

OK. If I understood what you mean.

 

data work.file1;
input ID $ code1 $ code2 $ code3 $ code4 $;
datalines;
11 A_B C_D A_C B_D 
12 A_A D_D A_A B_B 
14 B_B C_C C_C D_D 
14 F_F C_C C_C D_D 
15 F_F F_C C_F F_F
;
run;

data work.File2;
input Code $ Match1 $ Match2 $;
datalines; 
code1 A B 
code2 D C 
code3 C A 
code4 D B 
;
run;

data want;
 if _n_=1 then do;
  if 0 then set file2;
  declare hash h(dataset:'file2');
  h.definekey('code','match2');
  h.definedone();

  declare hash h1(dataset:'file2');
  h1.definekey('match1');
  h1.definedone();
  declare hash h2(dataset:'file2');
  h2.definekey('match2');
  h2.definedone();
 end;
set file1;
array x{*} code1-code4;
array result{4};

do k=1 to dim(result);
 result{k}=0;
end;

do i=1 to dim(x);
  _code=vname(x{i}); 
  do j=1 to countw(x{i},'_');
    _match2=scan(x{i},j,'_');
	if h.check(key:_code,key:_match2)=0 then result{i}+1;
  end;
end;

do i=1 to dim(x);
  if result{i}=0 then do; 
  do j=1 to countw(x{i},'_');
    temp=scan(x{i},j,'_');
	if h1.check(key:temp) ne 0 and h2.check(key:temp) ne 0 then result{i}=.;
  end;
  end;
end;
drop _: i j k Code  Match1  Match2 temp;
run;
ursula
Pyrite | Level 9

Thank you so much! Ksharp!!

 

This is exactly what i want!! I really appreciate your time and patience.

 

This community is very helpful.

 

Happy Holidays, KSharp!

 

 

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 14 replies
  • 3547 views
  • 4 likes
  • 4 in conversation