look up solution help for row to column values or column to row matches for wide datasets?

Reply
Frequent Contributor
Posts: 137

look up solution help for row to column values or column to row matches for wide datasets?

Dear SAS Folks, I seek your help in solving my problem that requires a look up solution of row to column values or column to row for my given wide datasets. My datasets have values like the following:

Table A:

ID

Type1

type2

type3

type4

type5

type6

type7

type8

type9

type10

type11

type12

type13

type14

type15

ABC01

25

N

ABC01

25

N

A1

ABC01

11

Y

A5

ABC01

55

k

T1

JKL03

39

N

A5

JKL03

41

Y

A5

JKL03

40

N

T1

JKL03

39

Y

A1



Table B:

ID

Type1

type2

type3

type4

type5

type6

type7

type8

type9

type10

type11

type12

type13

type14

type15

ABC01

25,11,35,45

N,Y

T1,A1,A5

JKL03

39,40

N

T1,A1

Desired Results:

ID

Type1

type2

type3

type4

type5

type6

type7

type8

type9

type10

type11

type12

type13

type14

type15

ABC01

25

N

A1

ABC01

11

Y

A5

JKL03

40

N

T1

The logic for the look up is as follows:

  1. 1. Table A has so many by-groups with many observations as seen with couple of examples ABC01 and JKL03.
  2. 2. Table B is basically the data set that basically provides or defines information on what observations to be picked from Table A.
  3. 3. The look up has be to be done type to type, find a match and if present output that to the results dataset.
  4. 4. In the first row in table A: Type1 with valuye of 25 matches with one of the value being 25 in type1 in Table B for the same ID, so this is OK. Similarly,Type2 value N of table A matches with N of type2 in table B. However, Type3 value is missing in tableA which should actually have values T1,A1 or A5 as listed in the type3 of table B.
  5. 5. Therefore, desired results should write the observations where there is either or one match found as shown in results table. The same rules applies to all types for a complete look up.

Can somebody help me with this challenge? It seems too daunting.

Many thanks,

Charlotte

Super Contributor
Posts: 275

Re: look up solution help for row to column values or column to row matches for wide datasets?

Try with hash:

data want;

   if 0 then set B(rename=(type1=_type1 type2=_type2 type3=_type3));

   if _n_=1 then do;

      declare hash h(dataset:'B(rename=(type1=_type1 type2=_type2 type3=_type3)');

      h.definekey('ID');

      h.definedata(all:'y');

      h.definedone();

   end;

   set A;

   if h.find()=0 then do;

      if index(_type1,strip(type1))>0 and

         index(_type2,strip(type2))>0 and

         index(_type3,strip(type3))>0 then output;

   end;

   drop _:;

run;

Frequent Contributor
Posts: 137

Re: look up solution help for row to column values or column to row matches for wide datasets?

Thanks @slchenfish and @billfish  I really appreciate your time and your help. That really means a lot. Have a nice day

Frequent Contributor
Posts: 137

Re: look up solution help for row to column values or column to row matches for wide datasets?

Hi, the  code seems to fail when in an event of both _type3 and type3 have missing values. When ideally, the condition should equate to true because missing=missing is true. I'd appreciate your advice.

Contributor
Posts: 52

Re: look up solution help for row to column values or column to row matches for wide datasets?


A solution amongst others.

I will denote table A as t_a.
I will denote table B as t_b.
I will denote table Desired Results as t_results.

I assume that t_b has 1 row per id and t_a may have several rows per id.
I assume that all id's in t_b are also present in t_a.
I assume that t_a and t_b are already sorted by id;

/*************************************/
/**** randomized sample table t_a ****/
/*************************************/
data t_a(keep=id type1-type15);
   length id 8.;
   array type(15) $2.;
   do id=1 to 10;
      AA= 5+int(15*ranuni(7));
      do i = 1 to aa;
         do j=1 to 15;
            type(j)= put(ceil(3*ranuni(7)),$2.);
         end;
         output;
      end;
   end;
run;


/*************************************/
/**** randomized sample table t_b ****/
/*************************************/
data t_b(keep=id type1-type15);
   length id 8.;
   array type(15) $12.;
   do id=1 to 10;
      do j = 1 to 15;
         aa = ceil(6*ranuni(3));
         a1 = ceil(3*ranuni(3));
         if (aa in (3,4,5,6))   then type(j) = '1,2,3';
         if (aa = 2) and (a1=1) then type(j) = '2,3';
         if (aa = 2) and (a1=2) then type(j) = '1,3';
         if (aa = 2) and (a1=3) then type(j) = '1,2';
         if (aa = 1) then type(j)=put(a1, $2.);
      end;
      output;
   end;
run;


/*****************************************************/
/**** finding the records of t_a which match with ****/
/**** at least 1 element of all 15 type(*) of t_b ****/
/*****************************************************/
data t_results(keep=id type1-type15);
   length id 8.;
   array zTyp(15) $12.;
   array type(15)  $2.;

   do until(last.id);
      set t_b(rename=(type1-type15 = zTyp1-zTyp15));
      by id;
   end;

   do until(last.id);
      set t_a;
      by id;
      if first.id then do; zResult=0; end;
      do i = 1 to 15;
        z1 = sign(findw(zTyp(i),strip(type(i))));
        zResult+(z1);
      end;
      if zResult=15 then do; output; end;
      zResult=0;
   end;
run;


The resulting table t_results (these records came from t_a):

=================================================================================================================================
id   type1   type2   type3   type4   type5   type6   type7   type8   type9   type10   type11   type12   type13   type14   type15

1     2       2       2       1       1       3       1       3       1       1        1        1        2        1        3
1     2       1       3       2       1       2       3       3       1       3        1        3        2        2        3
1     1       1       3       3       1       3       1       1       2       3        1        2        1        1        1
4     1       2       2       3       2       1       2       2       1       2        2        3        2        2        3
=================================================================================================================================

Hope this helps.

Super User
Posts: 9,878

Re: look up solution help for row to column values or column to row matches for wide datasets?

Hi, Dear Charlotte :

I think @slchen make some sense . But you'd better use  indexw( , ',')   or   findw( ,',')  to avoid to unnecessary error.

Frequent Contributor
Posts: 137

Re: look up solution help for row to column values or column to row matches for wide datasets?

Dear Xia, Thank you as always. I was actually thinking of writing to you but I wasn't sure of what time would it be in Beijing and how busy you are during your day while I guess it would be early hours in the morning for me here in England. Right now, it is 4:10 pm here.

Should i simply replace Index with indexw. I am wondering about the ','? Can you please illustrate this line in the example provided by Slchen.

How are you enjoying summer?yYou should probably visit England Smiley Happy .

Many thanks my dear friend and well wisher.

Sincerely,

Charlotte

Super User
Posts: 9,878

Re: look up solution help for row to column values or column to row matches for wide datasets?

Dear Charlotte,

There is eight hours between you and me .

Nothing for me in this summary, just reading ,learning ......

I'd like to visit England ,like to see big bell,But I have no money . Hope you could visit China either .

Now come back to question .

I think @slchen 's code could be changed like this:

if ( index(_type1,strip(type1))>0  or (missing(_type1) and missing(type1))) and

        ( index(_type2,strip(type2))>0 or (missing(_type1) and missing(type1))) and

        ( index(_type3,strip(type3))>0 or (missing(_type1) and missing(type1))) then output;

Here is my code:

Code: Program

data A;
input ID $ / Type1 $ / type2 $ / type3 $;
cards;
ABC01
25
N
.
ABC01
25
N
A1
ABC01
11
Y
A5
ABC01
55
k
T1
JKL03
39
N
A5
JKL03
41
Y
A5
JKL03
40
N
T1
JKL03
39
Y
A1
;
run;

data B;
input ID $ / Type1 $20. / type2 $20. / type3 $20.;
cards;
ABC01
25,11,35,45
N,Y
T1,A1,A5
JKL03
39,40
N
T1,A1
;
run;
data want;
merge A B(rename=(type1-type3=_t1-_t3));
by id;
array x{*} $ type:;
array y{*} $ _t:;
matched=1;
do i=1 to dim(x);
   if not findw(y{i},strip(x{i})) then matched=0;
   if missing(x{i}) and missing(y{i}) then matched=1;
   if matched=0 then leave;
end;
if matched then output;
drop i matched _t:;
run;

Ask a Question
Discussion stats
  • 7 replies
  • 392 views
  • 6 likes
  • 4 in conversation