BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
nicnad
Fluorite | Level 6

Hi,

I am new to SAS and SQL query so I am asking for your help to solve my problem.

I have two tables. Each tables have only alphanumerical data. The first table is missing data in some column. The other table is not missing any data. Lets says the tables are named Table1 and Table2.

They look like the following :

Table1

NameInfo1Info2Info3
Johna
Paulbbcc
Georgeaaaccc
Adamaaaabbbb
Briancccc

Table2

Info1Info2Info3Value1Value2Value3
abcdddef
aabbccjjkkll
aaabbbcccuiz
aaaabbbbccccnnbbaqvvvv

What I would like to do is do is to merge both table based on the info1 or info2 or info3 and return the name with value1, value2, value 3

I really don't know how to write this in SAS or SQL, but I am sure this could be done.

I see the solution as being nested IFs (or CASE) statement where the value1, value2 and value3 are returned if there is a matched based on the info from Table1.

IF info1 is not null; do the merge with info1 ELSE do the merge with info2

IF info2 is null ; do the merge info 3

The end result would be :

NameValue1Value2Value3
Johndddef
Pauljjkkll
Georgeuiz
Adamnnbbaqww
Briannnbbaqww

P.S. The data used is only there for example purpose, it the real data it is not sorted or have specific length in any way.

Could you please help me write the proper SAS procedure to create this table?

Thank you very much for your help.

1 ACCEPTED SOLUTION

Accepted Solutions
Haikuo
Onyx | Level 15

Replace the input...cards part with a simple 'set' statement:

data h1;

set yourtable1;

length index $10.;

array ind info1-info3;

do over ind;

  if not missing(ind) then do;

    index=cats(vname(ind),'_',ind);

    leave;

  end;

end;

;

data h2;

set yourtable2;

length index $50.;

array ind  info1-info3;

  do over ind;

     index=cats(vname(ind),'_',ind);

     output;

  end;

;

Haikuo

View solution in original post

8 REPLIES 8
Haikuo
Onyx | Level 15

I know there are definitely slicker solutions out there. But if you need it quick, here is one approach hopefully to get you started. The main idea is to set up index for both data set, so they can merge accordingly.

data h1;

input (Name    Info1    Info2    Info3) (:$10.);

length index $10.;

array ind info1-info3;

do over ind;

  if not missing(ind) then do;

    index=cats(vname(ind),'_',ind);

    leave;

  end;

end;

cards;

John    a    . .   

Paul    .    bb    cc

George    aaa     .    ccc

Adam    aaaa     bbbb    .

Brian    . .        cccc

;

data h2;

input (Info1    Info2    Info3    Value1    Value2    Value3) (:$8.);

length index $50.;

array ind  info1-info3;

  do over ind;

     index=cats(vname(ind),'_',ind);

     output;

  end;

cards;

a    b    c    ddd    e    f

aa    bb    cc    jj    kk    ll

aaa    bbb    ccc    u    i    z

aaaa    bbbb    cccc    nn    bbaq    vvvv

;

proc sql;

select name, value1, value2,value3 from h1 a, h2 b

where a.index=b.index;

quit;

Haikuo

nicnad
Fluorite | Level 6

Thank you very much for the quick reply.

Like I said before, I am a complete beginner with SAS.

The thing is my two tables are already created and they both contain 2 Millons + rows and 20+ columns each . They are both SAS tables (.sas7bdat)

Is there a way to read the existing tables instead of using cards to declare the variables value (if I understand the code correctly)?

What I would like to do is the code to read the existing variables names (column headers) instead of having to input it myself, then having the program determine the length of the variable and then read each observations (rows) and assign it to an index.

Is this something that can be done in SAS language?

Would the solution be easier in another language? I am  using SAS Enterprise Guide 4.1 so any language supported by it would be fine.

Again, thank you for your help and time.

Haikuo
Onyx | Level 15

Replace the input...cards part with a simple 'set' statement:

data h1;

set yourtable1;

length index $10.;

array ind info1-info3;

do over ind;

  if not missing(ind) then do;

    index=cats(vname(ind),'_',ind);

    leave;

  end;

end;

;

data h2;

set yourtable2;

length index $50.;

array ind  info1-info3;

  do over ind;

     index=cats(vname(ind),'_',ind);

     output;

  end;

;

Haikuo

nicnad
Fluorite | Level 6

Exactly what I was looking for.

Thank you very much!!!!

ballardw
Super User

Are any values of Info1, 2 or 3 duplicated in either table?

Ksharp
Super User

If you care about speed, Hash table is a good choice.

data h1;
input (Name    Info1    Info2    Info3) (:$10.);
cards;
John    a    . .   
Paul    .    bb    cc
George    aaa     .    ccc
Adam    aaaa     bbbb    .
Brian    . .        cccc
;
run;

 

data h2;
input (Info1    Info2    Info3    Value1    Value2    Value3) (:$8.);
cards;
a    b    c    ddd    e    f
aa    bb    cc    jj    kk    ll
aaa    bbb    ccc    u    i    z
aaaa    bbbb    cccc    nn    bbaq    vvvv
;
run;

data want(keep=name value:);
length info1 - info3 $ 40;
 if _n_ eq 1 then do;
    if 0 then set h2;
    declare hash ha(hashexp:20);
     ha.definekey('key');
     ha.definedata('value1','value2','value3');
     ha.definedone();

     do until(last);
      set h2 end=last;
      if not missing(info1) then do;key=info1;ha.add(); end;
      if not missing(info2) then do;key=info2;ha.add(); end;
      if not missing(info3) then do;key=info3;ha.add(); end;
     end;
end;
call missing(of _all_);
set h1;
      if not missing(info1) then do;key=info1; rc=ha.find(); end;
      if not missing(info2) then do;key=info2; rc=ha.find(); end;
      if not missing(info3) then do;key=info3; rc=ha.find(); end;
run;



Ksharp

Haikuo
Onyx | Level 15

Thanks, Ksharp for bringing in the Hash() idea. Actually index can be inserted into hash() implementation to make it less hard coding:

data h1;

input (Name    Info1    Info2    Info3) (:$8.);

cards;

John    a    . . 

Paul    .    bb    cc

George    aaa     .    ccc

Adam    aaaa     bbbb    .

Brian    . .        cccc

;

data h2;

input (Info1    Info2    Info3    Value1    Value2    Value3) (:$8.);

cards;

a    b    c    ddd    e    f

aa    bb    cc    jj    kk    ll

aaa    bbb    ccc    u    i    z

aaaa    bbbb    cccc    nn    bbaq    vvvv

;

data want;

   dcl hash h(ordered:'a');

   h.definekey('var','var_name');

   h.definedata('var','var_name','value1','value2','value3');

   h.definedone();

        do until (last1);

         set h2 end=last1;

         array ind  info1-info3;

           do over ind;

             var=ind;

             var_name=vname(ind);

             h.add();

           end;

         end;

      

         do until (last2);

           set h1 end=last2;

           array ind1  info1-info3;

           do over ind1;

              if not missing(ind1) then do;

                rc=h.find(key:ind1,key:vname(ind1));  output;

              leave;

              end;

          end;

          end;

          keep name value1-value3;

          stop;

    run;

    proc print;run;

Haikuo

nicnad
Fluorite | Level 6

Thank you both for your suggestions!

I would like to add :

- Great forum

-Great Interface

-Great User base

-Fast Answer

Could not have asked for more.

Thank you all for your help and time!

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 8 replies
  • 6190 views
  • 0 likes
  • 4 in conversation