Hi,
I am new to SAS and SQL query so I am asking for your help to solve my problem.
I have two tables. Each tables have only alphanumerical data. The first table is missing data in some column. The other table is not missing any data. Lets says the tables are named Table1 and Table2.
They look like the following :
Table1
Name | Info1 | Info2 | Info3 |
---|---|---|---|
John | a | ||
Paul | bb | cc | |
George | aaa | ccc | |
Adam | aaaa | bbbb | |
Brian | cccc |
Table2
Info1 | Info2 | Info3 | Value1 | Value2 | Value3 |
---|---|---|---|---|---|
a | b | c | ddd | e | f |
aa | bb | cc | jj | kk | ll |
aaa | bbb | ccc | u | i | z |
aaaa | bbbb | cccc | nn | bbaq | vvvv |
What I would like to do is do is to merge both table based on the info1 or info2 or info3 and return the name with value1, value2, value 3
I really don't know how to write this in SAS or SQL, but I am sure this could be done.
I see the solution as being nested IFs (or CASE) statement where the value1, value2 and value3 are returned if there is a matched based on the info from Table1.
IF info1 is not null; do the merge with info1 ELSE do the merge with info2
IF info2 is null ; do the merge info 3
The end result would be :
Name | Value1 | Value2 | Value3 |
---|---|---|---|
John | ddd | e | f |
Paul | jj | kk | ll |
George | u | i | z |
Adam | nn | bbaq | ww |
Brian | nn | bbaq | ww |
P.S. The data used is only there for example purpose, it the real data it is not sorted or have specific length in any way.
Could you please help me write the proper SAS procedure to create this table?
Thank you very much for your help.
Replace the input...cards part with a simple 'set' statement:
data h1;
set yourtable1;
length index $10.;
array ind info1-info3;
do over ind;
if not missing(ind) then do;
index=cats(vname(ind),'_',ind);
leave;
end;
end;
;
data h2;
set yourtable2;
length index $50.;
array ind info1-info3;
do over ind;
index=cats(vname(ind),'_',ind);
output;
end;
;
Haikuo
I know there are definitely slicker solutions out there. But if you need it quick, here is one approach hopefully to get you started. The main idea is to set up index for both data set, so they can merge accordingly.
data h1;
input (Name Info1 Info2 Info3) (:$10.);
length index $10.;
array ind info1-info3;
do over ind;
if not missing(ind) then do;
index=cats(vname(ind),'_',ind);
leave;
end;
end;
cards;
John a . .
Paul . bb cc
George aaa . ccc
Adam aaaa bbbb .
Brian . . cccc
;
data h2;
input (Info1 Info2 Info3 Value1 Value2 Value3) (:$8.);
length index $50.;
array ind info1-info3;
do over ind;
index=cats(vname(ind),'_',ind);
output;
end;
cards;
a b c ddd e f
aa bb cc jj kk ll
aaa bbb ccc u i z
aaaa bbbb cccc nn bbaq vvvv
;
proc sql;
select name, value1, value2,value3 from h1 a, h2 b
where a.index=b.index;
quit;
Haikuo
Thank you very much for the quick reply.
Like I said before, I am a complete beginner with SAS.
The thing is my two tables are already created and they both contain 2 Millons + rows and 20+ columns each . They are both SAS tables (.sas7bdat)
Is there a way to read the existing tables instead of using cards to declare the variables value (if I understand the code correctly)?
What I would like to do is the code to read the existing variables names (column headers) instead of having to input it myself, then having the program determine the length of the variable and then read each observations (rows) and assign it to an index.
Is this something that can be done in SAS language?
Would the solution be easier in another language? I am using SAS Enterprise Guide 4.1 so any language supported by it would be fine.
Again, thank you for your help and time.
Replace the input...cards part with a simple 'set' statement:
data h1;
set yourtable1;
length index $10.;
array ind info1-info3;
do over ind;
if not missing(ind) then do;
index=cats(vname(ind),'_',ind);
leave;
end;
end;
;
data h2;
set yourtable2;
length index $50.;
array ind info1-info3;
do over ind;
index=cats(vname(ind),'_',ind);
output;
end;
;
Haikuo
Exactly what I was looking for.
Thank you very much!!!!
Are any values of Info1, 2 or 3 duplicated in either table?
If you care about speed, Hash table is a good choice.
data h1; input (Name Info1 Info2 Info3) (:$10.); cards; John a . . Paul . bb cc George aaa . ccc Adam aaaa bbbb . Brian . . cccc ; run; data h2; input (Info1 Info2 Info3 Value1 Value2 Value3) (:$8.); cards; a b c ddd e f aa bb cc jj kk ll aaa bbb ccc u i z aaaa bbbb cccc nn bbaq vvvv ; run; data want(keep=name value:); length info1 - info3 $ 40; if _n_ eq 1 then do; if 0 then set h2; declare hash ha(hashexp:20); ha.definekey('key'); ha.definedata('value1','value2','value3'); ha.definedone(); do until(last); set h2 end=last; if not missing(info1) then do;key=info1;ha.add(); end; if not missing(info2) then do;key=info2;ha.add(); end; if not missing(info3) then do;key=info3;ha.add(); end; end; end; call missing(of _all_); set h1; if not missing(info1) then do;key=info1; rc=ha.find(); end; if not missing(info2) then do;key=info2; rc=ha.find(); end; if not missing(info3) then do;key=info3; rc=ha.find(); end; run;
Ksharp
Thanks, Ksharp for bringing in the Hash() idea. Actually index can be inserted into hash() implementation to make it less hard coding:
data h1;
input (Name Info1 Info2 Info3) (:$8.);
cards;
John a . .
Paul . bb cc
George aaa . ccc
Adam aaaa bbbb .
Brian . . cccc
;
data h2;
input (Info1 Info2 Info3 Value1 Value2 Value3) (:$8.);
cards;
a b c ddd e f
aa bb cc jj kk ll
aaa bbb ccc u i z
aaaa bbbb cccc nn bbaq vvvv
;
data want;
dcl hash h(ordered:'a');
h.definekey('var','var_name');
h.definedata('var','var_name','value1','value2','value3');
h.definedone();
do until (last1);
set h2 end=last1;
array ind info1-info3;
do over ind;
var=ind;
var_name=vname(ind);
h.add();
end;
end;
do until (last2);
set h1 end=last2;
array ind1 info1-info3;
do over ind1;
if not missing(ind1) then do;
rc=h.find(key:ind1,key:vname(ind1)); output;
leave;
end;
end;
end;
keep name value1-value3;
stop;
run;
proc print;run;
Haikuo
Thank you both for your suggestions!
I would like to add :
- Great forum
-Great Interface
-Great User base
-Fast Answer
Could not have asked for more.
Thank you all for your help and time!
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.