Hi All,
I have a number of datasets with common and unique variables. I want to create a master list of all the variables and add columns that indicate which datasets have which variables. I figured out a way to do this with proc sql and an outer join (see below) but it only really works with 2 tables. It appears a match-merge approach with a datastep is the way to go but I'm not sure how to add the checklist columns. Any suggestions?
Here is what I have so far:
data work.table1;
input name $ desc $;
datalines;
a descA
b descB
c descC
d descD
run;
data work.table2;
input name $ desc $;
datalines;
b descB
d descD
e descE
f descF
run;
data work.table3;
input name $ desc $;
datalines;
b descB
c descC
f descF
g descG
h descH
run;
* this produces what I want but only works for 2 tables;
* merge the files and add a checklist column for tables 1 and 2;
proc sql;
CREATE TABLE checkListSQL AS
SELECT
CASE
WHEN a.name = b.name THEN a.name
WHEN a.name = '' THEN b.name
WHEN b.name = '' THEN a.name
ELSE 'NULL'
END AS Variable length = 10,
CASE
WHEN a.name ~= '' then a.Desc
ELSE b.DESC
END as Description length = 50,
CASE
WHEN a.name = b.name THEN 'X'
WHEN a.name ~= '' THEN 'X'
ELSE ''
END as table1 length = 10,
CASE
WHEN b.name = a.name THEN 'X'
WHEN b.name ~= '' THEN 'X'
ELSE ''
END as table2 length =10
FROM table1 as a
FULL JOIN table2 as b
ON a.name = b.name;
quit;
data checkListMerge;
merge table1 table2 table3;
by name; *data already sorted;
*How to add the checklist portion via a data step?;
run;
Hi..
How About..
data checkListMerge;
merge table1(in=a) table2(in=b) table3(in=c);
by name; ;
if a then Table1='X';
if b then Table2='X';
if c then Table3='X';
run;
Regards.
Sanjeev.K
Hi..
How About..
data checkListMerge;
merge table1(in=a) table2(in=b) table3(in=c);
by name; ;
if a then Table1='X';
if b then Table2='X';
if c then Table3='X';
run;
Regards.
Sanjeev.K
That's almost too easy. Thanks!
I saw code similar to this while I was googling the problem but the syntax was so brief I couldn't understand the logic. Could I bother you for a layperson's explanation?
Hi..
For your understanding i modified the code..
data checkListMerge;
merge table1(in=a) table2(in=b) table3(in=c);
by name;
a1=a;
b1=b;
c1=c;
run;
suppose If the value of A1 variable is 1 ,it means BY variable value(in your code BY variable is Name) is Exist in Table1,if A1 value is 0 ,which means BY variable value not present in Table1.
For Example Name 'e' is present in only Table2.
So that is what for 'e' name,b1 value is 1 and A1,C1values are 0('e' not exist in Table1 and Table3)..
Hope You understand the logic.
Regards.
Sanjeev.K
Message was edited by: sajeev kuridi
Thanks again. I'll just add this text from a sas help page.
Understanding the IN= Data Set Option |
When you create a new data set by combining observations from two or more data sets, knowing which data set an observation came from can be useful. For example, you might want to perform a calculation based on which data set contributed an observation. Otherwise, you might lose important contextual information that you need for later processing. You can determine which data set contributed a particular observation by using the IN= data set option.
The IN= data set option enables you to determine which data sets have contributed to the observation that is currently in the program data vector. The syntax for this option on the SET statement is
SET SAS-data-set-1 (IN=variable) SAS-data-set-2; |
BY a-common-variable; |
When you use the IN= option with a data set in a SET, MERGE, MODIFY, or UPDATE statement, SAS creates a temporary variable associated with that data set. The value of variable is 1 if the data set has contributed to the observation currently in the program data vector. The value is 0 if it has not contributed. You can use the IN= option with any or all the data sets you name in a SET, MERGE, MODIFY, or UPDATE statement, but use a different variable name in each case.
Note: The IN= variable exists during the execution of the DATA step only; it is not written to the output data set that is created.
Yup..
You can also Perform All Types of joins in datastep with IN= options.
For more info..
http://www.nesug.org/Proceedings/nesug11/ds/ds03.pdf
Regards,
Sanjeev.K
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.