BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
kbk
Fluorite | Level 6 kbk
Fluorite | Level 6

Hi All,

I have a number of datasets with common and unique variables. I want to create a master list of all the variables and add columns that indicate which datasets have which variables. I figured out a way to do this with proc sql and an outer join (see below) but it only really works with 2 tables. It appears a match-merge approach with a datastep is the way to go but I'm not sure how to add the checklist columns. Any suggestions?

Here is what I have so far:

data work.table1;

    input name $ desc $;

    datalines;

a descA

b descB

c descC

d descD

run;

data work.table2;

    input name $ desc $;

    datalines;

b descB

d descD

e descE

f descF

run;

data work.table3;

    input name $ desc $;

    datalines;

b descB

c descC

f descF

g descG

h descH

run;

* this produces what I want but only works for 2 tables;

* merge the files and add a checklist column for tables 1 and 2;

proc sql;

    CREATE TABLE checkListSQL AS

    SELECT

        CASE

            WHEN a.name = b.name THEN a.name

            WHEN a.name = '' THEN b.name

            WHEN b.name = '' THEN a.name

            ELSE 'NULL'

            END AS Variable length = 10,

        CASE

            WHEN a.name ~= '' then a.Desc

            ELSE b.DESC

            END as Description length = 50,

        CASE

            WHEN a.name = b.name THEN 'X'

            WHEN a.name ~= '' THEN 'X'

            ELSE ''

            END as table1 length = 10,

        CASE

            WHEN b.name = a.name THEN 'X'

            WHEN b.name ~= '' THEN 'X'

            ELSE ''

            END as table2 length =10

    FROM table1 as a

    FULL JOIN table2 as b

    ON a.name = b.name;

quit;

data checkListMerge;

    merge table1 table2 table3;

    by name; *data already sorted;

    *How to add the checklist portion via a data step?;

run;

1 ACCEPTED SOLUTION

Accepted Solutions
kuridisanjeev
Quartz | Level 8

Hi..

How About..

data checkListMerge;

    merge table1(in=a) table2(in=b) table3(in=c);

    by name; ;

  if a then Table1='X'; 

    if b then Table2='X'; 

   if c then Table3='X'; 

run;

Regards.

Sanjeev.K

View solution in original post

6 REPLIES 6
kuridisanjeev
Quartz | Level 8

Hi..

How About..

data checkListMerge;

    merge table1(in=a) table2(in=b) table3(in=c);

    by name; ;

  if a then Table1='X'; 

    if b then Table2='X'; 

   if c then Table3='X'; 

run;

Regards.

Sanjeev.K

kbk
Fluorite | Level 6 kbk
Fluorite | Level 6

That's almost too easy. Thanks!

I saw code similar to this while I was googling the problem but the syntax was so brief I couldn't understand the logic. Could I bother you for a layperson's explanation?

kuridisanjeev
Quartz | Level 8

Hi..

For your understanding i modified the code..


data checkListMerge;

    merge table1(in=a) table2(in=b) table3(in=c);

    by name;

  a1=a;

  b1=b;

  c1=c;

run;

   

suppose If the value of A1 variable is 1 ,it means BY variable value(in your code BY variable is Name) is Exist in Table1,if  A1 value is 0 ,which means BY variable value not present in Table1.


For Example Name 'e' is present in only Table2.

So that is what for  'e' name,b1 value is 1 and A1,C1values are 0('e' not exist in Table1 and Table3)..



Hope You understand the logic.


Regards.


Sanjeev.K

Message was edited by: sajeev kuridi

kbk
Fluorite | Level 6 kbk
Fluorite | Level 6

Thanks again. I'll just add this text from a sas help page.


Understanding the IN= Data Set Option

When you create a new data set by combining observations from two or more data sets, knowing which data set an observation came from can be useful. For example, you might want to perform a calculation based on which data set contributed an observation. Otherwise, you might lose important contextual information that you need for later processing. You can determine which data set contributed a particular observation by using the IN= data set option.

The IN= data set option enables you to determine which data sets have contributed to the observation that is currently in the program data vector. The syntax for this option on the SET statement is

SET SAS-data-set-1 (IN=variable) SAS-data-set-2;
BY a-common-variable;

When you use the IN= option with a data set in a SET, MERGE, MODIFY, or UPDATE statement, SAS creates a temporary variable associated with that data set. The value of variable is 1 if the data set has contributed to the observation currently in the program data vector. The value is 0 if it has not contributed. You can use the IN= option with any or all the data sets you name in a SET, MERGE, MODIFY, or UPDATE statement, but use a different variable name in each case.

Note:   The IN= variable exists during the execution of the DATA step only; it is not written to the output data set that is created.

kuridisanjeev
Quartz | Level 8

Yup..

You can also Perform All Types of joins in datastep with IN= options.

For more info..

http://www.nesug.org/Proceedings/nesug11/ds/ds03.pdf

Regards,

Sanjeev.K

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 966 views
  • 4 likes
  • 2 in conversation