BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Escada
Obsidian | Level 7

Dear all,

 

I am strugling a bit dropping variables in a join statement: I need to drop some variables from t1 and other variables from t2 (the variables I need to drop are not keys). 

In the code below some variables are the same in both tables (because of the *) I cannot name manually each variable because there are hundreds of them and they can change.

Looking at the documentation it seems that the drop statement can be applied only in the result table (see drop statement after the CREATE).

Does anyone know how to differentiate variables from t1 and t2 in the DROP statement? (Maybe there is another solution?)

 

PROC SQL;
CREATE TABLE RESULT (DROP=MEASURE)  AS
SELECT t1.*,t2.*
FROM TABLE1 t1
LEFT JOIN TABLE2 t2 ON(t1.Entity_ID = t2.ENTITY_ID) AND (t1.Reported_Period = t2.REPORTED_PERIOD)
WHERE t1.Reported_Period >= '31Dec2014'd;
QUIT;

 

In advance, thank you very much,

 

1 ACCEPTED SOLUTION

Accepted Solutions
RahulG
Barite | Level 11

Use drop= table option in From clause as shown below.  It would drop MEASURE column from TABLE1 and not from TABLE2.

 

 

PROC SQL;
CREATE TABLE RESULT   AS
SELECT t1.*,t2.*
FROM TABLE1(DROP=MEASURE)  t1
LEFT JOIN TABLE2 t2 ON(t1.Entity_ID = t2.ENTITY_ID) AND (t1.Reported_Period = t2.REPORTED_PERIOD)
WHERE t1.Reported_Period >= '31Dec2014'd;
QUIT;

View solution in original post

3 REPLIES 3
RahulG
Barite | Level 11

Use drop= table option in From clause as shown below.  It would drop MEASURE column from TABLE1 and not from TABLE2.

 

 

PROC SQL;
CREATE TABLE RESULT   AS
SELECT t1.*,t2.*
FROM TABLE1(DROP=MEASURE)  t1
LEFT JOIN TABLE2 t2 ON(t1.Entity_ID = t2.ENTITY_ID) AND (t1.Reported_Period = t2.REPORTED_PERIOD)
WHERE t1.Reported_Period >= '31Dec2014'd;
QUIT;

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Well, if your datasets have hundreds of variables - I would fits ask why, as that is never a good idea to have so many varibles - secondly I would consider using datastep rather than SQL.  SQL is built to handle relational database models, i.e. tables with few variables and many observations with links between tables.  SAS is more flexible than that allowing a more tabular form.  However for any kind of processing its really not a good idea to have hundreds of variables - at some point you will need to know what they are, either in this sql or later on - so that is the real underlying problem.  Now you could still do this with datastep as (not tested obviously):

/* Note assumes data is sorted*/
data want;
  merge table1 (in=a drop=somevariable where=(reported_period >= '31Dec2014'd) 
        table2 (in=b drop=measure);
  by entityid;
  if a;
run;
 

However I still strongly advise you to look at your data and strucutre it in a way that is useful for your programming to save further problems.

PGStats
Opal | Level 21

@Escada wrote:

I cannot name manually each variable because there are hundreds of them and they can change.


Given that same-name variables will create problems, how can you build a robust query with t1.*,t2.*? You must enumerate the set of variables that you need. Build the list in Excel or some text processor.

PG

sas-innovate-white.png

Our biggest data and AI event of the year.

Don’t miss the livestream kicking off May 7. It’s free. It’s easy. And it’s the best seat in the house.

Join us virtually with our complimentary SAS Innovate Digital Pass. Watch live or on-demand in multiple languages, with translations available to help you get the most out of every session.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 66033 views
  • 5 likes
  • 4 in conversation