BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Escada
Obsidian | Level 7

Dear all,

 

I am strugling a bit dropping variables in a join statement: I need to drop some variables from t1 and other variables from t2 (the variables I need to drop are not keys). 

In the code below some variables are the same in both tables (because of the *) I cannot name manually each variable because there are hundreds of them and they can change.

Looking at the documentation it seems that the drop statement can be applied only in the result table (see drop statement after the CREATE).

Does anyone know how to differentiate variables from t1 and t2 in the DROP statement? (Maybe there is another solution?)

 

PROC SQL;
CREATE TABLE RESULT (DROP=MEASURE)  AS
SELECT t1.*,t2.*
FROM TABLE1 t1
LEFT JOIN TABLE2 t2 ON(t1.Entity_ID = t2.ENTITY_ID) AND (t1.Reported_Period = t2.REPORTED_PERIOD)
WHERE t1.Reported_Period >= '31Dec2014'd;
QUIT;

 

In advance, thank you very much,

 

1 ACCEPTED SOLUTION

Accepted Solutions
RahulG
Barite | Level 11

Use drop= table option in From clause as shown below.  It would drop MEASURE column from TABLE1 and not from TABLE2.

 

 

PROC SQL;
CREATE TABLE RESULT   AS
SELECT t1.*,t2.*
FROM TABLE1(DROP=MEASURE)  t1
LEFT JOIN TABLE2 t2 ON(t1.Entity_ID = t2.ENTITY_ID) AND (t1.Reported_Period = t2.REPORTED_PERIOD)
WHERE t1.Reported_Period >= '31Dec2014'd;
QUIT;

View solution in original post

3 REPLIES 3
RahulG
Barite | Level 11

Use drop= table option in From clause as shown below.  It would drop MEASURE column from TABLE1 and not from TABLE2.

 

 

PROC SQL;
CREATE TABLE RESULT   AS
SELECT t1.*,t2.*
FROM TABLE1(DROP=MEASURE)  t1
LEFT JOIN TABLE2 t2 ON(t1.Entity_ID = t2.ENTITY_ID) AND (t1.Reported_Period = t2.REPORTED_PERIOD)
WHERE t1.Reported_Period >= '31Dec2014'd;
QUIT;

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Well, if your datasets have hundreds of variables - I would fits ask why, as that is never a good idea to have so many varibles - secondly I would consider using datastep rather than SQL.  SQL is built to handle relational database models, i.e. tables with few variables and many observations with links between tables.  SAS is more flexible than that allowing a more tabular form.  However for any kind of processing its really not a good idea to have hundreds of variables - at some point you will need to know what they are, either in this sql or later on - so that is the real underlying problem.  Now you could still do this with datastep as (not tested obviously):

/* Note assumes data is sorted*/
data want;
  merge table1 (in=a drop=somevariable where=(reported_period >= '31Dec2014'd) 
        table2 (in=b drop=measure);
  by entityid;
  if a;
run;
 

However I still strongly advise you to look at your data and strucutre it in a way that is useful for your programming to save further problems.

PGStats
Opal | Level 21

@Escada wrote:

I cannot name manually each variable because there are hundreds of them and they can change.


Given that same-name variables will create problems, how can you build a robust query with t1.*,t2.*? You must enumerate the set of variables that you need. Build the list in Excel or some text processor.

PG

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 61377 views
  • 5 likes
  • 4 in conversation