Solved: Re: PROC SQL - DROP statement in a join - Discrimination between t1 an...

Escada · Posted 06-13-2016 04:10 AM

Dear all,

I am strugling a bit dropping variables in a join statement: I need to drop some variables from t1 and other variables from t2 (the variables I need to drop are not keys).

In the code below some variables are the same in both tables (because of the *) I cannot name manually each variable because there are hundreds of them and they can change.

Looking at the documentation it seems that the drop statement can be applied only in the result table (see drop statement after the CREATE).

Does anyone know how to differentiate variables from t1 and t2 in the DROP statement? (Maybe there is another solution?)

PROC SQL;
CREATE TABLE RESULT (DROP=MEASURE) AS
SELECT t1.*,t2.*
FROM TABLE1 t1
LEFT JOIN TABLE2 t2 ON(t1.Entity_ID = t2.ENTITY_ID) AND (t1.Reported_Period = t2.REPORTED_PERIOD)
WHERE t1.Reported_Period >= '31Dec2014'd;
QUIT;

In advance, thank you very much,

RahulG · Posted 06-13-2016 04:33 AM

Use drop= table option in From clause as shown below. It would drop MEASURE column from TABLE1 and not from TABLE2.

PROC SQL;
CREATE TABLE RESULT AS
SELECT t1.*,t2.*
FROM TABLE1(DROP=MEASURE) t1
LEFT JOIN TABLE2 t2 ON(t1.Entity_ID = t2.ENTITY_ID) AND (t1.Reported_Period = t2.REPORTED_PERIOD)
WHERE t1.Reported_Period >= '31Dec2014'd;
QUIT;

View solution in original post

RahulG · Posted 06-13-2016 04:33 AM

Use drop= table option in From clause as shown below. It would drop MEASURE column from TABLE1 and not from TABLE2.

PROC SQL;
CREATE TABLE RESULT AS
SELECT t1.*,t2.*
FROM TABLE1(DROP=MEASURE) t1
LEFT JOIN TABLE2 t2 ON(t1.Entity_ID = t2.ENTITY_ID) AND (t1.Reported_Period = t2.REPORTED_PERIOD)
WHERE t1.Reported_Period >= '31Dec2014'd;
QUIT;

RW9 · Posted 06-13-2016 04:43 AM

Well, if your datasets have hundreds of variables - I would fits ask why, as that is never a good idea to have so many varibles - secondly I would consider using datastep rather than SQL. SQL is built to handle relational database models, i.e. tables with few variables and many observations with links between tables. SAS is more flexible than that allowing a more tabular form. However for any kind of processing its really not a good idea to have hundreds of variables - at some point you will need to know what they are, either in this sql or later on - so that is the real underlying problem. Now you could still do this with datastep as (not tested obviously):

/* Note assumes data is sorted*/
data want;
  merge table1 (in=a drop=somevariable where=(reported_period >= '31Dec2014'd) 
        table2 (in=b drop=measure);
  by entityid;
  if a;
run;

However I still strongly advise you to look at your data and strucutre it in a way that is useful for your programming to save further problems.

PGStats · Posted 06-13-2016 02:28 PM

@Escada wrote:

I cannot name manually each variable because there are hundreds of them and they can change.

Given that same-name variables will create problems, how can you build a robust query with t1.*,t2.*? You must enumerate the set of variables that you need. Build the list in Excel or some text processor.

PG

PROC SQL - DROP statement in a join - Discrimination between t1 and t2

Re: PROC SQL - DROP statement in a join - Discrimination between t1 and t2

Re: PROC SQL - DROP statement in a join - Discrimination between t1 and t2

Re: PROC SQL - DROP statement in a join - Discrimination between t1 and t2

Re: PROC SQL - DROP statement in a join - Discrimination between t1 and t2

PROC SQL - DROP statement in a join - Discrimination between t1 and t2

Re: PROC SQL - DROP statement in a join - Discrimination between t1 and t2

Re: PROC SQL - DROP statement in a join - Discrimination between t1 and t2

Re: PROC SQL - DROP statement in a join - Discrimination between t1 and t2

Re: PROC SQL - DROP statement in a join - Discrimination between t1 and t2

Registration is open

SAS Training: Just a Click Away