Solved: Re: Sorting Leaving, Entering and Staying observations from table T0 t...

x2PSx · Posted 04-27-2018 01:47 PM

Hello,

I have two big tables, table T0 and table T1 that represents observations for two different months. I want to create three new tables:

-one that has unique observations id for T0;

-one that has same observations id for T0 and T1;

-one that has unique observations id for T1.

Unique id is based on two columns: id1 & id2. I would prefer using proc sql for this as well. Is there a way for me to sort this without using left join and without merging id1 and id2 in a new variable?

Thank you very much 🙂

ballardw · Posted 04-27-2018 02:19 PM

@x2PSx wrote:
Thank you, yes this works to get the sets of id that I need . I was trying to avoid doing it in two steps and Left joining afterwards. Is there a way to do this more efficiently in one step? Thank you very much for your time.

Without data and knowing what comes from each set then it is very hard to tell what might be needed.

Do note that the core of any of those queries could be used such as:

Proc sql;
   create table wat as
   selct b.* 
   from (
      select * from (select distinct id1, id2 from t0)
      except
      select * from (select distinct id1, id2 from t1)
      ) as a
      left join t0 as b
      on a.id1=b.id1 and a.id2=b.id2
   ;

quit;

But if you need variables from both t0 and t1 then there will be more code involved.

what is the concern over the left join? When it is appropriate tool use it.

You might provide some short examples of the two data sets and the desired final output.

I can see a possibility of a data step but that would require sorting both data sets.

"Efficiency" comes in a number flavors: easy to write/maintain code, disk storage space, cpu run times are a few for example. Single complex queries may have fewer typed characters but may be harder write initially (hence your question) and maintain when changes occur. And the more complex a single query gets the longer it is likely to run.

View solution in original post

ballardw · Posted 04-27-2018 01:55 PM

This should select the sets of id variables you need.

Proc sql;
   create table uniquet0 as
   select * from (select distinct id1, id2 from t0)
   except
   select * from (select distinct id1, id2 from t1);

   create table uniquet1 as
   select * from (select distinct id1, id2 from t1)
   except
   select * from (select distinct id1, id2 from t0);

   create table common as
   select * from (select distinct id1, id2 from t1)
   intersect
   select * from (select distinct id1, id2 from t0);

quit;

Each of these would likely need to be joined (left or right likely) with the other data to bring in other variables.

x2PSx · Posted 04-27-2018 02:05 PM

Thank you, yes this works to get the sets of id that I need . I was trying to avoid doing it in two steps and Left joining afterwards. Is there a way to do this more efficiently in one step? Thank you very much for your time.

ballardw · Posted 04-27-2018 02:19 PM

@x2PSx wrote:
Thank you, yes this works to get the sets of id that I need . I was trying to avoid doing it in two steps and Left joining afterwards. Is there a way to do this more efficiently in one step? Thank you very much for your time.

Without data and knowing what comes from each set then it is very hard to tell what might be needed.

Do note that the core of any of those queries could be used such as:

Proc sql;
   create table wat as
   selct b.* 
   from (
      select * from (select distinct id1, id2 from t0)
      except
      select * from (select distinct id1, id2 from t1)
      ) as a
      left join t0 as b
      on a.id1=b.id1 and a.id2=b.id2
   ;

quit;

But if you need variables from both t0 and t1 then there will be more code involved.

what is the concern over the left join? When it is appropriate tool use it.

You might provide some short examples of the two data sets and the desired final output.

I can see a possibility of a data step but that would require sorting both data sets.

"Efficiency" comes in a number flavors: easy to write/maintain code, disk storage space, cpu run times are a few for example. Single complex queries may have fewer typed characters but may be harder write initially (hence your question) and maintain when changes occur. And the more complex a single query gets the longer it is likely to run.

Sorting Leaving, Entering and Staying observations from table T0 to table T1 based on two id columns

Re: Sorting Leaving, Entering and Staying observations from table T0 to table T1 based on two id col

Re: Sorting Leaving, Entering and Staying observations from table T0 to table T1 based on two id col

Re: Sorting Leaving, Entering and Staying observations from table T0 to table T1 based on two id col

Re: Sorting Leaving, Entering and Staying observations from table T0 to table T1 based on two id col

Sorting Leaving, Entering and Staying observations from table T0 to table T1 based on two id columns

Re: Sorting Leaving, Entering and Staying observations from table T0 to table T1 based on two id col

Re: Sorting Leaving, Entering and Staying observations from table T0 to table T1 based on two id col

Re: Sorting Leaving, Entering and Staying observations from table T0 to table T1 based on two id col

Re: Sorting Leaving, Entering and Staying observations from table T0 to table T1 based on two id col

Click image to register for webinar

Classroom Training Available!