DATA Step, Macro, Functions and more

Sorting Leaving, Entering and Staying observations from table T0 to table T1 based on two id columns

Accepted Solution Solved
Reply
Contributor
Posts: 28
Accepted Solution

Sorting Leaving, Entering and Staying observations from table T0 to table T1 based on two id columns

Hello,

 

I have two big tables, table T0 and table T1 that represents observations for two different months. I want to create three new tables:

 

-one that has unique observations id for T0;

-one that has same observations id for T0 and T1;

-one that has unique observations id for T1.

 

Unique id is based on two columns: id1 & id2. I would prefer using proc sql for this as well. Is there a way for me to sort this without using left join and without merging id1 and id2 in a new variable?

 

Thank you very much Smiley Happy


Accepted Solutions
Solution
‎04-27-2018 02:23 PM
Super User
Posts: 13,950

Re: Sorting Leaving, Entering and Staying observations from table T0 to table T1 based on two id col


@x2PSx wrote:
Thank you, yes this works to get the sets of id that I need . I was trying to avoid doing it in two steps and Left joining afterwards. Is there a way to do this more efficiently in one step? Thank you very much for your time.

Without data and knowing what comes from each set then it is very hard to tell what might be needed.

Do note that the core of any of those queries could be used such as:

Proc sql;
   create table wat as
   selct b.* 
   from (
      select * from (select distinct id1, id2 from t0)
      except
      select * from (select distinct id1, id2 from t1)
      ) as a
      left join t0 as b
      on a.id1=b.id1 and a.id2=b.id2
   ;

quit;

But if you need variables from both t0 and t1 then there will be more code involved.

 

what is the concern over the left join? When it is appropriate tool use it.

 

You might provide some short examples of the two data sets and the desired final output.

I can see a possibility of a data step but that would require sorting both data sets.

 

"Efficiency" comes in a number flavors: easy to write/maintain code, disk storage space, cpu run times are a few for example. Single complex queries may have fewer typed characters but may  be harder write initially (hence your question) and maintain when changes occur. And the more complex a single query gets the longer it is likely to run.

View solution in original post


All Replies
Super User
Posts: 13,950

Re: Sorting Leaving, Entering and Staying observations from table T0 to table T1 based on two id col

This should select the sets of id variables you need.

Proc sql;
   create table uniquet0 as
   select * from (select distinct id1, id2 from t0)
   except
   select * from (select distinct id1, id2 from t1);

   create table uniquet1 as
   select * from (select distinct id1, id2 from t1)
   except
   select * from (select distinct id1, id2 from t0);

   create table common as
   select * from (select distinct id1, id2 from t1)
   intersect
   select * from (select distinct id1, id2 from t0);

quit;

Each of these would likely need to be joined (left or right likely) with the other data to bring in other variables.

 

Contributor
Posts: 28

Re: Sorting Leaving, Entering and Staying observations from table T0 to table T1 based on two id col

Thank you, yes this works to get the sets of id that I need . I was trying to avoid doing it in two steps and Left joining afterwards. Is there a way to do this more efficiently in one step? Thank you very much for your time.
Solution
‎04-27-2018 02:23 PM
Super User
Posts: 13,950

Re: Sorting Leaving, Entering and Staying observations from table T0 to table T1 based on two id col


@x2PSx wrote:
Thank you, yes this works to get the sets of id that I need . I was trying to avoid doing it in two steps and Left joining afterwards. Is there a way to do this more efficiently in one step? Thank you very much for your time.

Without data and knowing what comes from each set then it is very hard to tell what might be needed.

Do note that the core of any of those queries could be used such as:

Proc sql;
   create table wat as
   selct b.* 
   from (
      select * from (select distinct id1, id2 from t0)
      except
      select * from (select distinct id1, id2 from t1)
      ) as a
      left join t0 as b
      on a.id1=b.id1 and a.id2=b.id2
   ;

quit;

But if you need variables from both t0 and t1 then there will be more code involved.

 

what is the concern over the left join? When it is appropriate tool use it.

 

You might provide some short examples of the two data sets and the desired final output.

I can see a possibility of a data step but that would require sorting both data sets.

 

"Efficiency" comes in a number flavors: easy to write/maintain code, disk storage space, cpu run times are a few for example. Single complex queries may have fewer typed characters but may  be harder write initially (hence your question) and maintain when changes occur. And the more complex a single query gets the longer it is likely to run.

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 3 replies
  • 135 views
  • 0 likes
  • 2 in conversation