BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
x2PSx
Calcite | Level 5

Hello,

 

I have two big tables, table T0 and table T1 that represents observations for two different months. I want to create three new tables:

 

-one that has unique observations id for T0;

-one that has same observations id for T0 and T1;

-one that has unique observations id for T1.

 

Unique id is based on two columns: id1 & id2. I would prefer using proc sql for this as well. Is there a way for me to sort this without using left join and without merging id1 and id2 in a new variable?

 

Thank you very much 🙂

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

@x2PSx wrote:
Thank you, yes this works to get the sets of id that I need . I was trying to avoid doing it in two steps and Left joining afterwards. Is there a way to do this more efficiently in one step? Thank you very much for your time.

Without data and knowing what comes from each set then it is very hard to tell what might be needed.

Do note that the core of any of those queries could be used such as:

Proc sql;
   create table wat as
   selct b.* 
   from (
      select * from (select distinct id1, id2 from t0)
      except
      select * from (select distinct id1, id2 from t1)
      ) as a
      left join t0 as b
      on a.id1=b.id1 and a.id2=b.id2
   ;

quit;

But if you need variables from both t0 and t1 then there will be more code involved.

 

what is the concern over the left join? When it is appropriate tool use it.

 

You might provide some short examples of the two data sets and the desired final output.

I can see a possibility of a data step but that would require sorting both data sets.

 

"Efficiency" comes in a number flavors: easy to write/maintain code, disk storage space, cpu run times are a few for example. Single complex queries may have fewer typed characters but may  be harder write initially (hence your question) and maintain when changes occur. And the more complex a single query gets the longer it is likely to run.

View solution in original post

3 REPLIES 3
ballardw
Super User

This should select the sets of id variables you need.

Proc sql;
   create table uniquet0 as
   select * from (select distinct id1, id2 from t0)
   except
   select * from (select distinct id1, id2 from t1);

   create table uniquet1 as
   select * from (select distinct id1, id2 from t1)
   except
   select * from (select distinct id1, id2 from t0);

   create table common as
   select * from (select distinct id1, id2 from t1)
   intersect
   select * from (select distinct id1, id2 from t0);

quit;

Each of these would likely need to be joined (left or right likely) with the other data to bring in other variables.

 

x2PSx
Calcite | Level 5
Thank you, yes this works to get the sets of id that I need . I was trying to avoid doing it in two steps and Left joining afterwards. Is there a way to do this more efficiently in one step? Thank you very much for your time.
ballardw
Super User

@x2PSx wrote:
Thank you, yes this works to get the sets of id that I need . I was trying to avoid doing it in two steps and Left joining afterwards. Is there a way to do this more efficiently in one step? Thank you very much for your time.

Without data and knowing what comes from each set then it is very hard to tell what might be needed.

Do note that the core of any of those queries could be used such as:

Proc sql;
   create table wat as
   selct b.* 
   from (
      select * from (select distinct id1, id2 from t0)
      except
      select * from (select distinct id1, id2 from t1)
      ) as a
      left join t0 as b
      on a.id1=b.id1 and a.id2=b.id2
   ;

quit;

But if you need variables from both t0 and t1 then there will be more code involved.

 

what is the concern over the left join? When it is appropriate tool use it.

 

You might provide some short examples of the two data sets and the desired final output.

I can see a possibility of a data step but that would require sorting both data sets.

 

"Efficiency" comes in a number flavors: easy to write/maintain code, disk storage space, cpu run times are a few for example. Single complex queries may have fewer typed characters but may  be harder write initially (hence your question) and maintain when changes occur. And the more complex a single query gets the longer it is likely to run.

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 827 views
  • 0 likes
  • 2 in conversation