Hello,
I have two big tables, table T0 and table T1 that represents observations for two different months. I want to create three new tables:
-one that has unique observations id for T0;
-one that has same observations id for T0 and T1;
-one that has unique observations id for T1.
Unique id is based on two columns: id1 & id2. I would prefer using proc sql for this as well. Is there a way for me to sort this without using left join and without merging id1 and id2 in a new variable?
Thank you very much 🙂
@x2PSx wrote:
Thank you, yes this works to get the sets of id that I need . I was trying to avoid doing it in two steps and Left joining afterwards. Is there a way to do this more efficiently in one step? Thank you very much for your time.
Without data and knowing what comes from each set then it is very hard to tell what might be needed.
Do note that the core of any of those queries could be used such as:
Proc sql; create table wat as selct b.* from ( select * from (select distinct id1, id2 from t0) except select * from (select distinct id1, id2 from t1) ) as a left join t0 as b on a.id1=b.id1 and a.id2=b.id2 ; quit;
But if you need variables from both t0 and t1 then there will be more code involved.
what is the concern over the left join? When it is appropriate tool use it.
You might provide some short examples of the two data sets and the desired final output.
I can see a possibility of a data step but that would require sorting both data sets.
"Efficiency" comes in a number flavors: easy to write/maintain code, disk storage space, cpu run times are a few for example. Single complex queries may have fewer typed characters but may be harder write initially (hence your question) and maintain when changes occur. And the more complex a single query gets the longer it is likely to run.
This should select the sets of id variables you need.
Proc sql; create table uniquet0 as select * from (select distinct id1, id2 from t0) except select * from (select distinct id1, id2 from t1); create table uniquet1 as select * from (select distinct id1, id2 from t1) except select * from (select distinct id1, id2 from t0); create table common as select * from (select distinct id1, id2 from t1) intersect select * from (select distinct id1, id2 from t0); quit;
Each of these would likely need to be joined (left or right likely) with the other data to bring in other variables.
@x2PSx wrote:
Thank you, yes this works to get the sets of id that I need . I was trying to avoid doing it in two steps and Left joining afterwards. Is there a way to do this more efficiently in one step? Thank you very much for your time.
Without data and knowing what comes from each set then it is very hard to tell what might be needed.
Do note that the core of any of those queries could be used such as:
Proc sql; create table wat as selct b.* from ( select * from (select distinct id1, id2 from t0) except select * from (select distinct id1, id2 from t1) ) as a left join t0 as b on a.id1=b.id1 and a.id2=b.id2 ; quit;
But if you need variables from both t0 and t1 then there will be more code involved.
what is the concern over the left join? When it is appropriate tool use it.
You might provide some short examples of the two data sets and the desired final output.
I can see a possibility of a data step but that would require sorting both data sets.
"Efficiency" comes in a number flavors: easy to write/maintain code, disk storage space, cpu run times are a few for example. Single complex queries may have fewer typed characters but may be harder write initially (hence your question) and maintain when changes occur. And the more complex a single query gets the longer it is likely to run.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.