Solved: Re: proc sql - help needed

alexdsa310 · Posted 04-20-2017 07:08 AM

i am using the below step to get obs with origin which are not in _assignments_ and only present in _values_ but not able to get.

proc sql;
create table sv1 as
select a.* from _values_ a left join _assignments_ b
on a.origin=b.origin and a.column=b.column;
quit;

I am not sure where i am wrong as i am using left join.

any help?

Kurt_Bremser · Posted 04-20-2017 07:17 AM

With a left join, you will always get all observations from the first table, and where a match is found in the other tables, their data will be added (left missing otherwise).

What you want might be achieved with

proc sql;
create table sv1 as
select a.* from _values_ a left join _assignments_ b
on a.origin=b.origin and a.column=b.column
where b.origin is missing;
quit;

or

data sv1;
merge
  _values (in=a)
  _assignments (in=b keep=origin column)
;
by origin column;
if a and not b;
run;

(proper sorting required)

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

View solution in original post

Kurt_Bremser · Posted 04-20-2017 07:17 AM

With a left join, you will always get all observations from the first table, and where a match is found in the other tables, their data will be added (left missing otherwise).

What you want might be achieved with

proc sql;
create table sv1 as
select a.* from _values_ a left join _assignments_ b
on a.origin=b.origin and a.column=b.column
where b.origin is missing;
quit;

or

data sv1;
merge
  _values (in=a)
  _assignments (in=b keep=origin column)
;
by origin column;
if a and not b;
run;

(proper sorting required)

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

alexdsa310 · Posted 04-20-2017 07:34 AM

Thanks Kurt.

just a small clarification

does

where b_origin is missing;

in proc sql gives origin which is missing ?

i need if origin and column is present in both then i do not need that observation

Kurt_Bremser · Posted 04-20-2017 07:45 AM

When you do a left join on a condition, and no match is found from the right, then the variables from the "right" table will all be set to missing. If a match is found, the variable(s) used in the condition can not be missing, by definition.

Edit: your post called my attention to a typo I made (corrected in the meantime). The condition should be

where b.origin is missing;

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

alexdsa310 · Posted 04-20-2017 07:54 AM

Thanks once again kurt.

For my final clarification i will get only observations which will not have a common origin and column observations in both the datasets(which is what i need). i.e if i have same origin and column name in both of my datasets then i will ommit that record and only take records which has _values_ where origin and column are different and not present in _assignments_.

Sorry for this. its just for my understanding

Kurt_Bremser · Posted 04-20-2017 08:04 AM

If in doubt, do a test run and look at the results. If puzzled, inquire further.

(Maxim 4 from the Maxims of Maximally Efficient SAS Programmers).

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

ballardw · Posted 04-20-2017 07:09 PM

@Kurt_Bremser wrote:

If in doubt, do a test run and look at the results. If puzzled, inquire further.

(Maxim 4 from the Maxims of Maximally Efficient SAS Programmers).

@Kurt_Bremser do you have more for that list? My google-fu only returns the Mercenary list (which I strongly suspect was an inspiration to your response).

And now I need some Ovalqwik!

Kurt_Bremser · Posted 04-21-2017 12:23 AM

@ballardw wrote:

@Kurt_Bremser wrote:

If in doubt, do a test run and look at the results. If puzzled, inquire further.

(Maxim 4 from the Maxims of Maximally Efficient SAS Programmers).

@Kurt_Bremser do you have more for that list? My google-fu only returns the Mercenary list (which I strongly suspect was an inspiration to your response).

And now I need some Ovalqwik!

I already have more for that list, and I plan to release it here as an ongoing project, as I expect that many would have valid maxims to contribute. And you are, of course, completely right where the inspiration came from. Tagon's Toughs to the rescue!

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Kurt_Bremser · Posted 04-21-2017 03:05 AM

A first draft of the MoMESP has been saved at https://communities.sas.com/t5/SAS-Communities-Library/Maxims-of-Maximally-Efficient-SAS-Programmers...

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

alexdsa310 · Posted 04-24-2017 07:16 AM

Hi Kurt,

I tried the same but still facing problems as the variable origin has multiple observations and due the NULL condition it is removing all and i only want to remove if it is same with origin and column and originwhere.

Attached is the sample datasets.

Can you guide me if possible

Kurt_Bremser · Posted 04-24-2017 07:57 AM

These are not SAS datasets, but Excel spreadsheets. Use the macro from https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... to convert your example datasets into data steps for posting as text here.

Those data steps will preserve attributes (lengths, formats), and can be easily copied into SAS. MS Office files are often blocked by firewalls for security reasons.

Also post the code you used, and a quick representation of the expected result (no data step code needed there), as it differs from what you got with the code.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

alexdsa310 · Posted 04-24-2017 08:22 AM

I am using the below code

proc sql;
create table sv1 as
select a.* from _values_ a left join _assignments_ b
on a.origin=b.origin and a.column=b.column
where b.origin is missing;
quit;

i need only observations which are in _values_ and not in _assignments_. so if i have a common observation in bothe the datasets with origin,originwhere i do not need them.

Currently it is not working as there are multiple observations of origin with diffirent column and originwhere with the code.

Kurt_Bremser · Posted 04-24-2017 08:59 AM

So you just need originwhere as an additional condition for the join?

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

alexdsa310 · Posted 04-24-2017 09:06 AM

Yes, i tried adding it in and a.originwhere=b.originwhere but i get few observations which are in _assignments_ coming in my dataset

Kurt_Bremser · Posted 04-24-2017 09:29 AM

Which of the 230 observations you get with:

proc sql;
create table sv1 as
select a.* from _values_ a left join _assignments_ b
on a.origin=b.origin and a.column=b.column and a.originwhere=b.originwhere
where b.origin is missing;
quit;

should not be in the result dataset (just provide observation numbers)?

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Registration is open

SAS Training: Just a Click Away