I'm looking to create a table through proc sql that takes a larger list with multiple entries for a primary key, and want to return rows with that have a primary key where none of a given criteria occur.
Imagine this list below as my original list called work.names . If a name has a non-null login_id value for any of its rows, I don't want any row with that name returned at all. I only want names where all rows have a NULL login_id, ideally with only one entry for each name/primary key.
Name | Login_ID |
Alejandra Bravo | 1394 |
Alejandra Bravo | NULL |
Vincent Crisanti | 6687 |
Vincent Crisanti | NULL |
Stephen Holyday | NULL |
Stephen Holyday | 5548 |
Stephen Holyday | NULL |
Lily Cheng | 3392 |
Amber Morley | NULL |
Amber Morley | NULL |
Amber Morley | NULL |
Frances Nunziata | NULL |
Frances Nunziata | NULL |
And from this list should return something like:
Name | Login_ID_3_1 |
Amber Morley | NULL |
Frances Nunziata | NULL |
My initial idea was to use this code:
select distinct *,
rank () over(partition by name order by LOGIN_ID) as RANK
from work.names
where RANK = 1 and missing(LOGIN_ID)
However, this does not work because rank over partition is not supported in proq sql. So perhaps there's another route to getting the data i need? It doesn't need to be a way to rank data, I just need to get rows with names that have no non-null values. Any ideas?
This should work:
proc sql;
create table want as select distinct *
from have
group by name
having sum(login_id^='NULL')=0;
quit;
From now on, we cannot work with data in screen captures or file attachments. We need data presented as WORKING data step code which you can type in yourself, or follow these instructions.
This should work:
proc sql;
create table want as select distinct *
from have
group by name
having sum(login_id^='NULL')=0;
quit;
From now on, we cannot work with data in screen captures or file attachments. We need data presented as WORKING data step code which you can type in yourself, or follow these instructions.
My mistake. I did not include that the login column is a varchar column, not a numeric one. It contained numeric values in the sample I gave by coincidence, as there are some values that have other characters, so I don't believe a grouped sum would work here, would it?
@aazzarello wrote:
My mistake. I did not include that the login column is a varchar column, not a numeric one. It contained numeric values in the sample I gave by coincidence, as there are some values that have other characters, so I don't believe a grouped sum would work here, would it?
The sum in the suggested solution is totaling the results of the comparison: (login_id ^='NULL'). SAS returns a 1 for true and 0 for false.
So the value of the variable isn't summed, the value of the comparison result is summed and does not matter as to the variable type as long as the comparison is valid for the variable.
Did you even try the code? It recognizes that LOGIN_ID is character. I would explain, except @ballardw already has explained.
I would try to do it something like this:
Proc SQL;
select distinct name,login_id as login_id_3_1
from work.names base
where not exists(select * from work.names test
where name=base.name and login_id is not null);
Just one question: what do you need the LOGIN_ID for, if you already know that is always NULL?
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.