I have code using
order by
if not first. and last. then output unique
else output duplicate
I would like to do the same thing using sql but don't know how. I cannot use 'group by count > 1' because the dataset has a lot of variables and I only want to 'order by' and 'group by' two of the variables.
I'll attach the code if that helps, look at the bottom, that little 'group by' is what I'm trying, I know it won't work but that general idea is what I'd like. I need the second column, if duplicated within the first column, to spit out to another dataset, so I'll be forming two datasets instead of one. One of them will be 'having count>1', the other will be 'having count=1'.
Hopefully that makes sense.
Sorry I'm not following, but could you give an example with before and after data that you have and need?
It seems to me like you are saying you want all keys that have more than 1 record (by a set of unique keys) to be output into one dataset, and all keys that have only 1 record to be output into another dataset? Is this correct?
Aka if you have loan, date, balance, term, etc... and your original key is Loan + date, then you want all loans and dates that are dupliated (01, january) into one dataset with everything else into another?
Thanks for the clarification!
The attached is the SQL that doesn't work? It would be easier if you provided the SAS that does work and a dataset that shows what you have versus want.
the sql works, it's just that last line that I'm questioning, the group by
Sorry, the sql works without that last line, with the last line it won't work because I only want to group by two variables.
So why don't you create the file, and then group by the two variables to get your list, and then just re-join to your data as a subset if that's all you want... Or just do a group by 2 variables, you can group by as many varaibles as you like within sas, you don't have to group by everything in a select clause (strange because you do have too in sql if it's not in an aggregation clause).
I am still not understanding what you actually need. Again if you have some data, or the data step that works it would help a lot.
Based on your code I suspect you are trying to run this in a different SQL dialect than proc sql. Your use of the decode and lpad functions suggests that to me
If you are running in a different SQL dialect, 'group by' aggregation works a little differently. This has caused me confusion in the past.
Some other dialects of sql do not allow you to have any fields in the select clause except for the group by variables and aggregate functions. SAS Proc SQL lets you do this. So as examples, the following queries will work in SAS SQL but not some other dialects:
create table count as
select a
,b
,count(*) as cnt
from myTable
group by a ;
create table dups as
select *
from myTable
group by a
having count(*) > 1;
If there is no need to do in another SQL dialect then you can do this within SAS, but some of the functions will need to be converted.
If you need to do this in another SQL dialect, you will need to do some variation on AnotherDream's last suggestion. Two ways of doing this are
1. Join the original table with a sub-query of the original table with the group by in the sub-query
2. Create a temporary table from the group by and then join temporary table to original table.
Hope this helps.
If the database you are using supports the "Rank" OLAP functionality, this would allow you to control exactly how to order your desired results by an arbitrary number of columns. The OLAP functions use "Qualify" to include/reject rows.
An example from Teradata is below, and you will probably need to research the syntax from your database vendor.
QUALIFY
RANK() OVER
(
PARTITION BY
task_nbr, acct_nbr
ORDER BY
vldtn_diff_eff_day_to_min DESC /*ASC*/
) = 1
If I understand your question correctly, you want to use SQL to create two data sets at the same time. SQL (including SAS' PROC SQL) can only create one table at a time.
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.
