07-17-2013 11:38 AM
I have code using
if not first. and last. then output unique
else output duplicate
I would like to do the same thing using sql but don't know how. I cannot use 'group by count > 1' because the dataset has a lot of variables and I only want to 'order by' and 'group by' two of the variables.
I'll attach the code if that helps, look at the bottom, that little 'group by' is what I'm trying, I know it won't work but that general idea is what I'd like. I need the second column, if duplicated within the first column, to spit out to another dataset, so I'll be forming two datasets instead of one. One of them will be 'having count>1', the other will be 'having count=1'.
Hopefully that makes sense.
07-17-2013 11:58 AM
Sorry I'm not following, but could you give an example with before and after data that you have and need?
It seems to me like you are saying you want all keys that have more than 1 record (by a set of unique keys) to be output into one dataset, and all keys that have only 1 record to be output into another dataset? Is this correct?
Aka if you have loan, date, balance, term, etc... and your original key is Loan + date, then you want all loans and dates that are dupliated (01, january) into one dataset with everything else into another?
Thanks for the clarification!
07-17-2013 04:19 PM
So why don't you create the file, and then group by the two variables to get your list, and then just re-join to your data as a subset if that's all you want... Or just do a group by 2 variables, you can group by as many varaibles as you like within sas, you don't have to group by everything in a select clause (strange because you do have too in sql if it's not in an aggregation clause).
I am still not understanding what you actually need. Again if you have some data, or the data step that works it would help a lot.
07-18-2013 10:35 AM
Based on your code I suspect you are trying to run this in a different SQL dialect than proc sql. Your use of the decode and lpad functions suggests that to me
If you are running in a different SQL dialect, 'group by' aggregation works a little differently. This has caused me confusion in the past.
Some other dialects of sql do not allow you to have any fields in the select clause except for the group by variables and aggregate functions. SAS Proc SQL lets you do this. So as examples, the following queries will work in SAS SQL but not some other dialects:
create table count as
,count(*) as cnt
group by a ;
create table dups as
group by a
having count(*) > 1;
If there is no need to do in another SQL dialect then you can do this within SAS, but some of the functions will need to be converted.
If you need to do this in another SQL dialect, you will need to do some variation on AnotherDream's last suggestion. Two ways of doing this are
1. Join the original table with a sub-query of the original table with the group by in the sub-query
2. Create a temporary table from the group by and then join temporary table to original table.
Hope this helps.
07-18-2013 02:28 PM
If the database you are using supports the "Rank" OLAP functionality, this would allow you to control exactly how to order your desired results by an arbitrary number of columns. The OLAP functions use "Qualify" to include/reject rows.
An example from Teradata is below, and you will probably need to research the syntax from your database vendor.
vldtn_diff_eff_day_to_min DESC /*ASC*/
) = 1