BookmarkSubscribeRSS Feed
Steelers_In_DC
Barite | Level 11

I have code using

order by

if not first. and last. then output unique

else output duplicate

I would like to do the same thing using sql but don't know how.  I cannot use 'group by count > 1' because the dataset has a lot of variables and I only want to 'order by' and 'group by' two of the variables.

I'll attach the code if that helps, look at the bottom, that little 'group by' is what I'm trying, I know it won't work but that general idea is what I'd like.  I need the second column, if duplicated within the first column,  to spit out to another dataset, so I'll be forming two datasets instead of one.  One of them will be 'having count>1', the other will be 'having count=1'.

Hopefully that makes sense.

8 REPLIES 8
Anotherdream
Quartz | Level 8

Sorry I'm not following, but could you give an example with before and after data that you have and need?

It seems to me like you are saying you want all keys that have more than 1 record (by a set of unique keys) to be output into one dataset, and all keys that have only 1 record to be output into another dataset? Is this correct?

Aka if you have loan, date, balance, term, etc... and your original key is Loan + date, then you want all loans and dates that are dupliated (01, january) into one dataset with everything else into another?

Thanks for the clarification!

Reeza
Super User

The attached is the SQL that doesn't work? It would be easier if you provided the SAS that does work and a dataset that shows what you have versus want.

Steelers_In_DC
Barite | Level 11

the sql works, it's just that last line that I'm questioning, the group by

Steelers_In_DC
Barite | Level 11

Sorry, the sql works without that last line, with the last line it won't work because I only want to group by two variables.

Anotherdream
Quartz | Level 8

So why don't you create the file, and then group by the two variables to get your list, and then just re-join to your data as a subset if that's all you want... Or just do a group by 2 variables, you can group by as many varaibles as you like within sas, you don't have to group by everything in a select clause (strange because you do have too in sql if it's not in an aggregation clause).

I am still not understanding what you actually need. Again if you have some data, or the data step that works it would help a lot.

LarryWorley
Fluorite | Level 6

Based on your code I suspect you are trying to run this in a different SQL dialect than proc sql.  Your use of the decode and lpad functions suggests that to me

If you are running in a different SQL dialect, 'group by' aggregation works a little differently.  This has caused me confusion in the past.

Some other dialects of sql do not allow you to have any fields in the select clause except for the group by variables and aggregate functions.  SAS Proc SQL lets you do this.  So as  examples, the following queries will work in SAS SQL but not some other dialects:

create table count as

select a

         ,b

        ,count(*) as cnt

  from myTable

group by a ;

create table dups as

select *

  from myTable

group by a

having count(*) > 1;

If there is no need to do in another SQL dialect then you can do this within SAS, but some of the functions will need to be converted.

If you need to do this in another SQL dialect, you will need to do some variation on AnotherDream's last suggestion.  Two ways of doing this are

1.  Join the original table with a sub-query of the original table with the group by in the sub-query

2. Create a temporary table from the group by and then join temporary table to original table.

Hope this helps.

cal4gORl3ndU
Calcite | Level 5

If the database you are using supports the "Rank" OLAP functionality, this would allow you to control exactly how to order your desired results by an arbitrary number of columns.  The OLAP functions use "Qualify" to include/reject rows.

An example from Teradata is below, and you will probably need to research the syntax from your database vendor.

QUALIFY

     RANK() OVER

     (

     PARTITION BY

          task_nbr, acct_nbr

     ORDER BY

          vldtn_diff_eff_day_to_min DESC /*ASC*/

     ) = 1

Fugue
Quartz | Level 8

If I understand your question correctly, you want to use SQL to create two data sets at the same time. SQL (including SAS' PROC SQL) can only create one table at a time.

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 1336 views
  • 0 likes
  • 6 in conversation