DATA Step, Macro, Functions and more

How to keep variables that are not in the SELECT DISTINCT statement in SAS SQL?

Accepted Solution Solved
Reply
Frequent Contributor
Posts: 110
Accepted Solution

How to keep variables that are not in the SELECT DISTINCT statement in SAS SQL?

I know how to get around this issue by writing an additional code, but I would like to know whether I can select distinct observations based on the two variables "cusip" and "fyear" but still keep the variable "shrout" in the final dataset by the code below. The dataset work.outstanding_shares_1 contains shrout, cusip and fyear. 

 

 

 

proc sql;
	create table work.outstanding_shares_2 as
	select distinct cusip, fyear 
	from work.outstanding_shares_1;
quit;


Thanks for any suggestions. 

 


Accepted Solutions
Solution
‎06-01-2017 04:56 PM
Respected Advisor
Posts: 3,799

Re: How to keep variables that are not in the SELECT DISTINCT statement in SAS SQL?

[ Edited ]

Maybe this is what you want.

 

data work.outstanding_shares_1;
infile cards expandtabs truncover;
input cusip	fyear shrout date : yymmdd10.;
format date yymmdd10.;
cards;
20910	1992	9284	19920831
20910	1992	9362	19921030
20910	1988	9002	19880331
20910	1988	9037	19881031
20910	1988	9023	19880729
36020	2006	12342	20061130
36020	2003	12541	20030930
36020	2006	12336	20060630
36020	2003	13031	20030131
36020	2003	12813	20031031
;
run;
proc print;
   run;
proc summary nway;
   class cusip fyear;
   output out=shrout(drop=_:)
      idgroup(max(date) out(shrout)=);
   run;
proc print;
   run;

2017-06-01_15-49-22.png 

View solution in original post


All Replies
PROC Star
Posts: 7,487

Re: How to keep variables that are not in the SELECT DISTINCT statement in SAS SQL?

It would help if you can post example have and want datasets. On first glance it seems like you should be able to get what you want by simply including shrout in your select statement. If you are looking to have single records for each "cusip" and "fyear" combination, but with multiple values of "shrout", I'd still do the same, but then apply a proc transpose to create the wide records.

 

Art, CEO, AnalystFinder.com

 

Frequent Contributor
Posts: 110

Re: How to keep variables that are not in the SELECT DISTINCT statement in SAS SQL?

[ Edited ]

Hi @art297, thanks for your reply.

Here is a sample of the data:

 

data work.outstanding_shares_1;
infile cards expandtabs truncover;
input cusip	fyear shrout date : yymmdd10.;
format date yymmdd10.;
cards;
20910	1992	9284	19920831
20910	1992	9362	19921030
20910	1988	9002	19880331
20910	1988	9037	19881031
20910	1988	9023	19880729
36020	2006	12342	20061130
36020	2003	12541	20030930
36020	2006	12336	20060630
36020	2003	13031	20030131
36020	2003	12813	20031031
;
run;


I want to select the most recent (and distinct) shrout for a given cusip and fyear. So the output will look like:

cusipfyearshrout
2091019889037
2091019929362
36020200312813
36020200612342

 

 

 

Respected Advisor
Posts: 3,799

Re: How to keep variables that are not in the SELECT DISTINCT statement in SAS SQL?

I don't understand why you picked two obs for FY 2003 for cusip=36020
Frequent Contributor
Posts: 110

Re: How to keep variables that are not in the SELECT DISTINCT statement in SAS SQL?

Posted in reply to data_null__

Sorry that was a typo, @data_null__. I will update the output table now. Thanks for pointing that out. 

Solution
‎06-01-2017 04:56 PM
Respected Advisor
Posts: 3,799

Re: How to keep variables that are not in the SELECT DISTINCT statement in SAS SQL?

[ Edited ]

Maybe this is what you want.

 

data work.outstanding_shares_1;
infile cards expandtabs truncover;
input cusip	fyear shrout date : yymmdd10.;
format date yymmdd10.;
cards;
20910	1992	9284	19920831
20910	1992	9362	19921030
20910	1988	9002	19880331
20910	1988	9037	19881031
20910	1988	9023	19880729
36020	2006	12342	20061130
36020	2003	12541	20030930
36020	2006	12336	20060630
36020	2003	13031	20030131
36020	2003	12813	20031031
;
run;
proc print;
   run;
proc summary nway;
   class cusip fyear;
   output out=shrout(drop=_:)
      idgroup(max(date) out(shrout)=);
   run;
proc print;
   run;

2017-06-01_15-49-22.png 

Frequent Contributor
Posts: 110

Re: How to keep variables that are not in the SELECT DISTINCT statement in SAS SQL?

Posted in reply to data_null__

Exactly! That is what I was looking for. Thanks!

@data_null__ (or even @art297) do you know whether one can do the same by just using SQL? I was thinking of the having and group clause but got stuck. I wanted to try whether one can actually do this by just using SQL. 

Contributor
Posts: 27

Re: How to keep variables that are not in the SELECT DISTINCT statement in SAS SQL?

Alternative SQL Method:

proc sql;
create table want as
	select distinct cusip, fyear, shrout 
	from work.outstanding_shares_1 
	where date in (select max(date) from work.outstanding_shares_1 group by cusip,fyear) ;
quit;
Frequent Contributor
Posts: 110

Re: How to keep variables that are not in the SELECT DISTINCT statement in SAS SQL?

@PBsas I just run your code, but the output does not look correct. I am having still duplicates for a given cusip & fyear. 

Do you think one can do it with the "having" clause? 

Contributor
Posts: 27

Re: How to keep variables that are not in the SELECT DISTINCT statement in SAS SQL?

Your sample data produced right output using this code. Please provide sample data that produced duplicates.

Frequent Contributor
Posts: 110

Re: How to keep variables that are not in the SELECT DISTINCT statement in SAS SQL?

You are right when I use your code on the inputted data, I am getting the correct result. However, when I use it on the actual data, I am getting a result as follows:

 

Screen Shot 2017-06-01 at 5.47.17 PM.png

Here is the log:

 1          OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
 61         
 62         proc sql;
 63         create table want as
 64         select distinct cusip, fyear, shrout
 65         from work.outstanding_shares_1
 66         where date in (select max(date) from work.outstanding_shares_1 group
 66       !  by cusip,fyear) ;
 NOTE: Table WORK.WANT created, with 168203 rows and 3 columns.
 
 67         quit;
 NOTE: PROCEDURE SQL used (Total process time):
       real time           0.20 seconds
       cpu time            0.34 seconds
       
 
 68         
 69         OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
 82         



The only difference of the actual dataset and the sample I provided is the date format:

Screen Shot 2017-06-01 at 5.48.53 PM.png

Super User
Posts: 10,041

Re: How to keep variables that are not in the SELECT DISTINCT statement in SAS SQL?

data work.outstanding_shares_1;
infile cards expandtabs truncover;
input cusip	fyear shrout date : yymmdd10.;
format date yymmdd10.;
cards;
20910	1992	9284	19920831
20910	1992	9362	19921030
20910	1988	9002	19880331
20910	1988	9037	19881031
20910	1988	9023	19880729
36020	2006	12342	20061130
36020	2003	12541	20030930
36020	2006	12336	20060630
36020	2003	13031	20030131
36020	2003	12813	20031031
;
run;

proc sql;
select cusip,fyear,shrout
 from outstanding_shares_1
  group by cusip,fyear
   having date=max(date);
quit;
Frequent Contributor
Posts: 110

Re: How to keep variables that are not in the SELECT DISTINCT statement in SAS SQL?

Thanks, @Ksharp. Your help is much appreciated! 

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 12 replies
  • 408 views
  • 2 likes
  • 5 in conversation