BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
jozuleta
Obsidian | Level 7

Hi experts,

my data sample (shortcut) looks like this:

 StockCountryAnalystBrokerDateTP2P

1

BCECAMarcRBC06/30/20150.24

2

BCECAMarcRBC01/17/20170.32

3

ATCOCAMarcRBC05/23/2017-0.15

4

InditexESJorgeCAJA01/16/20180.03

5

IBMUSJonBNY05/30/20180.09

6

IBMUSJonBNY10/25/20160.01

7

APPLEUSJuliaBNY07/21/20170.09

8

APPLEUSSarahUBS07/30/20180.10

 

I am trying to build a table that provides the number of STOCKS, ANALYSTS and BROKER,
and the means of key variables by country.
The resulting table should be like this:

Panel A:     
Country# Stocks# Analysts# Broker# Obs.Mean (TP2P)
CA21130.14
ES11110.03
US23240.07
TOTAL5548 
MEAN1.661.661.332.660.08

 

and another table (Panel B) with the following result:

Panel B:  
Year# Obs.#Analysts
201511
201611
201732
201833
MEAN21.75

 

Can anyone help?

 

1 ACCEPTED SOLUTION

Accepted Solutions
SuryaKiran
Meteorite | Level 14

Check this:

data have;
infile datalines dlm=',';
input Stock :$8. Country :$8. Analyst :$8. Broker :$8. Date :$8.TP2P;
datalines;
BCE,CA,Marc,RBC,06/30/2015,0.24
BCE,CA,Marc,RBC,01/17/2017,0.32
ATCO,CA,Marc,RBC,05/23/2017,-0.15
Inditex,ES,Jorge,CAJA,01/16/2018,0.03
IBM,US,Jon,BNY,05/30/2018,0.09
IBM,US,Jon,BNY,10/25/2016,0.01
APPLE,US,Julia,BNY,07/21/2017,0.09
APPLE,US,Sarah,UBS,07/30/2018,0.10
;
run;


proc sql;
create table pre_want as
select 	monotonic() as row_no,
		Country, 
		count(distinct stock) format=4.2 as Stock, 
		count(distinct Analyst) format=4.2 as Analyst,
		count(distinct Broker) format=4.2 as Broker,
		count(*) format=4.2 as Obs,
		mean(TP2P) format=4.2 as mean_TP2P
	from have
group by Country
;
quit;

proc sql;
create table want(drop=row_no) as 
select * from pre_want
union
select monotonic()+10 as row_number,
		'TOTAL' as Country,
		sum(Stock) format=4.2 as Stock,
		sum(Analyst) format=4.2 as Analyst,
		sum(Broker) format=4.2 as Broker,
		sum(Obs) format=4.2as Obs,
		. format=4.2 as mean_TP2P
	from pre_want
union
select monotonic()+100 as row_number,
		'Mean' as Country,
		mean(Stock) format=4.2 as Stock,
		mean(Analyst) format=4.2 as Analyst,
		mean(Broker) format=4.2 as Broker,
		mean(Obs) format=4.2 as Obs,
		mean(mean_TP2P) format=4.2 as mean_TP2P
	from pre_want
order by 1
;
quit;
Thanks,
Suryakiran

View solution in original post

20 REPLIES 20
Reeza
Super User
Are you familiar with SQL? That’s the fastest option in this case, due to the multiple distinct counts you’d like.
jozuleta
Obsidian | Level 7

@Reezaa little bit. If you have a solution using SQL I would appreciate!

 

SuryaKiran
Meteorite | Level 14

Check this:

data have;
infile datalines dlm=',';
input Stock :$8. Country :$8. Analyst :$8. Broker :$8. Date :$8.TP2P;
datalines;
BCE,CA,Marc,RBC,06/30/2015,0.24
BCE,CA,Marc,RBC,01/17/2017,0.32
ATCO,CA,Marc,RBC,05/23/2017,-0.15
Inditex,ES,Jorge,CAJA,01/16/2018,0.03
IBM,US,Jon,BNY,05/30/2018,0.09
IBM,US,Jon,BNY,10/25/2016,0.01
APPLE,US,Julia,BNY,07/21/2017,0.09
APPLE,US,Sarah,UBS,07/30/2018,0.10
;
run;


proc sql;
create table pre_want as
select 	monotonic() as row_no,
		Country, 
		count(distinct stock) format=4.2 as Stock, 
		count(distinct Analyst) format=4.2 as Analyst,
		count(distinct Broker) format=4.2 as Broker,
		count(*) format=4.2 as Obs,
		mean(TP2P) format=4.2 as mean_TP2P
	from have
group by Country
;
quit;

proc sql;
create table want(drop=row_no) as 
select * from pre_want
union
select monotonic()+10 as row_number,
		'TOTAL' as Country,
		sum(Stock) format=4.2 as Stock,
		sum(Analyst) format=4.2 as Analyst,
		sum(Broker) format=4.2 as Broker,
		sum(Obs) format=4.2as Obs,
		. format=4.2 as mean_TP2P
	from pre_want
union
select monotonic()+100 as row_number,
		'Mean' as Country,
		mean(Stock) format=4.2 as Stock,
		mean(Analyst) format=4.2 as Analyst,
		mean(Broker) format=4.2 as Broker,
		mean(Obs) format=4.2 as Obs,
		mean(mean_TP2P) format=4.2 as mean_TP2P
	from pre_want
order by 1
;
quit;
Thanks,
Suryakiran
Reeza
Super User
Are the data for each state mutually exclusive? Can you assume that you can add up the # of analysts or would that need to be separate count distinct()?

jozuleta
Obsidian | Level 7
count distinct. The code from SuryaKiran works good. Thanks to you, too Reeza!
Reeza
Super User
The solution does not do a count distinct for totals. So if you have overlaps, this will not be correct.
jozuleta
Obsidian | Level 7
Do you mind to reply your solution to see better the differences in the codes / data ? And, What about Panel B (by years). Any help you could provide? Thanks
Reeza
Super User
You should be able to adapt the code already given. For years, change the GROUP to include YEAR not STATE. For totals hit the main table again but consider what would go into the GROUP statement. You already have all the pieces, just need to reorder them a bit.
Also, see what happens if you have more than 10 states or years.

If you're having issues post the code you've modified and we can let you know what you need to do.
jozuleta
Obsidian | Level 7

Hi Reeza, 

I thought it would be that easy as you mentioned but unfortunately it doesn't work 😞
Here the code:

proc sql;
create table MyData.PRE_Table1 as
select 	monotonic() as row_no,
		year format=BESTw. length=8, 
		count(distinct RIC) format=8.0 as RIC, 
		count(distinct ANCODE) format=8.0 as ANCODE,
		count(distinct BROKERNAM) format=8.0 as BROKERNAM,
		count(*) format=8.0 as Obs,
		mean(TP2P) format=8.2 as mean_TP2P
	from MyData.SAMP_ESTIMATOR
group by year 
;
quit;
proc sql;
create table MyData.Tabelle_1(drop=row_no) as 
select * from MyData.PRE_Table1
union
select monotonic()+10 as row_number,
		'MEAN' as year,
		sum(RIC) format=8.0 as RIC,
		sum(ANCODE) format=8.0 as ANCODE,
		sum(BROKERNAM) format=8.0 as BROKERNAM,
		sum(Obs) format=8.0 as Obs,
		. format=8.2 as mean_TP2P
	from MyData.PRE_Table1
union
select monotonic()+100 as row_number,
		'MEAN' as year,
		mean(RIC) format=8.0 as RIC,
		mean(ANCODE) format=8.0 as ANCODE,
		mean(BROKERNAM) format=8.0 as BROKERNAM,
		mean(Obs) format=8.0 as Obs,
		mean(mean_TP2P) format=8.2 as mean_TP2P
	from MyData.PRE_Table1
order by 1
;
quit;

The first table is created without any problems. But then in the second proc I get this error:

ERROR: Column 2 from the first contributor of UNION is not the same type as its counterpart from the second.
ERROR: Column 2 from the first contributor of UNION is not the same type as its counterpart from the second.
 
I do not see exactly why it should not be the same type ?! 

Could you provide some help again? 🙂
 
Here is a screenshot of the Table (PRE_Table1) having also the information about the variable "year"
Unbenannt.PNG
Reeza
Super User
Year is numeric in your first data set and Total is clearly not numeric, it's a character variable.

You can change Year to be a character value (Put(year, 4.)) or you can Mean/Total to be a number and use a format to display it as mean/total. Changing Year to be a character is easier. But...if you ever need to sort this for some reason, then the number method is easier though you do need to have a format. I usually just set the number to something arbitrarily large (100000 = Mean, 100001 = Total).
jozuleta
Obsidian | Level 7

Sorry, I guess I do not know exactly how to change it in my code.

I changed it like this:

 

proc sql;
create table JORGE.PRE_Table1 as
select 	monotonic() as row_no,
		year (Put(year, 4.)), 
		count(distinct RIC) format=8.0 as RIC, 
		count(distinct ANCODE) format=8.0 as ANCODE,
		count(distinct BROKERNAM) format=8.0 as BROKERNAM,
		count(*) format=8.0 as Obs,
		mean(TP2P) format=8.2 as mean_TP2P
	from JORGE.SAMP_ESTIMATOR
group by year
;
quit;
proc sql;
create table JORGE.Tabelle_1(drop=row_no) as 
select * from JORGE.PRE_Table1
union
select monotonic()+10 as row_number,
		'TOTAL' as year,
		sum(RIC) format=8.0 as RIC,
		sum(ANCODE) format=8.0 as ANCODE,
		sum(BROKERNAM) format=8.0 as BROKERNAM,
		sum(Obs) format=8.0 as Obs,
		. format=8.2 as mean_TP2P
	from JORGE.PRE_Table1
union
select monotonic()+100 as row_number,
		'MEAN' as year,
		mean(RIC) format=8.0 as RIC,
		mean(ANCODE) format=8.0 as ANCODE,
		mean(BROKERNAM) format=8.0 as BROKERNAM,
		mean(Obs) format=8.0 as Obs,
		mean(mean_TP2P) format=8.2 as mean_TP2P
	from JORGE.PRE_Table1
order by 1
;
quit;

 ...and I get now the following error:

 ERROR: Function YEAR requires a numeric expression as argument 1.
 
Furthermore, I checked the means between the different tables (Reference table vs table ordered by country vs table ordered by year) for the TP2P variable.They are all different...which means that there is something wrong with the first code.
- Reference table (SAMP_ESTIMATOR): mean of 0.2191538 (using proc means)
- Table ordered by country (Pre_Table1): 0.1918582 (using proc means)
- Table ordered by year (Pre_Table2): 0.2271409 (using proc means)
 
The means should be all the same as the mean of the Reference table: 0.2191538
Conclusion: There is something wrong with the whole code 😞

 
I will provide you with the two tables, maybe it helps to solve the problem easier (the reference table is too large to upload).
- Pre_Table1 is ordered by country
- Pre_Table2 is ordered by year
 
Thanks!
 
 
 
SuryaKiran
Meteorite | Level 14

YEAR() function argument to be numeric.  Remove put() function inside year()

year (Put(year, 4.)) ---> year(year)

 

Perform the calculations on the right grouped columns 

Thanks,
Suryakiran
jozuleta
Obsidian | Level 7

Hi SuryaKiran,

 

Unfortunately it doesn't work. Now I have a new column called "_TEMA007" instead of "year". Furthermore, the years are all set now to "1995".

my code:

 

proc sql;
create table JORGE.PRE_Table2 as
select 	monotonic() as row_no,
		year(year), 
		count(distinct RIC) format=8.0 as RIC, 
		count(distinct ANCODE) format=8.0 as ANCODE,
		count(distinct BROKERNAM) format=8.0 as BROKERNAM,
		count(distinct COMPLOC) fortmat=8.0 as COMPLOC,
		count(*) format=8.0 as Obs,
		mean(TP2P) format=8.2 as mean_TP2P
	from JORGE.SAMP_ESTIMATOR
group by year
;
quit;

Even If I a run the whole code  the problems is still there:

 

ERROR: Column 2 from the first contributor of UNION is not the same type as its counterpart from the second.
ERROR: Column 2 from the first contributor of UNION is not the same type as its counterpart from the second.
 
Do you have an idea why the means are different from my reference table ? They shouldn't.
The Total of Observations are the same but apparently the means are not?! (even when changing the format of the numers to eg. 8.8)
SuryaKiran
Meteorite | Level 14

Do you have year column in JORGE.SAMP_ESTIMATOR dataset as numeric? then use put(year,4.) as year

 

When your appending two tables using UNION then you need to have same data types for same columns. In your case my guess is your first query has year numeric and the next queries are character. 

Thanks,
Suryakiran

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

SAS Enterprise Guide vs. SAS Studio

What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 20 replies
  • 2360 views
  • 3 likes
  • 3 in conversation