BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
qd96xuweifeng1
Calcite | Level 5

Hello,

I have one question one Proc SQL, could please anyone advise on it?

My code is very simple as below:

 

proc sql ;
create table Engagement_rates ( compress=yes reuse=yes) as
select distinct ym_id,
count(distinct case when selected = 'SELECTED' then USER_ID end) as selected,
count(distinct case when PRESENTED = 'PRESENTED' then USER_ID end) as PRESENTED,
count(distinct case when selected = 'SELECTED' then USER_ID end)/
count(distinct case when PRESENTED = 'PRESENTED' then USER_ID end)
as Engagement_RT 
from PERSONETICS_RPT5 as a
group by ym_id
order by ym_id
;quit;

 

Here, the two green columns work well, but I always get 0 on the third column (Red Part).

qd96xuweifeng1_0-1652371427254.png

So, could anyone please help me to check it?

Any advice will be appreciated!

Thanks,

Wayne

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

Is the source table an actual SAS dataset? Or are you using implicit or explicit passthrough to some remote database?

Such database have more variable types than SAS does including INTEGER data types.  So it might be defaulting the variable to INTEGER and doing integer division between the two counts instead of floating point.  Try converting the counts to floats. Or perhaps just include a floating point constant in the formula.

  , 1.00 * count(distinct case when selected = 'SELECTED' then USER_ID end)
  / count(distinct case when PRESENTED = 'PRESENTED' then USER_ID end)
    as Engagement_RT format=5.3 

View solution in original post

20 REPLIES 20
Reeza
Super User

 

 

proc sql ;
create table Engagement_rates ( compress=yes reuse=yes) as
select distinct ym_id,
count(distinct case when selected = 'SELECTED' then USER_ID end) as selected,
count(distinct case when PRESENTED = 'PRESENTED' then USER_ID end) as PRESENTED,
calculated selected / calculated presented as Engagement_RT  format=8.4
from PERSONETICS_RPT5 as a
group by ym_id
order by ym_id
;quit;

Try the above instead. It may be a format issue, not showing the decimals. 

You can also use the key word calculated to refer to a calculated value to simplify the code for legibility. 

 


@qd96xuweifeng1 wrote:

Hello,

I have one question one Proc SQL, could please anyone advise on it?

My code is very simple as below:

 

proc sql ;
create table Engagement_rates ( compress=yes reuse=yes) as
select distinct ym_id,
count(distinct case when selected = 'SELECTED' then USER_ID end) as selected,
count(distinct case when PRESENTED = 'PRESENTED' then USER_ID end) as PRESENTED,
count(distinct case when selected = 'SELECTED' then USER_ID end)/
count(distinct case when PRESENTED = 'PRESENTED' then USER_ID end)
as Engagement_RT 
from PERSONETICS_RPT5 as a
group by ym_id
order by ym_id
;quit;

 

Here, the two green columns work well, but I always get 0 on the third column (Red Part).

qd96xuweifeng1_0-1652371427254.png

So, could anyone please help me to check it?

Any advice will be appreciated!

Thanks,

Wayne


 

qd96xuweifeng1
Calcite | Level 5

Hi @Reeza ,

 

Thanks a lot for your quick response. After making your suggested change, the values are still 0 here. This is the first time I encounter this kind of situation, and the same logic works well before.

 

Thanks,

Wayne

qd96xuweifeng1_0-1652375706058.png

 

Tom
Super User Tom
Super User

How are you LOOKING at the dataset you created?  What is it that you posted the photograph of? Some type of interactive browsing tool?

 

Try running PROC means on the variable to see the min, max, mean, etc.

The values should be less than 1 from the values you show in your photograph for the numerator and denominator.  Perhaps you  have formatted the number so it is displaying it as an integer?  Try attaching a different format to the variable.

 

create table Engagement_rates as
select 
    ym_id
  , count(distinct case when selected = 'SELECTED' then USER_ID end) as selected
  , count(distinct case when PRESENTED = 'PRESENTED' then USER_ID end) as PRESENTED
  , count(distinct case when selected = 'SELECTED' then USER_ID end)
  / count(distinct case when PRESENTED = 'PRESENTED' then USER_ID end)
    as Engagement_RT format=5.3 
from PERSONETICS_RPT5 as a
group by ym_id
order by ym_id
;
qd96xuweifeng1
Calcite | Level 5

Hi @Tom ,

Thanks a lot for your quick response. I view the output dataset in SAS EG, and just get screenshot on it. Even the format is changed to 8.4, the value is still 0.

 

Thanks,

Wayne

qd96xuweifeng1_0-1652375947537.png

 

Tom
Super User Tom
Super User

Is the source table an actual SAS dataset? Or are you using implicit or explicit passthrough to some remote database?

Such database have more variable types than SAS does including INTEGER data types.  So it might be defaulting the variable to INTEGER and doing integer division between the two counts instead of floating point.  Try converting the counts to floats. Or perhaps just include a floating point constant in the formula.

  , 1.00 * count(distinct case when selected = 'SELECTED' then USER_ID end)
  / count(distinct case when PRESENTED = 'PRESENTED' then USER_ID end)
    as Engagement_RT format=5.3 
qd96xuweifeng1
Calcite | Level 5

Hi @Tom ,

Thanks a lot for your suggestion. I try to run your code on my source data which is located on ODBC database, the values are still 0. In order to make sure run the code on SAS dataset, I extract the dataset from database and generate a sas dataset in work lib. But when I run the code, I get error message below:

qd96xuweifeng1_0-1652379278623.png

Thanks,

Weifeng

 

Tom
Super User Tom
Super User

Looks like some of the character variables are defined with extreme lengths in your remote database.  This is causing trouble for SAS when trying to process your query. 

When you copy the data into SAS dataset make sure to use reasonable lengths for the character variables.

 

Note: Next time when posting SAS log do not post a photograph of the text, instead post the actual text.  Just copy the text from the LOG and post it into the pop that appears when you click on the Insert Code icon.

Tom_0-1652380385172.png

 

qd96xuweifeng1
Calcite | Level 5

Thanks @Tom ,

Please find the log error information below. One question here, since I have already extract a sub dataset which only include four columns. Meanwhile, all of these four columns are with limited length less than $30. Why the error is appeared here?


ERROR: Index TEMPINDX cannot be created on file WORK.'SASTMP-000000203'n because the length of the index value (20744 bytes) is too
large. The index page size (currently 32767) must be large enough to store three index values plus a small overhead
structure. Reduce the length of the index value or use the IBUFSIZE= option to increase the index page size (up to a maximum
of 32,767 bytes).ERROR: :Unable to create temporary index while processing summary functions.

NOTE: PROC SQL set option NOEXEC and will continue to check the syntax of statements.

 

Thanks a lot for your time!

Wayne

Tom
Super User Tom
Super User

Run PROC CONTENTS on the dataset you used as input to that query.

What are the TYPE and LENGTH of the variables you used in your query that generated those error messages?

 

You can ignore any FORMAT that might be attached to the variable.  How much space it might use depends on the storage LENGTH, not how you have chosen to DISPLAY the values.

qd96xuweifeng1
Calcite | Level 5

Thanks a lot @Tom  ,

You are right, the USER_ID is very long. After making change on length, the correct ratio can be obtained. But there are two steps to get the correct values. The first one is extract data from database, and the second one is run the query to get ratio. 

qd96xuweifeng1_0-1652386635269.png

In order to get ratio from database directly, I try your code again. The below one still gives me all 0 on ratio.

, 1.00 * count(distinct case when selected = 'SELECTED' then USER_ID end) / count(distinct case when PRESENTED = 'PRESENTED' then USER_ID end) as Engagement_RT format=5.3

After adding another "1.00 * " on denominator as below, I get extreme large values on the ratio.

1.00 * count(distinct case when selected = 'SELECTED' then USER_ID end)
/ 1.00 * count(distinct case when PRESENTED = 'PRESENTED' then USER_ID end)
as Engagement_RT format=5.3

qd96xuweifeng1_1-1652387579200.png

Thanks,

Wayne

 

 

Tom
Super User Tom
Super User

1) Try running the query totally in the remote database using explicit pass thru and see if it does the same thing.  If you are running in passthru mode you can use things like CAST() or other features of that databases SQL implementation to change the variable types.

2) Try using the CALCULATED keyword in your SAS only SQL and see if that solves the issue.  It might make SAS only get the counts from the remote and do the actually division in SAS code.

3) Try changing the SQL to use a nested query.  Get the counts and then do the division in the outer query.  (I suspect that might not matter).

4) Try pushing just the counting into the remote and do the division on the SAS side.  That will also reduce the amount data that needs to be moved from the remote database into SAS.

proc sql;
connect using MYLIB;
create table want as 
select *, (numerator/denominator) as ratio
from connection to MYLIB
  (select groupvar,count(....) as numerator, count(....) as denominator
   from myschema.mytable
   group by groupvar
  )
;
run;
qd96xuweifeng1
Calcite | Level 5

Hi @Tom ,

After running the query totally in the remote database using explicit pass thru, I can get the correct ratio. I think my problem is solved. Here, I still have one minor problem as below:

 

If I just run the code below for ratio only, I can get correct result without any error.

proc sql noprint;
connect to sybaseiq(host='pmba-appdb.bmogc.net' server='PROD_MBA' database='prod_mba' &password.);
create table Engagement_rates ( compress=yes reuse=yes) as select * from connection to sybaseiq
(select	ym_id,
		1.00*count(distinct case when selected = 'SELECTED' then USER_ID end)
		/count(distinct case when PRESENTED = 'PRESENTED' then USER_ID end)as Engagement_RT	
     from sa_dcgroup.PERSONETICS_RPT5 as a 
	 group by ym_id
	 order by ym_id
);
quit;

But if I include more lines to get volume information, some error shows as below:

26         proc sql noprint;
27         connect to sybaseiq(host='pmba-appdb.bmogc.net' server='PROD_MBA' database='prod_mba' &password.);
28         create table Engagement_rates ( compress=yes reuse=yes) as select * from connection to sybaseiq
29         (select	ym_id,
30         		1.00*count(distinct case when selected = 'SELECTED' then USER_ID end)
31         		/count(distinct case when PRESENTED = 'PRESENTED' then USER_ID end)as Engagement_RT,
32         		count(distinct case when selected = 'SELECTED' then USER_ID end) as selected,
33         		count(distinct case when PRESENTED = 'PRESENTED' then USER_ID end) as PRESENTED
34              from sa_dcgroup.PERSONETICS_RPT5 as a
35         	 group by ym_id
36         	 order by ym_id);
ERROR: CLI prepare error: [Sybase][ODBC Driver][SQL Anywhere]Invalid expression near 'count'
SQL statement: select ym_id, 1.00*count(distinct case when selected = 'SELECTED' then USER_ID end) /count(distinct case when 
       PRESENTED = 'PRESENTED' then USER_ID end)as Engagement_RT, count(distinct case when selected = 'SELECTED' then USER_ID end) 
       as selected, count(distinct case when PRESENTED = 'PRESENTED' then USER_ID end) as PRESENTED from 
       sa_dcgroup.PERSONETICS_RPT5 as a group by ym_id order by ym_id.

May I know if there is any syntax error in my code? 

 

Thanks a lot!

Wayne

Tom
Super User Tom
Super User

Not sure.  The only thing I notice is there is no spaces between closing right parenthesis and the AS keyword at one point, but that shouldn't make a difference.  You might want to ask about it on a forum for that database.  Unfortunately each SQL implementation has its own set of enhancements and quirks.

 

Just do the division on the SAS side.

 

qd96xuweifeng1
Calcite | Level 5

Thanks a lot @Tom , look like I cannot apply "count" on single field for more than one times. I will ask this in database forum. 

My problem is resolved. Thanks a lot and have a great weekend!

Wayne

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 20 replies
  • 1946 views
  • 4 likes
  • 3 in conversation