Hello,
I have one question one Proc SQL, could please anyone advise on it?
My code is very simple as below:
proc sql ;
create table Engagement_rates ( compress=yes reuse=yes) as
select distinct ym_id,
count(distinct case when selected = 'SELECTED' then USER_ID end) as selected,
count(distinct case when PRESENTED = 'PRESENTED' then USER_ID end) as PRESENTED,
count(distinct case when selected = 'SELECTED' then USER_ID end)/
count(distinct case when PRESENTED = 'PRESENTED' then USER_ID end)
as Engagement_RT
from PERSONETICS_RPT5 as a
group by ym_id
order by ym_id
;quit;
Here, the two green columns work well, but I always get 0 on the third column (Red Part).
So, could anyone please help me to check it?
Any advice will be appreciated!
Thanks,
Wayne
Is the source table an actual SAS dataset? Or are you using implicit or explicit passthrough to some remote database?
Such database have more variable types than SAS does including INTEGER data types. So it might be defaulting the variable to INTEGER and doing integer division between the two counts instead of floating point. Try converting the counts to floats. Or perhaps just include a floating point constant in the formula.
, 1.00 * count(distinct case when selected = 'SELECTED' then USER_ID end)
/ count(distinct case when PRESENTED = 'PRESENTED' then USER_ID end)
as Engagement_RT format=5.3
proc sql ; create table Engagement_rates ( compress=yes reuse=yes) as select distinct ym_id, count(distinct case when selected = 'SELECTED' then USER_ID end) as selected, count(distinct case when PRESENTED = 'PRESENTED' then USER_ID end) as PRESENTED, calculated selected / calculated presented as Engagement_RT format=8.4 from PERSONETICS_RPT5 as a group by ym_id order by ym_id ;quit;
Try the above instead. It may be a format issue, not showing the decimals.
You can also use the key word calculated to refer to a calculated value to simplify the code for legibility.
@qd96xuweifeng1 wrote:
Hello,
I have one question one Proc SQL, could please anyone advise on it?
My code is very simple as below:
proc sql ;
create table Engagement_rates ( compress=yes reuse=yes) as
select distinct ym_id,
count(distinct case when selected = 'SELECTED' then USER_ID end) as selected,
count(distinct case when PRESENTED = 'PRESENTED' then USER_ID end) as PRESENTED,
count(distinct case when selected = 'SELECTED' then USER_ID end)/
count(distinct case when PRESENTED = 'PRESENTED' then USER_ID end)
as Engagement_RT
from PERSONETICS_RPT5 as a
group by ym_id
order by ym_id
;quit;
Here, the two green columns work well, but I always get 0 on the third column (Red Part).
So, could anyone please help me to check it?
Any advice will be appreciated!
Thanks,
Wayne
Hi @Reeza ,
Thanks a lot for your quick response. After making your suggested change, the values are still 0 here. This is the first time I encounter this kind of situation, and the same logic works well before.
Thanks,
Wayne
How are you LOOKING at the dataset you created? What is it that you posted the photograph of? Some type of interactive browsing tool?
Try running PROC means on the variable to see the min, max, mean, etc.
The values should be less than 1 from the values you show in your photograph for the numerator and denominator. Perhaps you have formatted the number so it is displaying it as an integer? Try attaching a different format to the variable.
create table Engagement_rates as
select
ym_id
, count(distinct case when selected = 'SELECTED' then USER_ID end) as selected
, count(distinct case when PRESENTED = 'PRESENTED' then USER_ID end) as PRESENTED
, count(distinct case when selected = 'SELECTED' then USER_ID end)
/ count(distinct case when PRESENTED = 'PRESENTED' then USER_ID end)
as Engagement_RT format=5.3
from PERSONETICS_RPT5 as a
group by ym_id
order by ym_id
;
Hi @Tom ,
Thanks a lot for your quick response. I view the output dataset in SAS EG, and just get screenshot on it. Even the format is changed to 8.4, the value is still 0.
Thanks,
Wayne
Is the source table an actual SAS dataset? Or are you using implicit or explicit passthrough to some remote database?
Such database have more variable types than SAS does including INTEGER data types. So it might be defaulting the variable to INTEGER and doing integer division between the two counts instead of floating point. Try converting the counts to floats. Or perhaps just include a floating point constant in the formula.
, 1.00 * count(distinct case when selected = 'SELECTED' then USER_ID end)
/ count(distinct case when PRESENTED = 'PRESENTED' then USER_ID end)
as Engagement_RT format=5.3
Hi @Tom ,
Thanks a lot for your suggestion. I try to run your code on my source data which is located on ODBC database, the values are still 0. In order to make sure run the code on SAS dataset, I extract the dataset from database and generate a sas dataset in work lib. But when I run the code, I get error message below:
Thanks,
Weifeng
Looks like some of the character variables are defined with extreme lengths in your remote database. This is causing trouble for SAS when trying to process your query.
When you copy the data into SAS dataset make sure to use reasonable lengths for the character variables.
Note: Next time when posting SAS log do not post a photograph of the text, instead post the actual text. Just copy the text from the LOG and post it into the pop that appears when you click on the Insert Code icon.
Thanks @Tom ,
Please find the log error information below. One question here, since I have already extract a sub dataset which only include four columns. Meanwhile, all of these four columns are with limited length less than $30. Why the error is appeared here?
ERROR: Index TEMPINDX cannot be created on file WORK.'SASTMP-000000203'n because the length of the index value (20744 bytes) is too
large. The index page size (currently 32767) must be large enough to store three index values plus a small overhead
structure. Reduce the length of the index value or use the IBUFSIZE= option to increase the index page size (up to a maximum
of 32,767 bytes).ERROR: :Unable to create temporary index while processing summary functions.
NOTE: PROC SQL set option NOEXEC and will continue to check the syntax of statements.
Thanks a lot for your time!
Wayne
Run PROC CONTENTS on the dataset you used as input to that query.
What are the TYPE and LENGTH of the variables you used in your query that generated those error messages?
You can ignore any FORMAT that might be attached to the variable. How much space it might use depends on the storage LENGTH, not how you have chosen to DISPLAY the values.
Thanks a lot @Tom ,
You are right, the USER_ID is very long. After making change on length, the correct ratio can be obtained. But there are two steps to get the correct values. The first one is extract data from database, and the second one is run the query to get ratio.
In order to get ratio from database directly, I try your code again. The below one still gives me all 0 on ratio.
, 1.00 * count(distinct case when selected = 'SELECTED' then USER_ID end) / count(distinct case when PRESENTED = 'PRESENTED' then USER_ID end) as Engagement_RT format=5.3
After adding another "1.00 * " on denominator as below, I get extreme large values on the ratio.
1.00 * count(distinct case when selected = 'SELECTED' then USER_ID end)
/ 1.00 * count(distinct case when PRESENTED = 'PRESENTED' then USER_ID end)
as Engagement_RT format=5.3
Thanks,
Wayne
1) Try running the query totally in the remote database using explicit pass thru and see if it does the same thing. If you are running in passthru mode you can use things like CAST() or other features of that databases SQL implementation to change the variable types.
2) Try using the CALCULATED keyword in your SAS only SQL and see if that solves the issue. It might make SAS only get the counts from the remote and do the actually division in SAS code.
3) Try changing the SQL to use a nested query. Get the counts and then do the division in the outer query. (I suspect that might not matter).
4) Try pushing just the counting into the remote and do the division on the SAS side. That will also reduce the amount data that needs to be moved from the remote database into SAS.
proc sql;
connect using MYLIB;
create table want as
select *, (numerator/denominator) as ratio
from connection to MYLIB
(select groupvar,count(....) as numerator, count(....) as denominator
from myschema.mytable
group by groupvar
)
;
run;
Hi @Tom ,
After running the query totally in the remote database using explicit pass thru, I can get the correct ratio. I think my problem is solved. Here, I still have one minor problem as below:
If I just run the code below for ratio only, I can get correct result without any error.
proc sql noprint;
connect to sybaseiq(host='pmba-appdb.bmogc.net' server='PROD_MBA' database='prod_mba' &password.);
create table Engagement_rates ( compress=yes reuse=yes) as select * from connection to sybaseiq
(select ym_id,
1.00*count(distinct case when selected = 'SELECTED' then USER_ID end)
/count(distinct case when PRESENTED = 'PRESENTED' then USER_ID end)as Engagement_RT
from sa_dcgroup.PERSONETICS_RPT5 as a
group by ym_id
order by ym_id
);
quit;
But if I include more lines to get volume information, some error shows as below:
26 proc sql noprint;
27 connect to sybaseiq(host='pmba-appdb.bmogc.net' server='PROD_MBA' database='prod_mba' &password.);
28 create table Engagement_rates ( compress=yes reuse=yes) as select * from connection to sybaseiq
29 (select ym_id,
30 1.00*count(distinct case when selected = 'SELECTED' then USER_ID end)
31 /count(distinct case when PRESENTED = 'PRESENTED' then USER_ID end)as Engagement_RT,
32 count(distinct case when selected = 'SELECTED' then USER_ID end) as selected,
33 count(distinct case when PRESENTED = 'PRESENTED' then USER_ID end) as PRESENTED
34 from sa_dcgroup.PERSONETICS_RPT5 as a
35 group by ym_id
36 order by ym_id);
ERROR: CLI prepare error: [Sybase][ODBC Driver][SQL Anywhere]Invalid expression near 'count'
SQL statement: select ym_id, 1.00*count(distinct case when selected = 'SELECTED' then USER_ID end) /count(distinct case when
PRESENTED = 'PRESENTED' then USER_ID end)as Engagement_RT, count(distinct case when selected = 'SELECTED' then USER_ID end)
as selected, count(distinct case when PRESENTED = 'PRESENTED' then USER_ID end) as PRESENTED from
sa_dcgroup.PERSONETICS_RPT5 as a group by ym_id order by ym_id.
May I know if there is any syntax error in my code?
Thanks a lot!
Wayne
Not sure. The only thing I notice is there is no spaces between closing right parenthesis and the AS keyword at one point, but that shouldn't make a difference. You might want to ask about it on a forum for that database. Unfortunately each SQL implementation has its own set of enhancements and quirks.
Just do the division on the SAS side.
Thanks a lot @Tom , look like I cannot apply "count" on single field for more than one times. I will ask this in database forum.
My problem is resolved. Thanks a lot and have a great weekend!
Wayne
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.