BookmarkSubscribeRSS Feed
danielchoi626
Calcite | Level 5

I am currently in the process of converting a Python script to SAS / PROC SQL and whilst everything has been going smoothly I've been stumped by an odd case where calculating Z-Scores with the following scripts give me different answers.

Python:
HO_HOME_PROVINCE_MEAN = df_A10[["HOME_PROVINCE", "EUCLIDEAN_HOME_OFFICE_DIST_M"]].groupby("HOME_PROVINCE").mean().rename(columns = {"EUCLIDEAN_HOME_OFFICE_DIST_M" : "HO_HOME_PROVINCE_MEAN"}).reset_index()
HO_HOME_PROVINCE_STD = df_A10[["HOME_PROVINCE", "EUCLIDEAN_HOME_OFFICE_DIST_M"]].groupby("HOME_PROVINCE").std().rename(columns = {"EUCLIDEAN_HOME_OFFICE_DIST_M" : "HO_HOME_PROVINCE_STD"}).reset_index()

df_A11 = pd.merge(df_A10, HO_HOME_PROVINCE_MEAN, on = "HOME_PROVINCE", how = "left")
df_A11 = pd.merge(df_A10, HO_HOME_PROVINCE_STD, on = "HOME_PROVINCE", how = "left")

df11["HO_HOME_DISTRICT_ZSCORE"] = (df_A11["EUCLIDEAN_HOME_OFFICE_DIST_M"] - df_A11["HO_HOME_DISTRICT_MEAN"]) / df_A11["HO_HOME_DISTRICT_STD"]
SAS:

PROC SQL; CREATE TABLE DEVDAT11 AS
SELECT * FROM DEVDAT10 AS A
LEFT JOIN HOME_PROVINCE_MEAN AS B
ON A.HOME_PROVINCE = B.HOME_PROVINCE;

PROC SQL; CREATE TABLE DEVDAT12 AS
SELECT * FROM DEVDAT11 AS A
LEFT JOIN HOME_PROVINCE_STD AS B
ON A.HOME_PROVINCE = B.HOME_PROVINCE;

DATA WORK.DEVDAT13;
SET DEVDAT12;
HO_HOME_PROVINCE_ZSCORE = (EUCLIDEAN_HOME_OFFICE_DIST_M - HO_HOME_PROVINCE_MEAN) / HO_HOME_PROVINCE_STD;
RUN;

I've made sure to check if I've made any miscalculations using .sum() for Python and Proc Means Sum for SAS and so far it doesn't seem like I've made any mistakes on that front. I suspect that it might have something to do with the way Python handles null values or zeroes in rows as opposed to SAS as I keep getting this message from the log:

 

NOTE: Division by zero detected at line 2202 column 90.

The file that I've attached to this post contains a zero and a null value in column "HO_HOME_PROVINCE_STD". Any advice would be very much appreciated

6 REPLIES 6
PaigeMiller
Diamond | Level 26

Have you actually looked at your datasets with your own eyes to identify where the differences are, and what might be causing this? This is investigative work that you can (and should) do, and will most likely resolve the problem. If it doesn't, then show us the very specific observations in the data that are not giving the same answers, for SAS variables named EUCLIDEAN_HOME_OFFICE_DIST_M, HO_HOME_PROVINCE_MEAN and HO_HOME_PROVINCE_STD.

--
Paige Miller
andreas_lds
Jade | Level 19

I don't know how Python handles division by zero, but returning a missing value, as sas does, is correct.

Reeza
Super User
In neither of these tools should you be doing z scores manually. PROC STDIZE in SAS will ensure that your calculations are correct.
PaigeMiller
Diamond | Level 26

But the Z-score code shown in a DATA step ought to produce the exact same results as PROC STDIZE — assuming the mean and standard deviation were calculated correctly, which is something the original poster ought to have checked already.

--
Paige Miller
Reeza
Super User
Yes, but doing things manually and not using a proc leaves you more open to mistakes as shown exactly in this post by someone trying to do it manually in different applications.

And as OP has stated, they're not super familiar with SAS so knowing that PROC STDIZE can do it more efficiently is worth mentioning.
PaigeMiller
Diamond | Level 26

And I completely agree with you that people should use PROC STDIZE to calculate z-scores (and similar quantities).

 

I was trying to point out that the line of code which computes the z-scores in the original message could not be the cause of the problems that the original poster is experiencing. The error must be something else.

--
Paige Miller

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 1126 views
  • 7 likes
  • 4 in conversation