BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
braam
Quartz | Level 8

I just realized that proc expand works in a different way than I expected it would. When I take a lag of the variable, it is actually not the same as the "lag" of the variable. A snippet of my code is as below.

 


proc expand data= temp
	out=temp2
/*	method= none*/
	;
	convert Var= LagVar / transformout= (lag 1);
	run;

data temp3;
	set temp2;
	if Var= LagVar then Same= 1; else Same= 0;
	run;

proc print data= temp3;
	format lagVar best20.10;
	var year lagVar Var Same; run;

The print statement gives me:

Obs year LagVar var Same
1 2000 . 1 0
2 2001 1 1 1
3 2002 1 1 1
4 2003 1 1 1
5 2004 1 1 1
6 2005 1 0 0
7 2006 -2.2204460492503E-16 1 0
8 2007 1 1 0
9 2008 1 1 1
10 2009 1 1 1

LIke shown in the table, one value looks weird though it is very close to 0 (i.e., the lag value). 

Coincidently, I found that adding method=none option seems to solve this, but I'm not entirely sure. Is it a correct solution to this issue? Where does the issue come from? I feel like proc expand is an excellent thing to use, but I am not so confident with its functionality. Many thanks in advance.

1 ACCEPTED SOLUTION

Accepted Solutions
dw_sas
SAS Employee

Hi @braam , 

 

By default, PROC EXPAND fits a cubic spline function through the nonmissing values of your analysis variable prior to performing any transformations specified in the TRANSFORM= option.  This can result in slight precision differences from your original data for some observations, as @Reeza noted.  If you do not need to do any missing value interpolation or frequency conversion in PROC EXPAND prior to computing your transformations, then you are correct to specify the METHOD=NONE option.  For more details on the METHOD=NONE option, please see the following documentation link:

 

https://go.documentation.sas.com/?docsetId=etsug&docsetTarget=etsug_expand_details13.htm&docsetVersi...   

 

Another alternative is to use the ROUND function to round the results from PROC EXPAND prior to making your comparisons.  For example, you could add the statement:   lagvar=round(lagvar,1e-8);  in your TEMP3 data step prior to your IF statement.

 

I hope this helps!

DW

View solution in original post

8 REPLIES 8
Reeza
Super User
Well yes, numerical precision is always an issue with numbers but I don't see any particular issue with your code/step.
The comparisons happened correctly and if you use a reasonable format it works fine. Try 20.2 and see what happens.

Depending on what you're doing you can use LAG() within the data step which saves you a step.

But numerical precision is an issue in any computer application including Excel, R and Python.
braam
Quartz | Level 8
Thanks for your reply. I see that the issue happens in year 2007. I expected Same to be 1 for year 2007, but it's 0. Surprisingly, I noticed this (seemingly) randomly in my dataset. Then I feel like method= none is a solution, but not sure.
braam
Quartz | Level 8
Yes, I know I can use data step with lag(). But I found proc expand better because I can use "lead" easily.
Reeza
Super User
General rule is if you need numeric precision to that degree in your data you would need to use ROUND() in the comparison.

https://documentation.sas.com/?docsetId=lrcon&docsetTarget=p0ji1unv6thm0dn1gp4t01a1u0g6.htm&docsetVe...
This page goes in depth into the issues and resolutions.
dw_sas
SAS Employee

Hi @braam , 

 

By default, PROC EXPAND fits a cubic spline function through the nonmissing values of your analysis variable prior to performing any transformations specified in the TRANSFORM= option.  This can result in slight precision differences from your original data for some observations, as @Reeza noted.  If you do not need to do any missing value interpolation or frequency conversion in PROC EXPAND prior to computing your transformations, then you are correct to specify the METHOD=NONE option.  For more details on the METHOD=NONE option, please see the following documentation link:

 

https://go.documentation.sas.com/?docsetId=etsug&docsetTarget=etsug_expand_details13.htm&docsetVersi...   

 

Another alternative is to use the ROUND function to round the results from PROC EXPAND prior to making your comparisons.  For example, you could add the statement:   lagvar=round(lagvar,1e-8);  in your TEMP3 data step prior to your IF statement.

 

I hope this helps!

DW

braam
Quartz | Level 8
Thanks for your answer @dw_sas. I'm happy that I could just use METHOD=NONE to solve this issue quickly.
I would use ROUND if there was no solution for this. But obviously, I need to add a couple of extra lines of coding just for this, which I don't like! Many thanks again.
mkeintz
PROC Star

@dw_sas 

 

Thank you for clarifying one of the METHOD=NONE consequences, which I had realize after looking at the documentation when I experienced proc expand results similar to @braam.

 

The linked documentation says

When METHOD=NONE is specified, there is no difference between the TRANSFORMIN= and TRANSFORMOUT= options;

To me this language implies, when method is NOT equal to none, there is (could be) a difference between transformin and transformout, even when the same transformation is specified.

 

Should I interpret this to mean that specifying a transformation in the transformin parameter would circumvent (or precede) the cubic spline interpolation, but moving the same specification to transformout would not circumvent it (thereby creating the possibility of small differences in output value)?

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
dw_sas
SAS Employee

Hi @mkeintz ,

 

I can see how the sentence you noted in the documentation might be confusing.  Let me see if I can clarify it...

 

For this particular case, where there was:

  • no frequency conversion,
  • no missing value interpolation,
  • the transformation was a simple lag (or lead) transformation, and
  • the METHOD=NONE option was not specified,

the final results should be the same regardless of whether you use the TRANSFORMOUT= option or the TRANSFORMIN= option.   This is because the cubic spline function is fit through the exact same data values in both cases.

 

For other cases, such as when frequency conversion is being done or when missing values are present in the data, you would expect to see differences if you use the TRANSFORMIN= option rather than the TRANSFORMOUT= option.

 

If no frequency conversion or missing value interpolation is required, and all you want to do is perform the requested transformation on the values in the DATA= data set, then it is best to use the METHOD=NONE option in this case.

 

I hope this helps!

DW

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

Multiple Linear Regression in SAS

Learn how to run multiple linear regression models with and without interactions, presented by SAS user Alex Chaplin.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 8 replies
  • 1214 views
  • 3 likes
  • 4 in conversation