- Format issue, SAS giving difference of 2 identical...

Topic Options

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

06-27-2012 12:27 PM

Hi All,

I am trying to split a set of observations depending on whether they are LE the median value or GT the median value.

I have rounded my incoming numbers to 2 decimal places using round(var,0.01), and the median is calculated from this dataset. But when I take this median from proc univariate, pass it to the dataset as a macro var, and try to flag the records accordingly, the record with exactly the same value as the median goes into the GTMedian bucket.

As an alternative, I tried rounding the median to 1 decimal place before feeding it to the macro var; that did not work either.

So I took the difference between each observation and the median, and the diff for the record in question was -2.22045E-16.

I tried many methods but the issue still exists.

Is this some format/informat issue?

Thank you,

raisins25

Accepted Solutions

Solution


Posted in reply to raisins25

06-27-2012 08:48 PM

Still feels like a numeric precision problem to me. While numeric vars with length <8 can make the problem more likely, it can still happen when length=8. Below is an example showing that the mean of (.1,.1,.1) is slightly greater than .1. This feels like the same problem you are running into. The link to Rick Wicklin's blog has some suggestions for introducing a fuzz factor if you need to do these sorts of comparisons.

data a;
  x=.1;
  output;
  output;
  output;
run;

NOTE: The data set WORK.A has 3 observations and 1 variables.

proc means data=a mean noprint;
  var x;
  output out=b mean=mean;
run;

NOTE: There were 3 observations read from the data set WORK.A.
NOTE: The data set WORK.B has 1 observations and 3 variables.

data c;
  if _n_=1 then set b (keep=mean);
  set a (keep=x);
  if x<mean then type='low ';
  else if x>mean then type='high';
  else if x=mean then type='mean';
  dif=x-mean;
  put (x mean type dif) (=);
run;

x=0.1 mean=0.1 type=low dif=-1.38778E-17
x=0.1 mean=0.1 type=low dif=-1.38778E-17
x=0.1 mean=0.1 type=low dif=-1.38778E-17
NOTE: There were 1 observations read from the data set WORK.B.
NOTE: There were 3 observations read from the data set WORK.A.
NOTE: The data set WORK.C has 3 observations and 4 variables.
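As a rough illustration of the fuzz-factor idea (a sketch, not necessarily the approach from the blog), the same three-observation example can be classified with an explicit tolerance. The variable name fuzz and its value 1e-9 are assumptions chosen for this example; a real tolerance should sit well below the resolution of the data.

```sas
data a;
  x = .1;
  output;
  output;
  output;
run;

proc means data=a mean noprint;
  var x;
  output out=b mean=mean;
run;

data c;
  if _n_ = 1 then set b (keep=mean);
  set a (keep=x);
  length type $4;
  /* Hypothetical tolerance: anything closer than this counts as equal. */
  fuzz = 1e-9;
  /* Test for "equal within tolerance" first, then for low or high. */
  if abs(x - mean) <= fuzz then type = 'mean';
  else if x < mean then type = 'low';
  else type = 'high';
  put (x mean type) (=);
run;
```

Checking the tolerance case first means a difference on the order of 1E-17 should no longer push a record into the low or high bucket.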

All Replies


Posted in reply to raisins25

06-27-2012 01:23 PM

How are you creating the macro variable? Since macro variables are basically text, there are some other issues to consider.

Also, how are you flagging them? Some code may provide hints as to what correction you need.
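Because a macro variable holds text, a numeric value can lose digits on the way in. One possible way around that, sketched here under assumptions, is to store the value as a 16-digit hexadecimal string with the HEX16. format and read it back with the HEX16. informat. The data set names y and vol and the variables median1 and fin follow the code posted later in this thread; median_hex and vol_check are hypothetical names for this example.

```sas
/* Store the median as a 16-digit hex string so no decimal digits are lost. */
data _null_;
  set y (keep=median1);
  call symputx('median_hex', put(median1, hex16.));
run;

%put median_hex=&median_hex;

data vol_check;
  set vol;
  /* Read the hex string back into the same 8-byte floating-point value. */
  median = input("&median_hex", hex16.);
  /* Inspect the difference directly instead of trusting the printed values. */
  diff = fin - median;
run;
```

This only removes the text-conversion step as a source of error; differences that already exist between the stored numeric values will still show up in diff.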


Posted in reply to raisins25

06-27-2012 01:42 PM

Yes, a small amount of sample data would help people help you.

In addition to macro vars being text, these sorts of precision issues can also occur in data set variables. Computers can't represent all non-integers exactly in binary floating point, so you can end up with something that looks odd, like below:

data _null_;
  if .1+.1+.1=.3 then put "decimal math works.";
  else put "uh-oh, numeric precision problem!!";
run;

uh-oh, numeric precision problem!!
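To see why that comparison fails, it can help to print the stored bits rather than the formatted values. The sketch below assumes nothing beyond base SAS; the variable names are made up for the example, and the rounding unit 1e-9 is an arbitrary choice.

```sas
data _null_;
  total = .1 + .1 + .1;
  three_tenths = .3;
  /* HEX16. shows the raw 8-byte floating-point representation. */
  put total= hex16. three_tenths= hex16.;
  diff = total - three_tenths;
  put diff= e12.;
  /* Rounding both sides to a common unit before comparing. */
  if round(total, 1e-9) = round(three_tenths, 1e-9)
    then put "equal after rounding";
  else put "still different after rounding";
run;
```

If the two hex strings differ in their last digit, the values are different numbers even though both print as 0.3 under the default format.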

For more background, see e.g.:


Posted in reply to raisins25

06-27-2012 03:01 PM

Thanks for your input...

Some additional info:

data vol;
  set crf.stxvol;
  where visit in (2,5);
  format _all_;
  informat _all_;
  tot = round(sum(STIJVL3,STIJVL4),0.01);
  if tot gt 0;
  keep pt cpevent tot;
run;

proc sort data=vol; by pt cpevent; run;

data vol;
  set vol;
  by pt cpevent;
  retain l;
  if first.pt then l = tot;
  if first.pt eq last.pt then fin = tot;
  else if last.pt then fin = tot + l;
  if last.pt;
  keep pt fin;
run;

proc sort data=vol; by pt fin; run;

proc means data=vol n mean median noprint;
  var fin;
  output out=y n=n mean=mean median=median1;
run;

data y;
  set y;
  median = round(median1,0.1);
  keep median;
run;

proc sql noprint;
  select median into: median
  from y;
quit;

%let median = %left(%trim(&median));
%put &median.;

**create a flag var for the median as cutoff variable;
data vol;
  set vol;
  length flag $80;
  if fin le &median then flag = "LEmedianML";
  else if fin gt &median then flag = "GTmedianML";
  if flag ne '';
  keep pt flag;
run;

The one patient whose fin value is the same as the median gets the GTmedianML flag.

That is when I created a diff variable to see the difference between fin and the median, and for this patient I am getting a diff of 2.22045E-16.


Posted in reply to raisins25

06-27-2012 03:14 PM

You probably have variables with length < 8 in your permanent dataset.

Try this little program.

%let diff=2.22045E-16;

data test;
  length x y 8 z 4;
  do i=1 to 2 by .1;
    x=i;
    y=x+&diff;
    z=x+&diff;
    output;
  end;
run;

data test2;
  set test;
  if y ne z;
run;

Then change the length of Z to 8 and run it again.
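To check whether the permanent data set actually contains short numeric variables, one option is to query DICTIONARY.COLUMNS. This is a sketch that assumes the libref and member name from the code posted above (crf.stxvol); dictionary tables store both in uppercase.

```sas
proc sql;
  /* List numeric variables stored with fewer than 8 bytes. */
  select name, type, length
    from dictionary.columns
    where libname = 'CRF'
      and memname = 'STXVOL'
      and type = 'num'
      and length < 8;
quit;
```

If the query returns no rows, every numeric variable in that data set is already stored at full length.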


06-27-2012 05:07 PM

Thanks Tom.

I tried assigning a length of 8 to the tot variable, which pulls data from the permanent dataset. It did not work.

R


Posted in reply to raisins25

06-27-2012 03:59 PM

raisins25,

Most likely, it is SQL that is rounding your median value. It uses an 8-character format for translating from numeric to a character string.

Forget the SQL, forget the rounding, and forget using a macro variable. Just bring the median into your final data step:

data vol;
  set vol;
  if _n_=1 then set y (keep=median1);
  length ...

Good luck.


Posted in reply to Astounding

06-27-2012 05:08 PM

Thanks Astounding.

I have already tried this method, bypassing the macro var just in case that was the problem. It didn't work either.

R


Posted in reply to Quentin

06-28-2012 12:53 PM

Quentin,

You are right! I introduced the fuzz factor method and it worked!

This is what I did. I also added the median as a variable rather than as a macro var.

data vol_;
  set vol_;
  length flag $80;
  eps = constant("SQRTMACEPS");
  value = median + eps;
  if fin lt value then flag = "LEmedianML";
  else flag = "GTmedianML";
  keep pt flag;
run;

Thank you all for your inputs!!