Hi guys, I want to create an informat.
However, the log keeps reporting "ERROR: Start is greater than end: -." unless I replace the number 100 with the keyword high. Could anyone explain the reason for the error?
If I want the informat to map numbers larger than 100 to "Not defined yet", how should I adjust my code to avoid the error?
Thank you soooo much!
proc format;
invalue $convert
0 - <65 = 'F'
65 -<75 = 'C'
75 - <85 = 'B'
85 - <100 = 'A'
other = 'Not defined yet ';
run;
Accepted Solutions
Hi chouchou,
Others have shown that your INFORMAT does not suit your purpose. I hope to show you, through examples, what is happening behind the scenes.
Your value ranges
0 - < 65 , ..... 85 - < 100
produce the ERROR. Understanding the reason for this error will help you look for other approaches.
First, let us stop the ERROR by making a small change to the first and last ranges:
proc format;
invalue $conv
0.001 - < 65 = 'F'
65 -< 75 = 'C'
75 - < 85 = 'B'
85 - 99.999 = 'A'
other = 'Not defined yet ';
run;
I hope this modification does not change your specification much. Run it and see that the ERROR is gone. We will see later why this happened.
The next question is how you read X from the input file: as numeric or as character?
[1] Let X be numeric, as in this input file:
data have;
infile cards truncover;
*input x $3.;
input x;
datalines;
1
2
3
50
57
58
68
69
70
71
78
79
80
82
89
90
91
96
99
100
120
;
run;
Now apply the INFORMAT on X as below:
data want;
set have;
x1 = input(x, $conv.);
run;
See the output of the data set, WANT.
proc print data = want;
run;
All your X values are transformed to 'Not defined yet', because only 'other' applies to every X. The reason will become clear below.
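A likely explanation for case [1], offered as a sketch rather than something stated in this post: when a numeric variable is passed to INPUT, SAS first converts it to a character string, and that string is right-aligned and so begins with blanks. A blank sorts before every digit, so the value falls outside all the ranges. The BEST12. format and the variable name c below are assumptions used to imitate that conversion.

```sas
data _null_;
  x = 50;
  /* Imitate the automatic numeric-to-character conversion (BEST12. is assumed) */
  c = put(x, best12.);
  /* Brackets make the leading blanks visible in the log */
  put '[' c $char12. ']';
  /* A blank collates before '0', so the padded string sorts before the first range */
  if c < '0.001' then put 'The padded string sorts before 0.001';
run;
```

Checking the log of this step should show the blanks in front of 50, which is why none of the digit-based ranges can match.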
[2] Now let X be a character variable.
In the above input file, read X as $3. instead of as a number, and run the following code.
data want;
set have;
x1 = input(x, $conv.);
run;
and see the output of the data set WANT:
proc print data = want;
run;
You will see that the labels are OK except for 100 and 120, where you expected 'Not defined yet'. The following shows the reason.
proc sort data = have out = havesort;
by x;
run;
proc print data = havesort;
run;
When X is a character variable, the sorted file shows the order as 1, 100, 120, 2, 3, 50, ...., 96, 99.
Therefore the INFORMAT transforms 1 to F, 100 to F, 120 to F, 2 to F, ..., 58 to F, 68 to C and so on.
The lesson is that SAS treats numeric value ranges as character ranges, like '0' - < '65', ...., '85' - < '100', when the INVALUE statement defines a character informat (invalue $convert). As strings, '85' is greater than '100', which is why the last range raised the ERROR.
So you cannot achieve what you want with your $CONVERT informat.
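The start-greater-than-end situation can be reproduced with a plain comparison. This is a minimal sketch, with illustrative variable names lo and hi, that compares the two endpoints of the failing range first as strings and then as numbers:

```sas
data _null_;
  length lo hi $3;
  lo = '85';
  hi = '100';
  /* Character comparison goes left to right: '8' sorts after '1' */
  if lo > hi then put 'As strings: ' lo 'sorts after ' hi;
  /* Numeric comparison uses the values themselves */
  if input(lo, 8.) < input(hi, 8.) then put 'As numbers: ' lo 'is less than ' hi;
run;
```

Both messages should appear in the log, showing that the same pair of endpoints is ordered one way as text and the other way as numbers.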
Now for another way of doing what you want.
Change X to numeric in the input file.
Use the following format:
proc format;
value numconv
0 - < 65 = 'F'
65 -< 75 = 'C'
75 - < 85 = 'B'
85 - <100 = 'A'
other = 'Not defined yet ';
run;
Now use the following code to transform X.
data want;
set have;
x1 = put(x, numconv.);
run;
proc print data = want;
run;
Message was edited by: MUTHIA KACHIRAYAN
I think it is because you are treating it as a character variable. I'd think the following might do what you want:
proc format;
value converta
1 = 'F'
2 = 'C'
3 = 'B'
4 = 'A'
5 = 'Not defined yet ';
invalue convert
0 - <65 = 1
65 -<75 = 2
75 - <85 = 3
85 - <100 = 4
other = 5;
run;
data have;
informat x convert.;
format x converta.;
input x;
cards;
1
66
76
86
100
;
run;
Why not just read the values as numeric and use a custom display format? Then, if there is a decision to move the boundary between 'A' and 'B' to 90, you do not have to re-read the data; just assign the new format for any analysis.
This would also handle later changes. BTW, you might want 85 - 100, unless 100 really isn't supposed to be treated as an 'A'.
proc format;
value convert
0 - <65 = 'F'
65 -<75 = 'C'
75 - <85 = 'B'
85 - <100 = 'A'
other = 'Not defined yet ';
run;
If this is actually assigning class grades from numeric scores, there is another reason to leave the values numeric and just use a format: different classes may be graded differently, i.e. "on a curve".
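As a rough sketch of that idea, the same numeric scores can be displayed under two grading scales without re-reading the data. The format names stdgrade and curvegrade, the curved cut-offs and the sample scores are all made up for illustration.

```sas
proc format;
  /* Same cut-offs as the format posted above */
  value stdgrade
    0  -< 65  = 'F'
    65 -< 75  = 'C'
    75 -< 85  = 'B'
    85 -< 100 = 'A'
    other     = 'Not defined yet';
  /* Hypothetical curved scale with lower cut-offs */
  value curvegrade
    0  -< 55  = 'F'
    55 -< 65  = 'C'
    65 -< 80  = 'B'
    80 -< 100 = 'A'
    other     = 'Not defined yet';
run;

data scores;
  input x;
  datalines;
50
66
82
101
;
run;

/* The stored values stay numeric; only the display format changes */
proc print data=scores;
  format x stdgrade.;
run;

proc print data=scores;
  format x curvegrade.;
run;
```

Because the grades live only in the formats, moving a boundary means editing one VALUE statement rather than rebuilding the data set.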
One more thought .. though I can't test it at the moment. Wouldn't the following work?
proc format;
invalue grade
0 - <65 = 'F'
65 - <75 = 'C'
75 - <85 = 'B'
85 - <100 = 'A'
other = 'Not defined yet ';
run;
data have;
informat x grade.;
input x;
cards;
1
66
76
86
100
;
run;
The error occurs because SAS treats the numeric ranges as character strings. The SAS documentation says:
"In character informats, numeric ranges are treated as character strings."
It is better to replace INVALUE with VALUE (and drop the $ so that the format is numeric); then your format works.
I think you should use 'low' instead of 0. This will solve your problem. Also, please run the code below as an example. It may help you understand the issue.
proc format;
invalue $convertc
low - <65 = "F"
65 -<75 = "C"
75 - <85 = "B"
85 - high = "A"
other = "Not defined yet";
value convertn
low - <65 = "F"
65 -<75 = "C"
75 - <85 = "B"
85 - high = "A"
other = "Not defined yet";
run;
data aaa;
length y $10 ;
x = 0; y = '0'; output;
x = 65; y = '65'; output;
x = 72; y = '72'; output;
x = 81; y = '81'; output;
x = 88; y = '88'; output;
x = 100; y = '100'; output;
run;
data bbb;
set aaa;
zc = input(y, $convertc.);
zn = put(x, convertn.);
run;
Hi Amit, you're correct, but the problem is that I need to keep the boundary 85-<100 and leave the numbers larger than 100 as "other". Do you have any idea?
Dear Chouchou, I totally agree with the point made by Arthur. Please try the code below and see if it helps.
proc format;
invalue convertc
low - <65 = 1
65 - <75 = 2
75 - <85 = 3
85 - <100 = 4
other = 5;
value convertn
low - <65 = "F"
65 -<75 = "C"
75 - <85 = "B"
85 - <100 = "A"
other = "Not defined yet";
value convertx
1 = 'F'
2 = 'C'
3 = 'B'
4 = 'A'
5 = 'Not defined yet';
run;
data aaa;
length y $10 ;
x = 0; y = '0'; output;
x = 65; y = '65'; output;
x = 72; y = '72'; output;
x = 81; y = '81'; output;
x = 88; y = '88'; output;
x = 99; y = '99'; output;
x = 100; y = '100'; output;
x = 120; y = '120'; output;
run;
data bbb;
set aaa;
zc = put(input(y, convertc.), convertx.);
zn = put(x, convertn.);
run;
Hello,
The documentation for the FORMAT procedure http://support.sas.com/documentation/cdl/en/proc/61895/HTML/default/viewer.htm#format-overview.htm
states:
"User-defined informats read only character data. They can convert character values into real numeric values, but they cannot convert real numbers into characters."
That's the reason for the error message you got when trying to define an informat for converting numbers into characters.
Best
If your x values are all integers, then you could use the following approach. Note, though, that your question stated that you wanted to trap numbers larger than 100, but your code looked for numbers that included 100. I did find time to test this approach:
data fmt;
retain fmtname '$convert' type 'J';
length start $8 label $15;
do _n_=1 to 101;
start=strip(put(_n_,8.));
select;
when (_n_ lt 65) label='F';
when (_n_ lt 75) label='C';
when (_n_ lt 85) label='B';
otherwise label='A';
end;
if _n_=101 then do;
start='Other';
label= 'Not defined yet';
end;
output;
end;
run;
proc format cntlin=fmt;
run;
data have;
informat x $convert15.;
input x;
cards;
1
66
76
86
100
101
102
103
221
;
run;
One last point: if you ALSO want to convert missing values to the value "Not defined yet", then the following code would be needed:
data fmt;
retain fmtname '$convert' type 'J' hlo 'UJ ';
length start $8 label $15;
do _n_=1 to 101;
start=strip(put(_n_,8.));
select;
when (_n_ lt 65) label='F';
when (_n_ lt 75) label='C';
when (_n_ lt 85) label='B';
otherwise label='A';
end;
if _n_=101 then do;
hlo = 'UJO';
start='Other';
label= 'Not defined yet';
end;
output;
end;
run;
proc format cntlin=fmt;
run;
data have;
infile cards truncover;
input x $convert15.;
cards;
1
66
76
86
100
101
102
103
221
;
run;
Hi Arthur,
Thank you for your time; it's really a great and different way of thinking.
But I'm still struggling with the proc format; invalue..... statement (not proc format; value.....): why doesn't it work when I set a boundary of 100? It's just too weird!!!
Your original code didn't work correctly with or without including 99. To see what I mean, try the following and look at the results. When looking at characters, rather than numbers, the sort order is totally different:
proc format;
invalue $convert
0 - <65 = 'F'
65 -<75 = 'C'
75 - <85 = 'B'
85 - <99 = 'A'
other = 'Not defined yet ';
run;
data have;
infile cards truncover;
input x $convert15.;
cards;
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
;
run;
The whole struggle you're facing is caused by your variable being character rather than numeric while you want to define ranges. I'm surprised that no one has yet suggested first converting your variable to numeric and then using a numeric format, where things are really simple.
proc format;
value convert
0 - <65 = 'F'
65 -<75 = 'C'
75 - <85 = 'B'
85 - 100 = 'A'
other = 'Not defined yet ';
run;
data sample;
input var :$3.;
want=put(input(var,best32.),convert.);
datalines;
-1
0
9
10
74
75
76
99
100
101
;
run;
Defining ranges for a character format or informat works differently: it works the same way as comparing the values of two character variables. SAS basically compares the first character of one value with the first character of the other, then the 2nd with the 2nd, the 3rd with the 3rd, and so on. What's smaller or greater depends on the collating sequence (and the collating sequence partly depends on the code page used).
http://supportline.microfocus.com/Documentation/books/rd60/lhacha01.htm
SAS(R) 9.2 National Language Support (NLS): Reference Guide
Even though you're defining a character informat, SAS lets you write the strings without quotes as long as they contain digits only. This is not really "clean", but SAS is generous and it has worked this way for a long time. That doesn't mean these digits are treated as numbers, though. They are not. They are treated as strings.
So with 85 - <100 = 'A' it's still about strings, comparing the first character on the left with the first character on the right. The '8' of 85 is higher up in the collating sequence than the '1' of 100. That's why you're getting the error here.
Run the code below and then check the SAS log. Maybe that will help to explain how character comparison works.
data test;
input @1 (str1 str2) (:$2.) @1 (num1 num2) (?? :8.);
if str1 > str2 then put str1= '> ' str2=;
else if str1 < str2 then put str1= '< ' str2=;
else put str1= '= ' str2=;
if num1 > num2 then put num1= '> ' num2=;
else if num1 < num2 then put num1= '< ' num2=;
else put num1= '= ' num2=;
put;
datalines;
1 1
10 1
10 2
10 85
8 75
1 a
1 A
a a
A a
A b
A B
a b
A B
;
run;
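To look at the collating sequence directly, the RANK function returns a character's position in the session's ASCII or EBCDIC sequence. This small sketch, with illustrative variable names r1 and r8, prints the positions of '1' and '8'; the actual numbers depend on the encoding in use.

```sas
data _null_;
  /* RANK gives the position of a character in the collating sequence */
  r1 = rank('1');
  r8 = rank('8');
  put r1= r8=;
  /* The first characters decide the comparison of '85' and '100' */
  if r8 > r1 then put "'8' collates after '1', so '85' > '100' as strings";
run;
```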
Thank you Patrick, I'm fully clear.