I created a dataset that I am attempting to display with gchart. When I use the dataset with gchart I receive a "Segmentation Violation In Task [ GCHART ( ]" error. When I create the dataset manually using datalines, it works. Could the data type be causing the error? The first block below is the datalines step that I created by manually copying the output of proc print. After that is the code that creates the dataset, followed by the gchart step.
data genderData;
input gender $ 1-12 four_digit_year space percentDisplay ;
datalines;
Female 2010 19.8976 0.58746
Male 2010 19.6012 0.40722
Not Reported 2010 70.3704 0.00532
Female 2011 20.5422 0.57541
Male 2011 21.5277 0.42432
Not Reported 2011 3.7037 0.00027
Female 2012 19.6701 0.58254
Male 2012 20.0189 0.41718
Not Reported 2012 3.7037 0.00028
Female 2013 19.8218 0.58984
Male 2013 19.5204 0.40874
Not Reported 2013 18.5185 0.00141
Female 2014 20.0683 0.59583
Male 2014 19.3318 0.40388
Not Reported 2014 3.7037 0.00028
;
goptions reset=all hsize=5in vsize=5in htext=8pt;
proc tabulate data=enrollment missing out=genderData2;
class four_digit_year gender;
table four_digit_year, gender*rowpctn gender*colpctn;
run;
data firstColumnSet (drop = _type_ _page_ _table_ pctN_01);
set genderData2;
pctn_10 = pctn_10 *.01;
rename pctn_10 = percentDisplay;
where pctN_01 = .;
run;
data secondColumnSet (drop = _type_ _page_ _table_ pctN_10);
set genderData2;
rename pctN_01 = space;
where pctN_10 = .;
run;
data genderData2; merge firstColumnSet secondColumnSet;
by four_digit_year gender;
run;
proc sort data=genderData2;
by gender four_digit_year;
run;
/* Calculate the center location for each subgroup. */
/* Store the value in a Middle variable. */
data middle;
set genderData2;
by gender four_digit_year;
retain temp_sum;
if first.gender then temp_sum=space;
else temp_sum + space;
middle=(space/2)+lag(temp_sum);
if first.gender then middle=space/2;
run;
/*tested up to*/
data barlabel;
retain color 'black' when 'a' xsys ysys "2" position "+";
set middle;
midpoint=gender;
x=middle;
subgroup=four_digit_year;
text=left(put(percentDisplay,percent10.));
/*end;*/
run;
title "Gender TESTING TESTING";
axis1 label=none major=none minor=none style=0
value=none;
axis2 label=none;
proc gchart data=genderData2;
hbar gender / type=sum
sumvar=space
subgroup=four_digit_year
raxis=axis1
maxis=axis2
annotate=barlabel nostats noframe;
run;
quit;
Reply from SAS technical support.
"Please try modifying the data to reduce the length of Gender to $12 and Four_Digit_Year to $4. You can do this with a DATA step like the following (which also removes the format from these two variables):
data genderData2;
length gender $12 four_digit_year $4;
set genderData2;
format gender four_digit_year;
run; "
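A minimal sketch of how one might check that the suggested fix took effect, assuming the dataset name from the question. PROC CONTENTS lists the length and format of each variable, so after the step above gender should show a length of 12 and four_digit_year a length of 4, with no format attached. Presumably the step should run before the middle and barlabel steps, since midpoint and subgroup in the annotate dataset take their lengths from these two variables; that placement is an assumption, not something the reply states.

```sas
/* Inspect variable attributes (type, length, format, label) */
/* in creation order to confirm the shorter lengths.          */
proc contents data=genderData2 varnum;
run;
```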
I would suggest running Proc Compare on the data set built with datalines against the Proc Tabulate output, at least for the variables you are attempting to chart.
Since there isn't any example data for the TABULATE input, that's about as far as I can suggest at this time.
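A sketch of what that comparison could look like, assuming the two dataset names from the question. The sorted copies (baseSorted and compareSorted) are hypothetical names introduced here so the rows line up by position. The second step is optional: METHOD=ABSOLUTE with CRITERION= treats numeric differences smaller than the criterion as equal, which helps separate rounding noise from real differences.

```sas
/* Sort copies of both datasets the same way so rows match by position. */
proc sort data=genderData  out=baseSorted;
  by gender four_digit_year;
run;
proc sort data=genderData2 out=compareSorted;
  by gender four_digit_year;
run;

/* Exact comparison: reports differing attributes and unequal values. */
proc compare base=baseSorted compare=compareSorted listvar;
  var gender four_digit_year space percentDisplay;
run;

/* Tolerant comparison: differences below 0.0001 count as equal. */
proc compare base=baseSorted compare=compareSorted
             method=absolute criterion=0.0001;
  var space;
run;
```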
I used proc compare, it appears that there is a difference in:
I added in this
set genderData2;
rename pctN_01 = space;
space = round(space,.00001);
where pctN_10 = .;
run;
Yet proc compare still shows differences. It's as if the tabulated dataset still has digits after the fifth decimal place even though I'm rounding.
Value Comparison Results for Variables
__________________________________________________________
|| Base Compare
Obs || space space Diff. % Diff
________ || _________ _________ _________ _________
||
1 || 19.8976 19.8976 0.0000206 0.000104
2 || 20.5422 20.5422 0.0000315 0.000153
3 || 19.6701 19.6701 0.0000109 0.0000555
4 || 19.8218 19.8218 -0.000016 -0.000080
5 || 20.0683 20.0683 -0.000047 -0.000235
6 || 19.6012 19.6012 0.0000394 0.000201
7 || 21.5277 21.5277 -0.000016 -0.000073
8 || 20.0189 20.0189 -0.000040 -0.000198
9 || 19.5204 19.5204 9.5379E-6 0.0000489
10 || 19.3318 19.3318 6.5472E-6 0.0000339
11 || 70.3704 70.3704 -0.000030 -0.000042
12 || 3.7037 3.7037 3.7037E-6 0.000100
13 || 3.7037 3.7037 3.7037E-6 0.000100
14 || 18.5185 18.5185 0.0000185 0.000100
15 || 3.7037 3.7037 3.7037E-6 0.000100
It could be this as well.
I'm trying to understand what the difference is here. I'm not sure what a format type is; I did not use a format statement. I changed the manual data step to:
/*new hard input*/
data genderData;
length gender $63 four_digit_year $255 space 8;
input gender $ 1-12 four_digit_year space percentDisplay ;
/*length gender $63 four_digit_year $255;*/
datalines;
Female 2010 19.8976 0.58746
Male 2010 19.6012 0.40722
Not Reported 2010 70.3704 0.00532
Female 2011 20.5422 0.57541
Male 2011 21.5277 0.42432
Not Reported 2011 3.7037 0.00027
Female 2012 19.6701 0.58254
Male 2012 20.0189 0.41718
Not Reported 2012 3.7037 0.00028
Female 2013 19.8218 0.58984
Male 2013 19.5204 0.40874
Not Reported 2013 18.5185 0.00141
Female 2014 20.0683 0.59583
Male 2014 19.3318 0.40388
Not Reported 2014 3.7037 0.00028
;
Proc compare returns:
Listing of Common Variables with Differing Attributes
Variable Dataset Type Length Format Label
gender WORK.GENDERDATA Char 63
WORK.GENDERDATA2 Char 63 $63. GENDER
four_digit_year WORK.GENDERDATA Char 255
WORK.GENDERDATA2 Char 255 $255. FOUR_DIGIT_YEAR
I usually post-process tabulate output datasets to get what I want, as things like the default text lengths, formats, labels and such usually leave something to be desired.
All SAS variables in a data set have a format. If you don't assign one and the variable is character, SAS will default to $w. where w is the length. If numeric, then generally BEST12. or similar.
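For illustration, a sketch of removing the formats and labels that TABULATE attached, without rewriting the data. It assumes the dataset lives in WORK under the name used in the thread. Note that PROC DATASETS cannot change a variable's length; that still needs a DATA step with a LENGTH statement placed before SET.

```sas
proc datasets library=work nolist;
  modify genderData2;
    /* A FORMAT statement with no format name removes the formats. */
    format gender four_digit_year;
    /* A blank label removes the labels. */
    attrib gender four_digit_year label=' ';
quit;
```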
Is there an article on post processing tabulate output, so I can see what you mean?
No specific article, just data step code. Such things as selecting from specific tables or types, combining variables to create a single variable with row text (catx function with option missing=' ') or any numeric calculation on the statistics such as rescaling or ranking and sometimes a sort to get in a desired order.
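A rough sketch of that kind of post-processing, applied to the raw TABULATE output from the question before it is overwritten by the merge. The output name tabPost and the new variables rowText and pct01 are hypothetical. The MISSING=' ' option only matters when CATX is given numeric arguments that may be missing; it is shown here because the reply mentions it.

```sas
options missing=' ';   /* missing numerics become blank, so CATX skips them */

data tabPost;                      /* hypothetical output name */
  set genderData2;
  where _table_ = 1;               /* keep rows from the first TABLE statement */
  length rowText $80;
  /* Combine class variables into a single row-text variable. */
  rowText = catx(' ', gender, four_digit_year);
  /* Rescale a percentage statistic to a proportion. */
  pct01 = pctN_01 / 100;
  drop _type_ _page_ _table_;
run;

options missing='.';   /* restore the default */

/* Put the rows in the desired order. */
proc sort data=tabPost;
  by gender four_digit_year;
run;
```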
If space contains numbers like 19.8976 in the dynamic dataset, then with the length statement below I would expect the decimals to be chopped off at four places.
Yet the compare displays:
|| Base Compare
Obs || space space Diff. % Diff
________ || _________ _________ _________ _________
||
1 || 19.8976 19.8976 0.0000206 0.000104
data secondColumnSet (drop = _type_ _page_ _table_ pctN_10);
set genderData2;
length pctN_01 7 space 7;
rename pctN_01 = space;
space = round(space,.0001);
where pctN_10 = .;
run;
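One possible reason the rounding appears to have no effect: a RENAME statement only takes effect when the observation is written out, so inside the DATA step the variable is still called pctN_01. The statement space = round(space, .0001) therefore refers to a separate variable named space, not to the tabulated value. A sketch of the same step that rounds under the old name, keeping the names from the question:

```sas
data secondColumnSet (drop = _type_ _page_ _table_ pctN_10);
  set genderData2;
  where pctN_10 = .;
  /* Round using the name the variable has inside this step. */
  pctN_01 = round(pctN_01, .0001);
  /* The new name applies only to the output dataset. */
  rename pctN_01 = space;
run;
```

The LENGTH statement is left out on purpose, since storing non-integer values in fewer than 8 bytes changes the stored value.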
Length for numeric has to do with the number of bytes used for storage, not the contents.
Take a look at the values below, all calculated the same but stored in different numbers of bytes.
data junk;
length x3 3 x4 4 x5 5 x6 6 x7 7 x8 8;
array j x:;
do over j;
j = 1/7;
end;
run;
proc print data=junk noobs;
var x:;
format x: best16.;
run;
Note that while the values agree to some number of decimals, the actual stored values differ.
Values with many decimals will show small differences when stored with different precision. Generally, a length less than 8 for numeric values should only be used with integers, and even then you can get surprises.
Am I reading this correctly that GenderData.space and GenderData2.space are both Num 8? I'm trying to rule out the possibility that they are different data types. I think they're the same data type, but even after changing the data type following the tabulate, the dynamic column space still seems to have a different value than the hard-coded space. I figure that if I can make the two datasets identical, then the dynamic dataset will not produce the segmentation error.
The COMPARE Procedure
Comparison of WORK.GENDERDATA with WORK.GENDERDATA2
(Method=EXACT)
Variables with Unequal Values
Variable Type Len Ndif MaxDif
space NUM 8 15 0.00005
Value Comparison Results for Variables
__________________________________________________________
|| Base Compare
Obs || space space Diff. % Diff
________ || _________ _________ _________ _________
||
1 || 19.8976 19.8976 0.0000206 0.000104
2 || 20.5422 20.5422 0.0000315 0.000153
3 || 19.6701 19.6701 0.0000109 0.0000555
4 || 19.8218 19.8218 -0.000016 -0.000080
5 || 20.0683 20.0683 -0.000047 -0.000235
6 || 19.6012 19.6012 0.0000394 0.000201
7 || 21.5277 21.5277 -0.000016 -0.000073
8 || 20.0189 20.0189 -0.000040 -0.000198
9 || 19.5204 19.5204 9.5379E-6 0.0000489
10 || 19.3318 19.3318 6.5472E-6 0.0000339
11 || 70.3704 70.3704 -0.000030 -0.000042
12 || 3.7037 3.7037 3.7037E-6 0.000100
13 || 3.7037 3.7037 3.7037E-6 0.000100
14 || 18.5185 18.5185 0.0000185 0.000100
15 || 3.7037 3.7037 3.7037E-6 0.000100
Can you provide an example ENROLLMENT data set so we can run the proc tabulate and see what happens? I would expect differences in the calculated percentages from tabulate for most realistic input data: the displayed values are rounded/truncated by the display format, but the output dataset has additional decimal places beyond the displayed range.
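To see those extra decimals directly, one could print the computed variables with a wider format than the default display. A small sketch, assuming the merged genderData2 from the question with its space and percentDisplay variables:

```sas
/* Show the stored values with as many digits as BEST16. can display. */
proc print data=genderData2 noobs;
  var gender four_digit_year space percentDisplay;
  format space percentDisplay best16.;
run;
```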
Ballardw,
Functionally the enrollment data that the tabulate uses is just columns gender and year with one row per student. The actual table has many columns and is rather large. I posted a simplified dataset. What confuses me is why this works with fake data and not with real data.
data genderDataInput;
length gender $63 four_digit_year $255;
input gender $ 1-12 four_digit_year;
datalines;
Female 2010
Female 2010
Female 2010
Female 2010
Male 2010
Not Reported 2010
Not Reported 2010
Not Reported 2010
Female 2011
Male 2011
Not Reported 2011
Not Reported 2011
Female 2012
Male 2012
Not Reported 2012
Female 2013
Female 2013
Female 2013
Male 2013
Not Reported 2013
Female 2014
Male 2014
Male 2014
Not Reported 2014
;
%macro test();
goptions reset=all hsize=5in vsize=5in htext=8pt;
proc tabulate data=genderDataInput /*missing*/ out=genderData2;
class four_digit_year gender;
table four_digit_year, gender*rowpctn gender*colpctn;
run;
data firstColumnSet (drop = _type_ _page_ _table_ pctN_01);
set genderData2;
pctn_10 = pctn_10 *.01;
pctn_10 = round(pctn_10, .00001);
rename pctn_10 = percentDisplay;
where pctN_01 = .;
run;
data secondColumnSet (drop = _type_ _page_ _table_ pctn_10);
length pctN_01 8 space 8;
set genderData2;
rename pctN_01 = space;
where pctn_10 = .;
format space 8.4;
run;
data genderData2; merge secondColumnSet firstColumnSet;
by four_digit_year gender;
run;
proc sort data=genderData2;
by gender four_digit_year;
run;
/* Calculate the center location for each subgroup. */
/* Store the value in a Middle variable. */
data middle2;
set genderData2;
by GENDER FOUR_DIGIT_YEAR;
retain temp_sum;
if first.GENDER then temp_sum=space;
else temp_sum + space;
middle=(space/2)+lag(temp_sum);
if first.GENDER then middle=space/2;
run;
/*tested up to*/
data barlabel2;
retain color 'black' when 'a' xsys ysys "2" position "+";
set middle2;
midpoint=GENDER;
x=middle;
subgroup=FOUR_DIGIT_YEAR;
text=left(put(percentDisplay,percent10.));
run;
title "Gender TESTING TESTING";
axis1 label=none major=none minor=none style=0
value=none;
axis2 label=none;
proc gchart data=genderData2;
hbar GENDER / type=sum
sumvar=space
subgroup=FOUR_DIGIT_YEAR
raxis=axis1
maxis=axis2
annotate=barlabel2 nostats noframe;
run;
quit;
%mend test;
%test();