Solved: Gchart Works with Datalines but not Tabulate

DavidPhillips2 · Posted 08-11-2015 10:58 AM

I created a dataset that I am attempting to display with GCHART. When I use the dataset with GCHART I receive a Segmentation Violation In Task [ GCHART ( ]. When I create the dataset manually using datalines, it works. Could the data type be causing the error? The first block below is the datalines version, which I created by manually copying the output of PROC PRINT. After it is the code that builds the dataset with PROC TABULATE, followed by the GCHART step.

data genderData;
   input gender $ 1-12 four_digit_year space percentDisplay;
   datalines;
Female       2010 19.8976 0.58746
Male         2010 19.6012 0.40722
Not Reported 2010 70.3704 0.00532
Female       2011 20.5422 0.57541
Male         2011 21.5277 0.42432
Not Reported 2011 3.7037 0.00027
Female       2012 19.6701 0.58254
Male         2012 20.0189 0.41718
Not Reported 2012 3.7037 0.00028
Female       2013 19.8218 0.58984
Male         2013 19.5204 0.40874
Not Reported 2013 18.5185 0.00141
Female       2014 20.0683 0.59583
Male         2014 19.3318 0.40388
Not Reported 2014 3.7037 0.00028
;

goptions reset=all hsize=5in vsize=5in htext=8pt;

proc tabulate data=enrollment missing out=genderData2;
   class four_digit_year gender;
   table four_digit_year, gender*rowpctn gender*colpctn;
run;

data firstColumnSet (drop = _type_ _page_ _table_ pctN_01);
   set genderData2;
   pctn_10 = pctn_10 *.01;
   rename pctn_10 = percentDisplay;
   where pctN_01 = .;
run;

data secondColumnSet (drop = _type_ _page_ _table_ pctN_10);
   set genderData2;
   rename pctN_01 = space;
   where pctN_10 = .;
run;

data genderData2;
   merge firstColumnSet secondColumnSet;
   by four_digit_year gender;
run;

proc sort data=genderData2;
   by gender four_digit_year;
run;

/* Calculate the center location for each subgroup. */
/* Store the value in a Middle variable. */
data middle;
   set genderData2;
   by gender four_digit_year;
   retain temp_sum;
   if first.gender then temp_sum=space;
   else temp_sum + space;
   middle=(space/2)+lag(temp_sum);
   if first.gender then middle=space/2;
run;

/*tested up to*/
data barlabel;
   retain color 'black' when 'a' xsys ysys "2" position "+";
   set middle;
   midpoint=gender;
   x=middle;
   subgroup=four_digit_year;
   text=left(put(percentDisplay,percent10.));
   /*end;*/
run;

title "Gender TESTING TESTING";
axis1 label=none major=none minor=none style=0
      value=none;
axis2 label=none;

proc gchart data=genderData2;
   hbar gender / type=sum
                 sumvar=space
                 subgroup=four_digit_year
                 raxis=axis1
                 maxis=axis2
                 annotate=barlabel nostats noframe;
run;
quit;


ballardw · Posted 08-11-2015 11:16 AM

I would suggest running PROC COMPARE on the data set built with datalines against the PROC TABULATE output, at least for the variables you are attempting to chart.

Since there isn't any example data for the TABULATE input, that's about as far as I can suggest at this time.
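A minimal sketch of the comparison suggested above, assuming the two datasets are named genderData (datalines version) and genderData2 (TABULATE version) as in the original post. PROC COMPARE matches observations by position unless an ID statement is used, so both are sorted the same way first. The VAR list is only an assumption about which variables matter for the chart.

```sas
/* Put both datasets in the same row order so observations line up. */
proc sort data=genderData;  by gender four_digit_year; run;
proc sort data=genderData2; by gender four_digit_year; run;

/* Compare values and attributes (type, length, format, label). */
/* LISTALL also reports variables or rows found in only one dataset. */
proc compare base=genderData compare=genderData2 listall;
   var gender four_digit_year space percentDisplay;
run;
```

The attribute section of the report is worth reading as closely as the value section, since a difference in length or format does not show up as a value difference.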

DavidPhillips2 · Posted 08-11-2015 01:58 PM

I used proc compare, it appears that there is a difference in:

I added this:

data secondColumnSet (drop = _type_ _page_ _table_ pctN_10);
   set genderData2;
   rename pctN_01 = space;
   space = round(space,.00001);
   where pctN_10 = .;
run;

Yet PROC COMPARE still shows the differences below. It's as if the tabulated dataset still has digits after the 5th decimal place even though I'm rounding.

Value Comparison Results for Variables
__________________________________________________________
         ||       Base    Compare
     Obs ||      space      space      Diff.     % Diff
________ ||  _________  _________  _________  _________
         ||
       1 ||    19.8976    19.8976  0.0000206   0.000104
       2 ||    20.5422    20.5422  0.0000315   0.000153
       3 ||    19.6701    19.6701  0.0000109  0.0000555
       4 ||    19.8218    19.8218  -0.000016  -0.000080
       5 ||    20.0683    20.0683  -0.000047  -0.000235
       6 ||    19.6012    19.6012  0.0000394   0.000201
       7 ||    21.5277    21.5277  -0.000016  -0.000073
       8 ||    20.0189    20.0189  -0.000040  -0.000198
       9 ||    19.5204    19.5204  9.5379E-6  0.0000489
      10 ||    19.3318    19.3318  6.5472E-6  0.0000339
      11 ||    70.3704    70.3704  -0.000030  -0.000042
      12 ||     3.7037     3.7037  3.7037E-6   0.000100
      13 ||     3.7037     3.7037  3.7037E-6   0.000100
      14 ||    18.5185    18.5185  0.0000185   0.000100
      15 ||     3.7037     3.7037  3.7037E-6   0.000100
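One possible reason the rounding seems to have no effect: a RENAME statement in a DATA step only takes effect when the observation is written out, so inside the step the variable is still called pctN_01 and an assignment to space refers to a different variable. The sketch below rounds under the old name. The rounding unit of .0001 is an assumption, chosen because the hand-typed values carry four decimals.

```sas
/* Sketch only: assumes genderData2 is still the raw PROC TABULATE output. */
data secondColumnSet (drop = _type_ _page_ _table_ pctN_10);
   set genderData2;
   where pctN_10 = .;
   /* Inside the step the variable still has its original name. */
   pctN_01 = round(pctN_01, .0001);
   /* The new name applies only to the output dataset. */
   rename pctN_01 = space;
run;
```

Even with matching values, this only removes the numeric differences from the comparison. It does not address the attribute differences between the two datasets.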

DavidPhillips2 · Posted 08-11-2015 02:22 PM

It could be this as well.

I'm trying to understand what the difference is here. I'm not sure what a format type is; I did not use a FORMAT statement. I changed the manual data step to:

/*new hard input*/

data genderData;
   length gender $63 four_digit_year $255 space 8;
   input gender $ 1-12 four_digit_year space percentDisplay;
   /*length gender $63 four_digit_year $255;*/
   datalines;
Female       2010 19.8976 0.58746
Male         2010 19.6012 0.40722
Not Reported 2010 70.3704 0.00532
Female       2011 20.5422 0.57541
Male         2011 21.5277 0.42432
Not Reported 2011 3.7037 0.00027
Female       2012 19.6701 0.58254
Male         2012 20.0189 0.41718
Not Reported 2012 3.7037 0.00028
Female       2013 19.8218 0.58984
Male         2013 19.5204 0.40874
Not Reported 2013 18.5185 0.00141
Female       2014 20.0683 0.59583
Male         2014 19.3318 0.40388
Not Reported 2014 3.7037 0.00028
;

Proc compare returns:

Listing of Common Variables with Differing Attributes

Variable         Dataset           Type  Length  Format  Label
gender           WORK.GENDERDATA   Char      63
                 WORK.GENDERDATA2  Char      63  $63.    GENDER
four_digit_year  WORK.GENDERDATA   Char     255
                 WORK.GENDERDATA2  Char     255  $255.   FOUR_DIGIT_YEAR

ballardw · Posted 08-11-2015 03:19 PM

I usually post-process TABULATE output datasets to get what I want, as things like the default text lengths, formats, labels and such usually leave something to be desired.

All SAS variables in a data set have a format, either assigned or default. If you don't assign one and the variable is character, SAS will default to $xxx. where xxx is the length. If numeric, then generally BEST12. or similar.
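To see which attributes a dataset actually carries, and to strip the ones PROC TABULATE attached, something along these lines could be used. This is a sketch assuming the dataset is WORK.genderData2 and that only the two class variables need their formats removed.

```sas
/* List name, type, length, format and label for every variable. */
proc contents data=genderData2 varnum;
run;

/* Remove the attached formats and all labels without rewriting the data. */
proc datasets library=work nolist;
   modify genderData2;
   format gender four_digit_year;   /* no format name given = remove it */
   attrib _all_ label=' ';
quit;
```

PROC DATASETS changes only the descriptor information, so the stored lengths ($63 and $255 in the listing above) stay as they are.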

DavidPhillips2 · Posted 08-11-2015 03:21 PM

Is there an article on post processing tabulate output, so I can see what you mean?

ballardw · Posted 08-11-2015 03:29 PM

No specific article, just data step code. Such things as selecting from specific tables or types, combining variables to create a single variable with row text (catx function with option missing=' ') or any numeric calculation on the statistics such as rescaling or ranking and sometimes a sort to get in a desired order.
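As a rough illustration of that kind of post-processing, the sketch below works on the OUT= dataset from the PROC TABULATE step earlier in the thread, taken before it is overwritten by the later merge. The output names tabulateClean, rowText and pct are hypothetical, and the choice of pctN_10 as the statistic to keep is only an assumption.

```sas
/* Sketch: tidy a PROC TABULATE OUT= dataset in a DATA step. */
data tabulateClean;
   set genderData2;
   where _table_ = 1 and pctN_10 ne .;   /* first TABLE statement, one statistic */
   length rowText $40;
   /* Combine the class variables into a single row label. */
   rowText = catx(' - ', gender, four_digit_year);
   /* Rescale and round the statistic. */
   pct = round(pctN_10 * .01, .0001);
   keep rowText pct;
run;

/* Put the rows in the desired order. */
proc sort data=tabulateClean;
   by rowText;
run;
```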

DavidPhillips2 · Posted 08-11-2015 03:46 PM

If space contains numbers like 19.8976 in the dynamic dataset, then with the LENGTH statement below the decimals should be chopped off at four decimal places.

Yet the compare displays:

         ||       Base    Compare
     Obs ||      space      space      Diff.     % Diff
________ ||  _________  _________  _________  _________
         ||
       1 ||    19.8976    19.8976  0.0000206   0.000104

data secondColumnSet (drop = _type_ _page_ _table_ pctN_10);
   set genderData2;
   length pctN_01 7 space 7;
   rename pctN_01 = space;
   space = round(space,.0001);
   where pctN_10 = .;
run;

ballardw · Posted 08-11-2015 06:13 PM

Length for numeric has to do with the number of bytes used for storage, not the contents.

Take a look at the values below, all calculated the same but stored in different numbers of bytes.

data junk;
   length x3 3 x4 4 x5 5 x6 6 x7 7 x8 8;
   array j x:;
   do over j;
      j = 1/7;
   end;
run;

proc print data=junk noobs;
var x:;
format x: best16.;
run;

Note that although the values agree to some number of decimals, the actual stored values differ.

Very many decimals will yield a small difference when stored with different precision. Generally using a length less than 8 for numeric values should only be done with integers and even then you can get surprises.

DavidPhillips2 · Posted 08-12-2015 09:25 AM

Am I reading this correctly: GenderData.space and GenderData2.space are both Num 8? I'm trying to rule out the possibility that they are different data types. I think they are the same data type, but even after changing the data type after the tabulate step, the dynamic column space still seems to have a different value than the hard-coded space. I figure that if I can make the two datasets identical, then the dynamic dataset will not produce the segmentation error.

The COMPARE Procedure
Comparison of WORK.GENDERDATA with WORK.GENDERDATA2
(Method=EXACT)

Variables with Unequal Values

Variable  Type  Len  Ndif   MaxDif
space     NUM     8    15  0.00005

Value Comparison Results for Variables
__________________________________________________________
         ||       Base    Compare
     Obs ||      space      space      Diff.     % Diff
________ ||  _________  _________  _________  _________
         ||
       1 ||    19.8976    19.8976  0.0000206   0.000104
       2 ||    20.5422    20.5422  0.0000315   0.000153
       3 ||    19.6701    19.6701  0.0000109  0.0000555
       4 ||    19.8218    19.8218  -0.000016  -0.000080
       5 ||    20.0683    20.0683  -0.000047  -0.000235
       6 ||    19.6012    19.6012  0.0000394   0.000201
       7 ||    21.5277    21.5277  -0.000016  -0.000073
       8 ||    20.0189    20.0189  -0.000040  -0.000198
       9 ||    19.5204    19.5204  9.5379E-6  0.0000489
      10 ||    19.3318    19.3318  6.5472E-6  0.0000339
      11 ||    70.3704    70.3704  -0.000030  -0.000042
      12 ||     3.7037     3.7037  3.7037E-6   0.000100
      13 ||     3.7037     3.7037  3.7037E-6   0.000100
      14 ||    18.5185    18.5185  0.0000185   0.000100
      15 ||     3.7037     3.7037  3.7037E-6   0.000100

ballardw · Posted 08-12-2015 03:37 PM

Can you provide an example ENROLLMENT data set so we can run the PROC TABULATE and see what happens? I would expect differences in the percentages calculated by TABULATE for most realistic input data: the displayed values are rounded/truncated by the display format, but the output dataset has additional decimal places beyond the displayed range.
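If the goal is only to confirm that the two datasets agree up to the displayed precision, PROC COMPARE can be told to ignore small differences instead of using the default exact comparison. A minimal sketch, assuming the dataset names from earlier in the thread; the tolerance of 0.0001 is an assumption based on the four decimals typed into the datalines.

```sas
/* Treat values as equal when they differ by no more than the criterion. */
proc compare base=genderData compare=genderData2
             method=absolute criterion=0.0001;
   var space;
run;
```

With the reported MaxDif of 0.00005, a tolerance of this size would be expected to report the space values as equal, which would point to the attributes rather than the numbers as the remaining difference.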

DavidPhillips2 · Posted 08-12-2015 04:49 PM

Ballardw,

Functionally the enrollment data that the tabulate uses is just columns gender and year with one row per student. The actual table has many columns and is rather large. I posted a simplified dataset. What confuses me is why this works with fake data and not with real data.

data genderDataInput;
   length gender $63 four_digit_year $255;
   input gender $ 1-12 four_digit_year;
   datalines;
Female       2010
Male         2010
Not Reported 2010
Female       2011
Male         2011
Not Reported 2011
Female       2012
Male         2012
Not Reported 2012
Female       2013
Male         2013
Not Reported 2013
Female       2014
Male         2014
Not Reported 2014
;

%macro test();

goptions reset=all hsize=5in vsize=5in htext=8pt;

proc tabulate data=genderDataInput /*missing*/ out=genderData2;
   class four_digit_year gender;
   table four_digit_year, gender*rowpctn gender*colpctn;
run;

data firstColumnSet (drop = _type_ _page_ _table_ pctN_01);
   set genderData2;
   pctn_10 = pctn_10 *.01;
   pctn_10 = round(pctn_10, .00001);
   rename pctn_10 = percentDisplay;
   where pctN_01 = .;
run;

data secondColumnSet (drop = _type_ _page_ _table_ pctn_10);
   length pctN_01 8 space 8;
   set genderData2;
   rename pctN_01 = space;
   where pctn_10 = .;
   format space 8.4;
run;

data genderData2;
   merge secondColumnSet firstColumnSet;
   by four_digit_year gender;
run;

proc sort data=genderData2;
   by gender four_digit_year;
run;

/* Calculate the center location for each subgroup. */
/* Store the value in a Middle variable. */
data middle2;
   set genderData2;
   by GENDER FOUR_DIGIT_YEAR;
   retain temp_sum;
   if first.GENDER then temp_sum=space;
   else temp_sum + space;
   middle=(space/2)+lag(temp_sum);
   if first.GENDER then middle=space/2;
run;

/*tested up to*/
data barlabel2;
   retain color 'black' when 'a' xsys ysys "2" position "+";
   set middle2;
   midpoint=GENDER;
   x=middle;
   subgroup=FOUR_DIGIT_YEAR;
   text=left(put(percentDisplay,percent10.));
run;

title "Gender TESTING TESTING";
axis1 label=none major=none minor=none style=0
      value=none;
axis2 label=none;

proc gchart data=genderData2;
   hbar GENDER / type=sum
                 sumvar=space
                 subgroup=FOUR_DIGIT_YEAR
                 raxis=axis1
                 maxis=axis2
                 annotate=barlabel2 nostats noframe;
run;
quit;

%mend test;

%test();

DavidPhillips2 · Posted 08-13-2015 02:55 PM

Reply from SAS technical support.

"Please try modifying the data to reduce the length of Gender to $12 and Four_Digit_Year to $4. You can do this with a DATA step like the following (which also removes the format from these two variables):

data genderData2;
   length gender $12 four_digit_year $4;
   set genderData2;
   format gender four_digit_year;
run; "
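Before shortening the variables as support suggests, it may be worth checking that no stored value is longer than the new lengths, since a shorter LENGTH placed ahead of SET truncates silently apart from a possible log warning about multiple lengths. A small sketch, assuming both variables are character in WORK.genderData2; the column aliases are hypothetical.

```sas
/* Longest value actually stored in each character variable. */
proc sql;
   select max(length(gender))          as maxGenderLen,
          max(length(four_digit_year)) as maxYearLen
   from genderData2;
quit;
```

If the reported maxima are no greater than 12 and 4, the LENGTH statement in the support reply should not cut off any data.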
