DavidPhillips2
Rhodochrosite | Level 12

I created a data set that I am attempting to chart with PROC GCHART. When I use that data set with GCHART, I receive a Segmentation Violation In Task [ GCHART ( ]. When I create the data set manually using DATALINES, it works. Could the data type be causing the error? Below is the DATALINES step, which I created by manually copying the output of PROC PRINT. After that is the code that creates the data set, followed by the GCHART step.

data genderData;
input gender $ 1-12 four_digit_year space percentDisplay ;
datalines;
Female       2010 19.8976 0.58746
Male         2010 19.6012 0.40722
Not Reported 2010 70.3704 0.00532
Female       2011 20.5422 0.57541
Male         2011 21.5277 0.42432
Not Reported 2011 3.7037  0.00027
Female       2012 19.6701 0.58254
Male         2012 20.0189 0.41718
Not Reported 2012 3.7037  0.00028
Female       2013 19.8218 0.58984
Male         2013 19.5204 0.40874
Not Reported 2013 18.5185 0.00141
Female       2014 20.0683 0.59583
Male         2014 19.3318 0.40388
Not Reported 2014 3.7037  0.00028
;

goptions reset=all hsize=5in vsize=5in htext=8pt;

proc tabulate data=enrollment missing out=genderData2;
  class four_digit_year gender;
  table four_digit_year, gender*rowpctn gender*colpctn;
run;

data firstColumnSet (drop = _type_ _page_ _table_ pctN_01);
  set genderData2;
  pctn_10 = pctn_10 *.01;
  rename pctn_10 = percentDisplay;
  where pctN_01 = .;
run;

data secondColumnSet (drop = _type_ _page_ _table_ pctN_10);
  set genderData2;
  rename pctN_01 = space;
  where pctN_10 = .;
run;

data genderData2;
  merge firstColumnSet secondColumnSet;
  by four_digit_year gender;
run;

proc sort data=genderData2;
  by gender four_digit_year;
run;

/* Calculate the center location for each subgroup. */
/* Store the value in a Middle variable. */
data middle;
  set genderData2;
  by gender four_digit_year;
  retain temp_sum;
  if first.gender then temp_sum=space;
  else temp_sum + space;
  middle=(space/2)+lag(temp_sum);
  if first.gender then middle=space/2;
run;

/*tested up to*/
data barlabel;
  retain color 'black' when 'a' xsys ysys "2" position "+";
  set middle;
  midpoint=gender;
  x=middle;
  subgroup=four_digit_year;
  text=left(put(percentDisplay,percent10.));
  /*end;*/
run;

title "Gender TESTING TESTING";
axis1 label=none major=none minor=none style=0
      value=none;
axis2 label=none;

proc gchart data=genderData2;
  hbar gender / type=sum
                sumvar=space
                subgroup=four_digit_year
                raxis=axis1
                maxis=axis2
                annotate=barlabel nostats noframe;
run;
quit;
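To answer "could the data type be causing the error?" directly, one option is to list the type, length and format of every variable in both data sets side by side. A minimal sketch, assuming both tables live in WORK: the two small data sets are hypothetical stand-ins that mimic the hand-typed GENDERDATA and a TABULATE-derived GENDERDATA2 with long character variables and attached formats (they overwrite the real tables, so drop those two steps when running against your own data). The table name attrCompare is also hypothetical.

```sas
/* Hypothetical stand-in for the hand-typed table (year is numeric). */
data work.genderData;
  length gender $12;
  input gender $ 1-12 four_digit_year space;
  datalines;
Female       2010 19.8976
Male         2010 19.6012
;

/* Hypothetical stand-in for a TABULATE-derived table: long character */
/* variables that also carry explicit formats.                        */
data work.genderData2;
  length gender $63 four_digit_year $255;
  format gender $63. four_digit_year $255.;
  input gender $ 1-12 four_digit_year $ space;
  datalines;
Female       2010 19.8976
Male         2010 19.6012
;

/* DICTIONARY.COLUMNS stores library and member names in uppercase. */
proc sql;
  create table work.attrCompare as
    select name, memname, type, length, format
      from dictionary.columns
      where libname = 'WORK'
        and memname in ('GENDERDATA', 'GENDERDATA2')
      order by name, memname;
quit;

proc print data=work.attrCompare noobs;
run;
```

Differences in TYPE, LENGTH or FORMAT show up as adjacent rows for the same variable name.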

Accepted Solution
DavidPhillips2
Rhodochrosite | Level 12

Reply from SAS technical support.

"Please try modifying the data to reduce the length of Gender to $12 and Four_Digit_Year to $4.  You can do this with a DATA step like the following (which also removes the format from these two variables):

  data genderData2;
    length gender $12 four_digit_year $4;
    set genderData2;
    format gender four_digit_year;
  run;"
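A minimal sketch of applying that suggestion and confirming it took effect. The first step is a hypothetical stand-in for the TABULATE-derived table, with the $63/$255 lengths and formats reported by PROC COMPARE in this thread; VLENGTH returns the declared length of a variable. Because the LENGTH statement shortens variables that already exist on the input data set, SAS may write a warning about multiple lengths to the log; that is expected here, since every value fits in 12 and 4 characters.

```sas
/* Hypothetical stand-in for the TABULATE-derived table. */
data work.genderData2;
  length gender $63 four_digit_year $255;
  format gender $63. four_digit_year $255.;
  input gender $ 1-12 four_digit_year $ space;
  datalines;
Female       2010 19.8976
Not Reported 2010 70.3704
;

/* Suggested fix: shorten the character variables and remove their formats. */
data work.genderData2;
  length gender $12 four_digit_year $4;   /* must come before SET           */
  set work.genderData2;
  format gender four_digit_year;          /* FORMAT with no format name     */
                                          /* removes the attached format    */
run;

/* Write the new declared lengths to the log. */
data _null_;
  set work.genderData2 (obs=1);
  len_gender = vlength(gender);
  len_year   = vlength(four_digit_year);
  put len_gender= len_year=;
run;
```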


12 Replies
ballardw
Super User

I would run PROC COMPARE on the data set built with DATALINES and the PROC TABULATE output, at least for the variables you are attempting to chart.

Since there isn't any example data for the TABULATE input, that's about as far as I can suggest at this time.

DavidPhillips2
Rhodochrosite | Level 12

I used PROC COMPARE, and it appears that there is a difference in the values.

I added this:

data secondColumnSet (drop = _type_ _page_ _table_ pctN_10);
set genderData2;
rename pctN_01 = space;
space = round(space,.00001);
where pctN_10 = .;
run;

Yet PROC COMPARE still shows the differences below. It's as if the tabulated data set still has digits after the fifth decimal place, even though I'm rounding.

  Value Comparison Results for Variables
  __________________________________________________________
             ||       Base    Compare
         Obs ||      space      space      Diff.     % Diff
   ________  ||  _________  _________  _________  _________
             ||
           1 ||    19.8976    19.8976  0.0000206   0.000104
           2 ||    20.5422    20.5422  0.0000315   0.000153
           3 ||    19.6701    19.6701  0.0000109  0.0000555
           4 ||    19.8218    19.8218  -0.000016  -0.000080
           5 ||    20.0683    20.0683  -0.000047  -0.000235
           6 ||    19.6012    19.6012  0.0000394   0.000201
           7 ||    21.5277    21.5277  -0.000016  -0.000073
           8 ||    20.0189    20.0189  -0.000040  -0.000198
           9 ||    19.5204    19.5204  9.5379E-6  0.0000489
          10 ||    19.3318    19.3318  6.5472E-6  0.0000339
          11 ||    70.3704    70.3704  -0.000030  -0.000042
          12 ||     3.7037     3.7037  3.7037E-6   0.000100
          13 ||     3.7037     3.7037  3.7037E-6   0.000100
          14 ||    18.5185    18.5185  0.0000185   0.000100
          15 ||     3.7037     3.7037  3.7037E-6   0.000100
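One likely reason the rounding appears to have no effect: a RENAME statement changes the name only in the output data set. Inside the DATA step the column is still called pctN_01, so `space` on the right-hand side of the assignment is a new variable that starts out missing, and the tabulated values are never rounded. Even with that fixed, rounding to 0.00001 keeps a fifth decimal (observation 1 would become about 19.89762 rather than 19.8976), whereas rounding to 0.0001 matches the four decimals PROC PRINT displayed. A minimal sketch using one hypothetical value close to observation 1; the table name tabOut is an assumption.

```sas
/* Hypothetical TABULATE-style row: a column percent with extra decimals. */
data work.tabOut;
  pctN_01 = 19.8976206;
run;

data work.secondColumnSet;
  set work.tabOut;
  pctN_01 = round(pctN_01, 0.0001);   /* use the INPUT name inside the step */
  rename pctN_01 = space;             /* new name applies on output only    */
run;

proc print data=work.secondColumnSet noobs;
  format space best16.;
run;
```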

DavidPhillips2
Rhodochrosite | Level 12

It could be this as well.

I'm trying to understand what the difference is here. I'm not sure what a format type is; I did not use a FORMAT statement. I changed the manual DATA step to:

/*new hard input*/
data genderData;
length gender $63 four_digit_year $255 space 8;
input gender $ 1-12 four_digit_year space percentDisplay ;
/*length gender $63 four_digit_year $255;*/
datalines;
Female       2010 19.8976 0.58746
Male         2010 19.6012 0.40722
Not Reported 2010 70.3704 0.00532
Female       2011 20.5422 0.57541
Male         2011 21.5277 0.42432
Not Reported 2011 3.7037  0.00027
Female       2012 19.6701 0.58254
Male         2012 20.0189 0.41718
Not Reported 2012 3.7037  0.00028
Female       2013 19.8218 0.58984
Male         2013 19.5204 0.40874
Not Reported 2013 18.5185 0.00141
Female       2014 20.0683 0.59583
Male         2014 19.3318 0.40388
Not Reported 2014 3.7037  0.00028
;

PROC COMPARE returns:

  Listing of Common Variables with Differing Attributes

  Variable         Dataset           Type  Length  Format  Label
  gender           WORK.GENDERDATA   Char      63
                   WORK.GENDERDATA2  Char      63  $63.    GENDER
  four_digit_year  WORK.GENDERDATA   Char     255
                   WORK.GENDERDATA2  Char     255  $255.   FOUR_DIGIT_YEAR

ballardw
Super User

I usually post-process TABULATE output data sets to get what I want, since the default text lengths, formats, labels and such usually leave something to be desired.

All SAS variables are displayed with a format. If you don't assign one, SAS uses a default: for a character variable, $w. where w is the length; for a numeric variable, BEST12.

DavidPhillips2
Rhodochrosite | Level 12

Is there an article on post processing tabulate output, so I can see what you mean?

ballardw
Super User

No specific article, just DATA step code: for example, selecting specific tables or _TYPE_ values, combining variables into a single variable with row text (the CATX function, with OPTIONS MISSING=' '), doing numeric calculations on the statistics such as rescaling or ranking, and sometimes sorting into a desired order.
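A minimal sketch of that kind of post-processing, applied to a few hypothetical rows shaped like this thread's TABULATE output (long character variables with formats attached, percentages on a 0 to 100 scale). The data set names tabOut and tidy and the variable cellLabel are illustrative assumptions, not something TABULATE creates.

```sas
/* Hypothetical rows resembling the TABULATE output in this thread. */
data work.tabOut;
  length four_digit_year $255 gender $63;
  format four_digit_year $255. gender $63.;
  input four_digit_year $ gender $ pctN_10;
  datalines;
2010 Female 58.746
2010 Male 40.722
2011 Female 57.541
;

data work.tidy;
  length gender $12 four_digit_year $4 cellLabel $20;  /* shorter text     */
  set work.tabOut;
  format gender four_digit_year;                   /* drop inherited formats */
  percentDisplay = pctN_10 * 0.01;                 /* rescale 0-100 to 0-1   */
  cellLabel = catx(' ', gender, four_digit_year);  /* combined row text      */
  drop pctN_10;
run;

/* Sort into the order the chart step expects. */
proc sort data=work.tidy;
  by gender four_digit_year;
run;
```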

DavidPhillips2
Rhodochrosite | Level 12

I expected that, with the LENGTH statement below, the numbers in space in the dynamic data set (like 19.8976) would be chopped off at four decimals.

Yet the compare still displays:

             ||       Base    Compare
         Obs ||      space      space      Diff.     % Diff
   ________  ||  _________  _________  _________  _________
             ||
           1 ||    19.8976    19.8976  0.0000206   0.000104

data secondColumnSet (drop = _type_ _page_ _table_ pctN_10);
  set genderData2;
  length pctN_01 7 space 7;
  rename pctN_01 = space;
  space = round(space,.0001);
  where pctN_10 = .;
run;

ballardw
Super User

Length for numeric has to do with the number of bytes used for storage, not the contents.

Take a look at the values below, all calculated the same but stored in different numbers of bytes.

data junk;
   length x3 3 x4 4 x5 5 x6 6 x7 7 x8 8;
   array j x:;
   do over j;
      j = 1/7;
   end;
run;

proc print data=junk noobs;
var x:;
format x: best16.;
run;

Note that while the values agree to a number of decimals, the actual stored values differ.

Values with many decimals will show small differences when stored with different precision. Generally, a length less than 8 for numeric variables should be used only for integers, and even then you can get surprises.
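The same effect can be checked without declaring several lengths: the SAS TRUNC function (not to be confused with INT) returns the value a number would have if it were stored in a given number of bytes. A minimal sketch; the names truncDemo, bytes, stored and diff are illustrative, and the loop starts at 3 bytes, the minimum numeric length on Windows and UNIX.

```sas
data work.truncDemo;
  x = 1/7;
  do bytes = 3 to 8;
    stored = trunc(x, bytes);   /* value as if kept in BYTES bytes       */
    diff   = x - stored;        /* precision lost by the shorter length  */
    output;
  end;
  format x stored best16. diff best12.;
run;

proc print data=work.truncDemo noobs;
run;
```

This is also why a numeric LENGTH of 7 does not cut a value to four decimals: it only discards low-order bits of the stored binary value. Controlling decimals is the job of ROUND.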

DavidPhillips2
Rhodochrosite | Level 12

Am I reading this correctly that GenderData.space and GenderData2.space are both Num 8? I'm trying to rule out the possibility that they are different data types. I think they are the same data type, but even after changing the data type after the tabulate step, the dynamic column space still has different values than the hard-coded space. I figure that if I can make the two data sets identical, the dynamic data set will not produce the segmentation error.

  The COMPARE Procedure
  Comparison of WORK.GENDERDATA with WORK.GENDERDATA2
  (Method=EXACT)

  Variables with Unequal Values

  Variable  Type  Len  Ndif   MaxDif
  space     NUM     8    15  0.00005

  Value Comparison Results for Variables
  __________________________________________________________
             ||       Base    Compare
         Obs ||      space      space      Diff.     % Diff
   ________  ||  _________  _________  _________  _________
             ||
           1 ||    19.8976    19.8976  0.0000206   0.000104
           2 ||    20.5422    20.5422  0.0000315   0.000153
           3 ||    19.6701    19.6701  0.0000109  0.0000555
           4 ||    19.8218    19.8218  -0.000016  -0.000080
           5 ||    20.0683    20.0683  -0.000047  -0.000235
           6 ||    19.6012    19.6012  0.0000394   0.000201
           7 ||    21.5277    21.5277  -0.000016  -0.000073
           8 ||    20.0189    20.0189  -0.000040  -0.000198
           9 ||    19.5204    19.5204  9.5379E-6  0.0000489
          10 ||    19.3318    19.3318  6.5472E-6  0.0000339
          11 ||    70.3704    70.3704  -0.000030  -0.000042
          12 ||     3.7037     3.7037  3.7037E-6   0.000100
          13 ||     3.7037     3.7037  3.7037E-6   0.000100
          14 ||    18.5185    18.5185  0.0000185   0.000100
          15 ||     3.7037     3.7037  3.7037E-6   0.000100
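The listing reports space as NUM with length 8 in both data sets, so what remains are value differences, all smaller than the 0.00005 shown under MaxDif. A minimal sketch of telling PROC COMPARE to ignore differences below a tolerance: METHOD=ABSOLUTE with CRITERION=0.0001 judges values within 0.0001 of each other as equal, and the automatic macro variable SYSINFO holds the COMPARE return code (0 when nothing is judged unequal). The two data sets are hypothetical stand-ins built from the first two rows of the listing; the macro variable name compareRC is an assumption.

```sas
/* Hypothetical stand-ins: hand-typed values vs. values with extra decimals. */
data work.base;
  input space;
  datalines;
19.8976
20.5422
;

data work.comp;
  input space;
  datalines;
19.8976206
20.5422315
;

proc compare base=work.base compare=work.comp
             method=absolute criterion=0.0001;
  var space;
run;

/* Save the return code right away; the next step resets SYSINFO. */
%let compareRC = &sysinfo;
%put NOTE: PROC COMPARE return code is &compareRC..;
```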

ballardw
Super User

Can you provide an example ENROLLMENT data set so we can run the PROC TABULATE and see what happens? I would expect differences in the calculated percentages from TABULATE with most realistic input data: the displayed values are rounded/truncated by the display format, but the output data set keeps additional decimals beyond the displayed range.

DavidPhillips2
Rhodochrosite | Level 12

Ballardw,

Functionally, the enrollment data that TABULATE uses is just the columns gender and year, with one row per student. The actual table has many columns and is rather large, so I posted a simplified data set. What confuses me is why this works with fake data and not with real data.

data genderDataInput;
length gender $63 four_digit_year $255;
input gender $ 1-12 four_digit_year;
datalines;
Female       2010
Female       2010
Female       2010
Female       2010
Male         2010
Not Reported 2010
Not Reported 2010
Not Reported 2010
Female       2011
Male         2011
Not Reported 2011
Not Reported 2011
Female       2012
Male         2012
Not Reported 2012
Female       2013
Female       2013
Female       2013
Male         2013
Not Reported 2013
Female       2014
Male         2014
Male         2014
Not Reported 2014
;

%macro test();
goptions reset=all hsize=5in vsize=5in htext=8pt;

proc tabulate data=genderDataInput /*missing*/ out=genderData2;
  class four_digit_year gender;
  table four_digit_year, gender*rowpctn gender*colpctn;
run;

data firstColumnSet (drop = _type_ _page_ _table_ pctN_01);
  set genderData2;
  pctn_10 = pctn_10 *.01;
  pctn_10 = round(pctn_10, .00001);
  rename pctn_10 = percentDisplay;
  where pctN_01 = .;
run;

data secondColumnSet (drop = _type_ _page_ _table_ pctn_10);
  length pctN_01 8 space 8;
  set genderData2;
  rename pctN_01 = space;
  where pctn_10 = .;
  format space 8.4;
run;

data genderData2;
  merge secondColumnSet firstColumnSet;
  by four_digit_year gender;
run;

proc sort data=genderData2;
  by gender four_digit_year;
run;

/* Calculate the center location for each subgroup. */
/* Store the value in a Middle variable. */
data middle2;
  set genderData2;
  by GENDER FOUR_DIGIT_YEAR;
  retain temp_sum;
  if first.GENDER then temp_sum=space;
  else temp_sum + space;
  middle=(space/2)+lag(temp_sum);
  if first.GENDER then middle=space/2;
run;

/*tested up to*/
data barlabel2;
  retain color 'black' when 'a' xsys ysys "2" position "+";
  set middle2;
  midpoint=GENDER;
  x=middle;
  subgroup=FOUR_DIGIT_YEAR;
  text=left(put(percentDisplay,percent10.));
run;

title "Gender TESTING TESTING";
axis1 label=none major=none minor=none style=0
      value=none;
axis2 label=none;

proc gchart data=genderData2;
  hbar GENDER / type=sum
         sumvar=space
         subgroup=FOUR_DIGIT_YEAR
         raxis=axis1
         maxis=axis2
         annotate=barlabel2 nostats noframe;
run;
quit;
%mend test;

%test();



Discussion stats
  • 12 replies
  • 1472 views
  • 6 likes
  • 2 in conversation