BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Child79
Fluorite | Level 6
data students;
input  type_college :$16.  name $   COL1:4.2 ;
datalines;
College private   delai_dec20  1.5
College private   delai_juin21  3.6
College private   delai_dec21   4.4
College private   delai_sep22   47.6
College public delai_dec20 5.3
College public delai_juin21 6
College public delai_dec21 14.4
College public delai_sep22 14.4
University delai_dec20 11.1
University private delai_juin21 14.3
University delai_dec21 14
University delai_sep22 17.1
          delai_dec20 5.6
             delai_juin21 7.1
            delai_dec21  12
            delai_sep22  18.1
;
I want to replace the blank by "Total" for type_college that's why I used proc format but it doesn't work correctly.
proc format;
value $ht
"College public" = 'College public'
"College private" ='College private'
"University" = 'University'
" " ='Total           ';
run;

But in the output type_college and name are sorted in ascending order. 
It display values of type_college in ascending order (college_private college_public university) but I want
values displayed in this order ( college public college private university).
Also I want name displayed in this order (delai_dec20 delai_juin21 delai_dec21 delai_sep22 ) but I got it in this order
(delai_dec20 delai_dec21 delai_juin21 delai_sep22 )
How can I get what I want?.
proc tabulate data=students;
var COL1;
CLASS type_college ;
CLASS name ;
/*row statement */
type_college
all= 'Total',
/* Column statement */
name * (COL1 * Sum={LABEL="Somme"});
;
run;
Thanks for helping me.


1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

First thing, please make sure that your data step to create example data creates what you expect and need to demonstrate the issue. When I run your code:

305  data students;
306  input  type_college :$16.  name $   COL1:4.2 ;
307  datalines;

NOTE: Invalid data for COL1 in line 308 19-29.
RULE:      ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+--
308        College private   delai_dec20  1.5
type_college=College name=private COL1=. _ERROR_=1 _N_=1
NOTE: Invalid data for COL1 in line 309 19-30.
309        College private   delai_juin21  3.6
type_college=College name=private COL1=. _ERROR_=1 _N_=2
NOTE: Invalid data for COL1 in line 310 19-29.
310        College private   delai_dec21   4.4
type_college=College name=private COL1=. _ERROR_=1 _N_=3
NOTE: Invalid data for COL1 in line 311 19-29.
311        College private   delai_sep22   47.6
type_college=College name=private COL1=. _ERROR_=1 _N_=4
NOTE: Invalid data for COL1 in line 312 16-26.
312        College public delai_dec20 5.3
type_college=College name=public COL1=. _ERROR_=1 _N_=5
NOTE: Invalid data for COL1 in line 313 16-27.
313        College public delai_juin21 6
type_college=College name=public COL1=. _ERROR_=1 _N_=6
NOTE: Invalid data for COL1 in line 314 16-26.
314        College public delai_dec21 14.4
type_college=College name=public COL1=. _ERROR_=1 _N_=7
NOTE: Invalid data for COL1 in line 315 16-26.
315        College public delai_sep22 14.4
type_college=College name=public COL1=. _ERROR_=1 _N_=8
NOTE: Invalid data for COL1 in line 317 20-31.
317        University private delai_juin21 14.3
type_college=University name=private COL1=. _ERROR_=1 _N_=10
NOTE: Invalid data for COL1 in line 321 14-25.
321                     delai_juin21 7.1
type_college=delai_dec20 name=5.6 COL1=. _ERROR_=1 _N_=13
NOTE: Invalid data for COL1 in line 323 13-23.
323                    delai_sep22  18.1
type_college=delai_dec21 name=12 COL1=. _ERROR_=1 _N_=14
NOTE: SAS went to a new line when INPUT statement reached past the end of a line.
NOTE: The data set WORK.STUDENTS has 14 observations and 3 variables.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds


324  ;

What is happening is that your data is using LIST input. That means the spaces between the words of the type college are making those as 2 variable: type_college is reading College, Name as Private and then the numeric variable Col1 is attempting to read "delai_dec20" as a number.

Also, again since you are using List input, the college type would not be blank or missing as it would read the 'delai_dec20' as the type .

You also have multiple issues with the attempted informats. It appears that you expect a value of "University private" for the type variable. That length is longer than 16 characters. Name would be read as 8 characters max and your values go up to 12 characters. Lastly SAS will imply decimals when you provide a decimal value as part of the Informat. So using Col1 with a 4.2 informat means that any value of Col1 that does not have a decimal point in the value, i.e. the 12 on the 15th row would be read as value of 0.12. Generally if your numeric values do not have odd characters such as currency symbols or commas there is seldom a need to provide an informat for them and extremely rare to provide a decimal in the informat unless you are reading ancient files that use implied decimals to same space as was done on punch cards.

So this is a data set that addresses the issues:

data students;
  infile datalines dlm=',' dsd;
  informat type_college $19.  name $15. ; 
input  type_college   name $  COL1 ;
datalines;
College private,delai_dec20,1.5
College private,delai_juin21,3.6
College private,delai_dec21,4.4
College private,delai_sep22,47.6
College public,delai_dec20,5.3
College public,delai_juin21,6
College public,delai_dec21,14.4
College public,delai_sep22,14.4
University,delai_dec20,11.1
University private,delai_juin21,14.3
University,delai_dec21,14
University,delai_sep22,17.1
,delai_dec20,5.6
,delai_juin21,7.1
,delai_dec21,12
,delai_sep22,18.1
;

The Infile says that the variables a separated by commas and the DSD option means that the value of type is actually missing

for those last 4 rows. The delimiter is the easiest way to manually handle blanks as part of values.

 

Your Proc tabulate code does not run because there is no TABLE statement.

Proc tabulate behavior by default is to exclude records for any class variable that has missing values. So if you know and want to display missing values you have to add the option MISSING to the class statement for that variable. If the order of the values in the data set is the order you want them displayed then the option Order=DATA is a better choice. Getting formatted values to sort for characters is not impossible but may involve a lot of trial and error and some unusual options on the format.

 

Your "date" values won't sort typically. If you want a "date" to sort in date order use a DATE value instead of character values. IF the values of your data are in the correct order you may be able to use the ORDER=DATA option. Maybe. Nesting such in different dimensions gets tricky.

You also don't show using that format.

This gets part of what you want with a working tabulate for the above example data

proc format;
value $ht (default=20)
" " ='Total           ';
run;

proc tabulate data=students;
   var COL1;
   CLASS type_college/missing order=data ;
   format type_college $ht. ;
   CLASS name/order=data ;
   /*row statement */
   Table type_college
          all= 'Column Total',
   /* Column statement */
   name * (COL1 * Sum={LABEL="Somme"});
;
run;

Note: trailing blanks in a format are basically useless as almost every SAS procedure will trim them. I have a default length for the format to show the "University private". If you really didn't want that then you should make sure your code runs and the values are as expected. The format only modifies behavior of missing values.

I used a different label in the ALL so you could tell the "total" from the missing type as different from the All total. I haven't a clue as to why you think "total" is appropriate as a label for the missing types.

 

 

 

View solution in original post

2 REPLIES 2
ballardw
Super User

First thing, please make sure that your data step to create example data creates what you expect and need to demonstrate the issue. When I run your code:

305  data students;
306  input  type_college :$16.  name $   COL1:4.2 ;
307  datalines;

NOTE: Invalid data for COL1 in line 308 19-29.
RULE:      ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+--
308        College private   delai_dec20  1.5
type_college=College name=private COL1=. _ERROR_=1 _N_=1
NOTE: Invalid data for COL1 in line 309 19-30.
309        College private   delai_juin21  3.6
type_college=College name=private COL1=. _ERROR_=1 _N_=2
NOTE: Invalid data for COL1 in line 310 19-29.
310        College private   delai_dec21   4.4
type_college=College name=private COL1=. _ERROR_=1 _N_=3
NOTE: Invalid data for COL1 in line 311 19-29.
311        College private   delai_sep22   47.6
type_college=College name=private COL1=. _ERROR_=1 _N_=4
NOTE: Invalid data for COL1 in line 312 16-26.
312        College public delai_dec20 5.3
type_college=College name=public COL1=. _ERROR_=1 _N_=5
NOTE: Invalid data for COL1 in line 313 16-27.
313        College public delai_juin21 6
type_college=College name=public COL1=. _ERROR_=1 _N_=6
NOTE: Invalid data for COL1 in line 314 16-26.
314        College public delai_dec21 14.4
type_college=College name=public COL1=. _ERROR_=1 _N_=7
NOTE: Invalid data for COL1 in line 315 16-26.
315        College public delai_sep22 14.4
type_college=College name=public COL1=. _ERROR_=1 _N_=8
NOTE: Invalid data for COL1 in line 317 20-31.
317        University private delai_juin21 14.3
type_college=University name=private COL1=. _ERROR_=1 _N_=10
NOTE: Invalid data for COL1 in line 321 14-25.
321                     delai_juin21 7.1
type_college=delai_dec20 name=5.6 COL1=. _ERROR_=1 _N_=13
NOTE: Invalid data for COL1 in line 323 13-23.
323                    delai_sep22  18.1
type_college=delai_dec21 name=12 COL1=. _ERROR_=1 _N_=14
NOTE: SAS went to a new line when INPUT statement reached past the end of a line.
NOTE: The data set WORK.STUDENTS has 14 observations and 3 variables.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds


324  ;

What is happening is that your data is using LIST input. That means the spaces between the words of the type college are making those as 2 variable: type_college is reading College, Name as Private and then the numeric variable Col1 is attempting to read "delai_dec20" as a number.

Also, again since you are using List input, the college type would not be blank or missing as it would read the 'delai_dec20' as the type .

You also have multiple issues with the attempted informats. It appears that you expect a value of "University private" for the type variable. That length is longer than 16 characters. Name would be read as 8 characters max and your values go up to 12 characters. Lastly SAS will imply decimals when you provide a decimal value as part of the Informat. So using Col1 with a 4.2 informat means that any value of Col1 that does not have a decimal point in the value, i.e. the 12 on the 15th row would be read as value of 0.12. Generally if your numeric values do not have odd characters such as currency symbols or commas there is seldom a need to provide an informat for them and extremely rare to provide a decimal in the informat unless you are reading ancient files that use implied decimals to same space as was done on punch cards.

So this is a data set that addresses the issues:

data students;
  infile datalines dlm=',' dsd;
  informat type_college $19.  name $15. ; 
input  type_college   name $  COL1 ;
datalines;
College private,delai_dec20,1.5
College private,delai_juin21,3.6
College private,delai_dec21,4.4
College private,delai_sep22,47.6
College public,delai_dec20,5.3
College public,delai_juin21,6
College public,delai_dec21,14.4
College public,delai_sep22,14.4
University,delai_dec20,11.1
University private,delai_juin21,14.3
University,delai_dec21,14
University,delai_sep22,17.1
,delai_dec20,5.6
,delai_juin21,7.1
,delai_dec21,12
,delai_sep22,18.1
;

The Infile says that the variables a separated by commas and the DSD option means that the value of type is actually missing

for those last 4 rows. The delimiter is the easiest way to manually handle blanks as part of values.

 

Your Proc tabulate code does not run because there is no TABLE statement.

Proc tabulate behavior by default is to exclude records for any class variable that has missing values. So if you know and want to display missing values you have to add the option MISSING to the class statement for that variable. If the order of the values in the data set is the order you want them displayed then the option Order=DATA is a better choice. Getting formatted values to sort for characters is not impossible but may involve a lot of trial and error and some unusual options on the format.

 

Your "date" values won't sort typically. If you want a "date" to sort in date order use a DATE value instead of character values. IF the values of your data are in the correct order you may be able to use the ORDER=DATA option. Maybe. Nesting such in different dimensions gets tricky.

You also don't show using that format.

This gets part of what you want with a working tabulate for the above example data

proc format;
value $ht (default=20)
" " ='Total           ';
run;

proc tabulate data=students;
   var COL1;
   CLASS type_college/missing order=data ;
   format type_college $ht. ;
   CLASS name/order=data ;
   /*row statement */
   Table type_college
          all= 'Column Total',
   /* Column statement */
   name * (COL1 * Sum={LABEL="Somme"});
;
run;

Note: trailing blanks in a format are basically useless as almost every SAS procedure will trim them. I have a default length for the format to show the "University private". If you really didn't want that then you should make sure your code runs and the values are as expected. The format only modifies behavior of missing values.

I used a different label in the ALL so you could tell the "total" from the missing type as different from the All total. I haven't a clue as to why you think "total" is appropriate as a label for the missing types.

 

 

 

Child79
Fluorite | Level 6

Sorry for errors. I wanted to use a small dummy dataset to represent the database I'm working with.

I did not expect a value of "University private". It should just be "University".

Thanks for correcting me.

This is the good dataset.

data students;
  infile datalines dlm=',' dsd;
  informat type_college $15.  name $15. ; 
input  type_college   name $  COL1 ;
datalines;
College private,delai_dec20,1.5
College private,delai_juin21,3.6
College private,delai_dec21,4.4
College private,delai_sep22,47.6
College public,delai_dec20,5.3
College public,delai_juin21,6
College public,delai_dec21,14.4
College public,delai_sep22,14.4
University,delai_dec20,11.1
University,delai_juin21,14.3
University,delai_dec21,14
University,delai_sep22,17.1
,delai_dec20,5.6
,delai_juin21,7.1
,delai_dec21,12
,delai_sep22,18.1
;

Please can you explain me how to draw a curve of evolution of col1 according to the name for each type_college. On the abscissa axis, I have the name appeared in this order (delai_dec20,delai_juin21,delai_dec21,delai_sep22).

Here is a picture of I want.

Capture d’écran 2022-10-12 000900.png

Thanks in advance.

 

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 2 replies
  • 286 views
  • 0 likes
  • 2 in conversation