BookmarkSubscribeRSS Feed
GregG
Quartz | Level 8

This is a hard question to ask.  I am trying to put together a report that breaks down all full time graduate students (for a specific program) by ethnicity.  I want a subset of that for those are are in their first semester in that program.

I have an if statement that creates a field and inserts a 1 if they are full time (everyone in this table gets that 1) and creates another field and inserts a 1 if the current semester matches their start semester.

I guess I'll try to clarify with my code.

This code:

    proc report data=gss_fulltime headline box nowd;

  

    column Ethnic Status,(Gender Total);

    define Ethnic / order=internal format=$ethnic_gss. group ;

    define Gender / across order=internal format=$gender_gss.;

    define Total / computed style={background=lightsteelblue foreground=maroon font_weight=bold};

    define Status / format=$status_gss. across order=internal "Full Time Graduate Students";

    compute Total;

        _C4_ = sum(_C2_,_C3_);

        _C7_ = sum(_C5_,_C6_);

    endcomp;

    rbreak after / summarize;

    run;

produces this output:

http://www.freeimagehosting.net/l4yzp

This is an early version where I only created one new field and gave it a 1 (FT) or a 2 (FTIP) - before realizing that the first set of numbers should include the second (for a total of all FT grad students).

This is how the first four columns should look (I had to make it not a link, sorry):

http://www.freeimagehosting.net/d9cgt

So I go back to my code and create two new fields, FT and FTIP.  Everyone gets a 1 in the FT, only those whose Term = Start_Term get a 1 in the FTIP field.  I also created a gender3=gender.

Here is the code for the report for this new setup:

    proc report data=gss_fulltime headline box nowd completerows;

  

    column Ethnic FT,(Gender  Total) FTIP,(Gender3  Total2);

    define Ethnic / group order=internal format=$ethnic_gss. preloadfmt;

    define FT      / "Full Time" across order=internal;

    define Gender / across order=internal format=$gender_gss.;

    define Total  /  computed style={background=lightsteelblue foreground=maroon font_weight=bold};

    define FTIP      / "First Time" across order=internal;

    define Gender3 / across order=internal format=$gender_gss.;

    define Total2  /  computed style={background=lightsteelblue foreground=maroon font_weight=bold};

    compute Total;

        _C4_ = sum(_C2_,_C3_);

    endcomp;

    compute Total2;

        _C7_ = sum(_C5_,_C6_);

    endcomp;

    rbreak after / summarize style={background=lightsteelblue foreground=maroon font_weight=bold};

    run;

But when I run this report, it ONLY populates both "sets" (FT and FTIP) with the numbers from the FTIP field:

Here is what it looks like:

http://www.freeimagehosting.net/63ove

I realize this is an insanely long question, any ideas?

This is what I want it to reflect (the numbers, not necessarily the formatting):  http://www.freeimagehosting.net/qrorz

11 REPLIES 11
Cynthia_sas
SAS Super FREQ

Hi:

  I see what you want, but now the structure of your input data is going to be really really critical to understanding what you get. Since you did not post data, I created the report and duplicated your final desired structure. So, without knowing what your data looks like, if you can understand my structure (shown in a DATA step program) and then get your head around the PROC REPORT, the key to making this happen is having 2 across variables and getting them nested correctly. Then, because you do have 2 across variables, you were ALMOST there with TOTAL and TOTAL2, but you don't need to do it that way. You have 1 name in your COLUMN statement and you just put both assignment statements in the compute block for that single name.

cynthia

data new;
length ethnic $30 status $8 gender $6;
  infile datalines dlm=',' dsd;
  input name $ gender $ ethnic $ height weight status $;
return;
datalines;
"Alfred","Male",D-Asian,69.0,112.5,"Teenager"
"Alice","Female",B-Hispanic,56.5,84.0,"Teenager"
"Barbara","Female",B-Hispanic,65.3,98.0,"Teenager"
"Carol","Female",D-Asian,62.8,102.5,"Teenager"
"Henry","Male",D-Asian,63.5,102.5,"Teenager"
"James","Male",A-International,57.3,83.0,"Pre-teen"
"Jane","Female",A-International,59.8,84.5,"Pre-teen"
"Janet","Female",E-Black/African-American,62.5,112.5,"Teenager"
"Jeffrey","Male",B-Hispanic,62.5,84.0,"Teenager"
"John","Male",A-International,59.0,99.5,"Pre-teen"
"Joyce","Female",B-Hispanic,51.3,50.5,"Pre-teen"
"Judy","Female",D-Asian,64.3,90.0,"Teenager"
"Louise","Female",G-White,56.3,77.0,"Pre-teen"
"Mary","Female",E-Black/African-American,66.5,112.0,"Teenager"
"Philip","Male",G-White,72.0,150.0,"Teenager"
"Robert","Male",A-International,64.8,128.0,"Pre-teen"
"Ronald","Male",E-Black/African-American,67.0,133.0,"Teenager"
"Thomas","Male",A-International,57.5,85.0,"Pre-teen"
"William","Male",E-Black/African-American,66.5,112.0,"Teenager"
"Peter","Male",G-White,72.0,150.0,"Pre-teen"
"Alex","Male",B-Hispanic,64.8,128.0,"Pre-teen"
"Jerry","Male",E-Black/African-American,67.0,133.0,"Pre-teen"
"Sean","Male",B-Hispanic,57.5,85.0,"Pre-teen"
"Larry","Male",E-Black/African-American,66.5,112.0,"Pre-teen"
"Vaughn","Male",E-Black/African-American,67.0,133.0,"Teenager"
"Henry","Male",A-International,59.0,99.5,"Teenager"
"Alicia","Female",D-Asian,64.3,90.0,"Pre-teen"
;
run;
  
ods listing close;
ods html;
proc report data=new nowd;
  column ethnic (status,(gender tot));
  define ethnic / group;
  define status / across;
  define gender/across;
  define tot/computed 'Total'
    style(header)={foreground=darkred}
    style(column)=Header{foreground=darkred};
  compute tot;
    _c4_ = sum(_c2_,_c3_);
    _c7_ = sum(_c4_,_c5_);
  endcomp;
  rbreak after / summarize
    style(summary)=Header{foreground=darkred};
run;
ods html close;


two_across.png
GregG
Quartz | Level 8

Thank you so much Cynthia for your time and help.  I am sorry I did not include data - I am creating a table from proc sql.

I have follow up questions:

I count a total of 27 'datalines' but it is totaling up as 13 pre-teens and 19 teenagers.

What I ultimately need to do is have one "set of columns" (_C2_ _C3_ _C4_Male Female Total) that has ALL observations accounted for (broken down by Ethnicity first and then Gender)  and then a second "set of columns" (_C5_ _C6_ _C7_Male Female Total) that only has a subset of the original set of observations.

Changing your example from "pre-teen" to "13" - I would want the data to represent ALL 27 observations in the first set of columns, and then represent only those who are "13" in the second set of columns.

I initially had two states in the same status column, as your example shows.  I then just created a second status column (status2) that only had data for the subset which I was seeking.  However, at this point, the output would ONLY reflect the subset information.

I will endeavor to create a new code submission, but it may not be until early next week.

Thank you again for your time and help.

Cynthia_sas
SAS Super FREQ

Hi:

  And, sorry, my bad. This is why I should never write code and post it without doing a reality/output check. The COMPUTE block was incorrect. See the corrected code. The amounts now add up to 27. So sorry. I posted a new screenshot, too.

*** produces corrected screen shot ***;

** replace the old compute block completely;

  compute tot;

    _c4_ = sum(_c2_,_c3_);

    _c7_ = sum(_c5_,_c6_);

  endcomp;

  I guess I do NOT understand what you want. Seeing your data may help, but in the final screenshot you posted, it looked like you were nesting two deep across the columns. In the corrected code (substitute this compute block for the original (wrong) compute block) and you should see all 27 accounted for.

  I have posted previously about how to figure out what the absolute column number is going to be, so you can use it in a  COMPUTE block. I'm sort of wondering whether TABULATE would be better. See second screen shot. There are actually 4 tables produced by the TABULATE code, I just showed the first one. You'll have to run the code using WORK.NEW to see all 4 tables.

  To report on only the people who are 13, I would guess that you do need to create a separate variable. Or, maybe, for TABULATE, create a user-defined format, but, a separate variable would be good.

  Take a look at TABULATE output and see what you think. Then, also take a look at the second screen shot, also produced with PROC REPORT. My original data did not have age values. But if the data -did- have ages, I could make a variable whose value was 0 or 1, based on whether age=13. Does the second screen shot look closer to what you envision, assuming that AGE was a variable in the data?

cynthia

** tabulate code;

ods html file='c:\temp\trytab.html';

proc tabulate data=new f=comma6.;
  class ethnic gender status / style={vjust=b};
  table ethnic all,
        status*(gender all='SubTotal')
        status='Status Only' gender='Gender Only'
        all='Overall' /box='Table1' ;
  table ethnic all,
        status*(gender all='SubTotal') all='Total Count'
        /box='Table2';
  table ethnic all,
        status all='Total Count'/box='Table 3';
  table ethnic all,
        gender all='Total Count' / box='Table 4';
  keylabel n=' '
           all='Total';
  keyword all / style={vjust=b};
run;
ods html close;


with_age_13.pngtab_examp.pngtwo_across_corrected.png
GregG
Quartz | Level 8

Cynthia,

Thank you again.  I ended up doing it with Proc Tabulate - and with some finessing, I got it to do what I wanted.  Thank you again.  I am pretty much brand new to SAS, so there are a lot of challenges for me.

GregG
Quartz | Level 8

Okay, I'm back with an actual dataset, and a brand new question (the original question remains, I think).

To reset a bit: I am new to SAS (been exposed to it for < 1 month), and I'm trying to get Proc Report to print two different fields in two different columns.

Additional question: How can I get Proc Report to display missing / null values as . or - instead of 0 when using an across?

On with the show:

The first report is using Proc Report, but missing / null values are showing 0 and I would rather they show - (dash) instead.  I have switched this also to Proc Tabulate in order to be consistent in my displaying of information.

Untitled1.png

The second "report" is how I want it displayed, but I took your advice and ended up using Proc Tabulate instead.  So perhaps you really did answer my question above with that suggestion, but I am still curious if such can be achieved with Proc Report.

Untitled2.png

Again, Cynthia - I thank you greatly for your help so far!

ETA: Also, is there a way to remove the SUM "row" from the Proc Tabulate output?

Message was edited by: Greg Gengo

Cynthia_sas
SAS Super FREQ


Hi:

  I am in class this week, so I have limited time to do a lengthy response. So this will be quicker than usual:

options missing = '-';

options missing = .;

options missing=0; or

options missing = '0'; are you sure you don't have missing set to 0????

are all ways to impact the missing values. Normally, if you have no data for a crossing in an ACROSS situation, the missing cell does show as a . -- for example, if you do a report on SASHELP.CLASS you should not see any Female in Age 16. Which is why I wondered whether some option is set otherwise for your SAS session.

to get rid of SUM in TABULATE -- you need to "blank" out the Statistic either in the KEYLABEL statement:

keylabel sum=' ';

or in the TABLE statement:

table thing1 all,

        thing2*sum=' '*f=comma6.;

I will try to look at your code later, but meanwhile, thanks for posting it so other folks could give it a try.

cynthia

GregG
Quartz | Level 8

Thank you so much for ANY response - as I've mentioned, I'm pretty new.  I've been reading what I can - but since my questions seem to be so specific, I haven't found any documents with the exact answer for which I was looking.

As for the missing="-" , I have updated my code (here and in the above post) and it does the same thing.

I did do what you suggested with regard to the sashelp.class and confirmed that missing is displayed as I want it - so I'm not sure what is going on with that.

Again, I thank you for your help and patience with me.

GregG
Quartz | Level 8

So evidently, the presence of completerows or completecols turns all missing / null values to zeroes - and I can't find a way around it.

My understanding is that completerows and completecols are necessary when using preloadfmt - which I use because not every report will have every ethnicity / gender present, but I need each report to be uniform.  So I created a format, and I use that.

Is there a way on the define line to change missing back to "-" (to 'override' the completerows/cols setting)?

Cynthia_sas
SAS Super FREQ


Hi:

  Not on the DEFINE statement. You can probably do it in a COMPUTE block. I missed the PRELOADFMT on your CLASS statement. Are you sure you don't want to stick with TABULATE? What is the reasong for switching from TABULATE to PROC REPORT?

cynthia

GregG
Quartz | Level 8

I've already switched it to Tabulate for what I was doing.  Now I'm just trying to explore Proc Report some more.

Again, I am new at SAS, but so far, I find Proc Report to be easier than Proc Tabulate as far as formatting the output goes.

The two issues I am having with Proc Report is getting the second set of columns correct (see the second picture from the post earlier

Cynthia_sas
SAS Super FREQ

Hi:

  I think you would either need a user-defined format for the numeric column or you would have to use a COMPUTE block if you wanted to see a . (missing) in PROC REPORT with ACROSS and COMPLETEROWS. For example, there are no FEMALES at AGE=16.

Whichever method you use, the format method is probably easier if you have no other formats in play for the variable that is under the across variable. CALL DEFINE really requires that you know how many across variables there are because you have to use the absolute column name in the statement.

cynthia

proc format;

  value msfmt 0='.';

run;

 

ods listing close;

ods html file='c:\temp\miss_fmt_across.html';

proc report data=sashelp.class nowd completerows;

  title '1) Showing Default';

  column age sex,height;

  define age / group;

  define sex / across;

  define height / n f=2.0;

run;

 

proc report data=sashelp.class nowd completerows;

  title '2) Using format';

  column age sex,height;

  define age / group;

  define sex / across;

  define height / n f=msfmt.;

run;

 

proc report data=sashelp.class nowd completerows;

  title '3) Using CALL DEFINE';

  column age sex,height;

  define age / group;

  define sex / across;

  define height / n f=2.;

  compute height;

    if _c2_ = 0 then _c2_ = .;

    if _c3_ = 0 then _c3_ = .;

  endcomp;

run;

ods html close;

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 11 replies
  • 1476 views
  • 6 likes
  • 2 in conversation