Current student studying programming and getting very stuck on 1 part of my SAS assignment, any help would be greatly appreciated!
This is my current code which includes the data:
DATA college09ds;
	INFILE DATALINES FIRSTOBS=2;
	INPUT GenderType $ Year Total COMMA6. Retrieved $ Asofdate MMDDYY10. @@;
	IF FINDW(UPCASE(GenderType), 'F') > 0 THEN
		DO;
			Gender='F';
			Total_F=Total;
		END;
	ELSE
		DO;
			Gender='M';
			Total_M=Total;
		END;
	IF FINDW(UPCASE(GenderType), 'IHE') > 0 THEN
		Institution='IHE';
	ELSE
		Institution='DGI';
	RETAIN Total_M Total_F;
	Total_MF=Total_M + Total_F;
	Grand_F + Total_F;
	Grand_M + Total_M;
	Grand_MF + Total_MF;
	IF Gender='F';
	DATALINES;
Variables: GenderType Year Total Asofdate
ihe-M 1970 5,044 Retrieved: 03/14/2012 f-IHE 1970  3,537 Retrieved: 03/14/2012
ihe-m 1980 5,874 Retrieved: 03/14/2012 f-IHE 1980  6,223 Retrieved: 03/14/2012
ihe-M 1990 6,284 Retrieved: 03/14/2012 F-IHE 1990  7,535 Retrieved: 03/14/2012
dgi-M 2000 6,722 Retrieved: 03/14/2012 f-dgi 2000  8,591 Retrieved: 03/14/2012
dgi-m 2001 6,961 Retrieved: 03/14/2012 f-dgi 2001  8,967 Retrieved: 03/14/2012
DGI-m 2002 7,202 Retrieved: 03/14/2012 f-dgi 2002  9,410 Retrieved: 03/14/2012
dgi-M 2003 7,260 Retrieved: 03/14/2012 F-DGI 2003  9,651 Retrieved: 03/14/2012
dgi-M 2004 7,387 Retrieved: 03/14/2012 F-DGI 2004  9,885 Retrieved: 03/14/2012
dgi-m 2005 7,456 Retrieved: 03/14/2012 f-DGI 2005 10,032 Retrieved: 03/14/2012
DGI-m 2006 7,575 Retrieved: 03/14/2012 f-dgi 2006 10,184 Retrieved: 03/14/2012
dgi-M 2007 7,816 Retrieved: 03/14/2012 f-dgi 2007 10,432 Retrieved: 03/14/2012
dgi-M 2008 8,189 Retrieved: 03/14/2012 F-DGI 2008 10,914 Retrieved: 03/14/2012
DGI-m 2009 8,770 Retrieved: 03/14/2012 F-dgi 2009 11,658 Retrieved: 03/14/2012
;
RUN;
PROC FORMAT;
	VALUE $inst 'IHE'='Institutions of Higher Education' 
		'DGI'='Degree-Granting Institutions';
	VALUE $sex 'M'='Male/Female' 'F'='Male/Female';
PROC PRINT DATA=college09ds;
	ID Institution;
	BY DESCENDING Institution;
	SUM Total_MF Total_M Total_F;
	OPTIONS NODATE NUMBER PAGENO=1 PAGESIZE=33 LINESIZE=256 ORIENTATION=LANDSCAPE;
	FORMAT Institution $inst. Gender $sex. Asofdate DATE9. Total_MF 
		COMMA7. Total_M COMMA7. Total_F COMMA7. Grand_MF COMMA7. Grand_M 
		COMMA7. Grand_F COMMA7.;
	VAR Gender Year Asofdate Total_MF Total_M Total_F Grand_MF Grand_M Grand_F;
	TITLE1 'Enrollment Statistics for Years 1970 through 2009';
	FOOTNOTE1 ' statistics are courtesy of National Center for Education Statistics ';
RUN;All parts are currently printing correctly except for the Grand_MF, Grand_M, and Grand_F columns, the values are very off.
When I step through this as I would in another programming language I am not finding my logic error, but I know it must be there.
The assignment:
Include SAS statements in your program to perform the tasks listed below for each line of raw data.
1. Within the DATA step, assign the value in the Total variable to a variable named Total_F when the value in the Gender variable is F. Otherwise, assign the value in the Total variable to a variable named Total_M when the value in the Gender variable is M.
2. Within the DATA step, add the values in the Total_F variable to the values in the Total_M variable and store the result in a variable named Total_MF.
3. Within the DATA step, compute running totals for the values in the Total_F, Total_M, and Total_MF variables and store the running totals in variables named Grand_F, Grand_M, and Grand_MF, respectively.
4. Within the DATA step, only include observations in the college09ds data set when the value in the Gender variable is F.
5. Without using IF-THEN/ELSE logic in the DATA step, construct user friendly descriptions to display for the values in the Institution variable. Display Institutions of Higher Education when the value in the Institution variable is IHE. Otherwise, display Degree-Granting Institutions when the value in the Institution variable is DGI.
Note: Do not overwrite the values in the Institution variable in the data set.
6. Without using IF-THEN/ELSE logic in the DATA step, construct user friendly descriptions to display for the values in the Gender variable. Display Male/Female when the value in the Gender variable is F or M.
Note: Do not overwrite the values in the Gender variable in the data set.
Display the contents of the SAS data set. Include the appropriate statements in your SAS program to display the contents of your data set just like the Sample Program Output
Sample Program Output:
@Astounding has already given you the hints for your current code.
As you've done already such a lot of the work yourself I don't think it hurts to also post an alternative coding approach for the data step.
DATA college09ds;
	INFILE DATALINES FIRSTOBS=2 truncover;
	INPUT GenderType $ Year Total COMMA6. Retrieved $ Asofdate MMDDYY10. @;
  /* req 1 */
  Total_M=Total;
  
  /* req 4: For the source data Gender 'F' is always the 2nd tuple of data in the record */
	INPUT GenderType :$Upcase4. Year Total COMMA6. Retrieved $ Asofdate MMDDYY10. ;
  /* req 1 */
  Total_F=Total;
  /* req 2 */
  Total_MF=sum(Total_M, Total_F);
  /* req 3 */
  Grand_F + Total_F;
  Grand_M + Total_M;
  Grand_MF + Total_MF;
  /* req 5 - together with format */
  length Institution $3;
  Institution=scan(GenderType,2,'-');
  /* req 6 - together with format */
  Gender='F';
	DATALINES;
Variables: GenderType Year Total Asofdate
ihe-M 1970 5,044 Retrieved: 03/14/2012 f-IHE 1970  3,537 Retrieved: 03/14/2012
ihe-m 1980 5,874 Retrieved: 03/14/2012 f-IHE 1980  6,223 Retrieved: 03/14/2012
ihe-M 1990 6,284 Retrieved: 03/14/2012 F-IHE 1990  7,535 Retrieved: 03/14/2012
dgi-M 2000 6,722 Retrieved: 03/14/2012 f-dgi 2000  8,591 Retrieved: 03/14/2012
dgi-m 2001 6,961 Retrieved: 03/14/2012 f-dgi 2001  8,967 Retrieved: 03/14/2012
DGI-m 2002 7,202 Retrieved: 03/14/2012 f-dgi 2002  9,410 Retrieved: 03/14/2012
dgi-M 2003 7,260 Retrieved: 03/14/2012 F-DGI 2003  9,651 Retrieved: 03/14/2012
dgi-M 2004 7,387 Retrieved: 03/14/2012 F-DGI 2004  9,885 Retrieved: 03/14/2012
dgi-m 2005 7,456 Retrieved: 03/14/2012 f-DGI 2005 10,032 Retrieved: 03/14/2012
DGI-m 2006 7,575 Retrieved: 03/14/2012 f-dgi 2006 10,184 Retrieved: 03/14/2012
dgi-M 2007 7,816 Retrieved: 03/14/2012 f-dgi 2007 10,432 Retrieved: 03/14/2012
dgi-M 2008 8,189 Retrieved: 03/14/2012 F-DGI 2008 10,914 Retrieved: 03/14/2012
DGI-m 2009 8,770 Retrieved: 03/14/2012 F-dgi 2009 11,658 Retrieved: 03/14/2012
;
RUN;
PROC FORMAT;
  VALUE $inst 
    'IHE'='Institutions of Higher Education' 
    'DGI'='Degree-Granting Institutions'
    ;
  VALUE $sex 
    'M'='Male/Female' 
    'F'='Male/Female'
    ;
run;
OPTIONS NODATE NUMBER PAGENO=1 PAGESIZE=33 LINESIZE=256 ORIENTATION=LANDSCAPE;
PROC PRINT DATA=college09ds;
  ID Institution;
  BY DESCENDING Institution;
  SUM Total_MF Total_M Total_F;
  FORMAT Institution $inst. Gender $sex. Asofdate DATE9. Total_MF 
    COMMA7. Total_M COMMA7. Total_F COMMA7. Grand_MF COMMA7. Grand_M 
    COMMA7. Grand_F COMMA7.;
  VAR Gender Year Asofdate Total_MF Total_M Total_F Grand_MF Grand_M Grand_F;
  TITLE1 'Enrollment Statistics for Years 1970 through 2009';
  FOOTNOTE1 ' statistics are courtesy of National Center for Education Statistics ';
RUN;
I do hope your next exercise will have requirements which allow for a smarter way of doing things - like allowing to read the source record into two SAS observations and using a SAS Procedure like Proc Report for at least some of the totals.
Sometimes, clearing up early errors reveals additional errors. With that in mind, here are two places to begin.
You would expect that gender is either F or M on any given observation. Therefore, either TOTAL_F or TOTAL_M should be missing on every observation.
That means this statement should be fixed:
Total_MF = Total_M + Total_F;
If one of these is missing, the total would be missing. So you need to change this to:
Total_MF = sum(Total_M, Total_F);
The SUM function ignores missing values.
Why didn't this problem surface earlier? Because of the RETAIN statement. There is no reason to retain TOTAL_M and TOTAL_F. In fact, that's what is making your grand totals come out so high. By retaining them, you get one of these results:
So get rid of RETAIN, fix the TOTAL_MF computation, and see how close you are to the right result.
While trying to track down this error I did try to fix it with your suggested way, which does in fact fix the running total error, however then another error pops up with the Total_M variable not printing at all?!?! Doesn't print but the values in the Total_M variable must be there because the values for the running totals are now correct. Tracking down these errors has taken more than a few hours and I am beginning to go a little cross-eyed trying to find the error / errors that I have in my code. I am very new to SAS programming and not all of the logic seems intuitive to me yet. I think there may be an issue with the "IF Gender = 'F';" statement that is intended to delete the 'M' observations but with out it the the 'M' observations are printed in the results. Should it be in a different place or is a different statement required?
Thank you so much for your response, your help is very appreciated!
I don't consider an input with @@ as beginner level. And that's what tricks you here.
SAS variables get set to missing at the beginning of an iteration of the data step. With the @@ you get actually two iterations through the data step per source record. So in the second iteration Total_M will get set to missing without getting populated anymore (as it's the Gender='F' round).
The running totals remain correct as there the RETAIN statement inhibits SAS to set the variables to missing.
If you run the following syntax you can see what happens.
DATA college09ds;
	INFILE DATALINES FIRSTOBS=2;
	INPUT GenderType $ Year Total COMMA6. Retrieved $ Asofdate MMDDYY10. @@;
	IF FINDW(UPCASE(GenderType), 'F') > 0 THEN
		DO;
			Gender='F';
			Total_F=Total;
		END;
	ELSE
		DO;
			Gender='M';
			Total_M=Total;
		END;
   put _n_= Gender= total_F= total_M=;
	IF Gender='F';
	DATALINES;
......SAS Log - note that the automatic variable _N_ sums up to 2 times the number of your source records.
_N_=1 Gender=M Total_F=. Total_M=5044 _N_=2 Gender=F Total_F=3537 Total_M=. _N_=3 Gender=M Total_F=. Total_M=5874 _N_=4 Gender=F Total_F=6223 Total_M=.
@Astounding has already given you the hints for your current code.
As you've done already such a lot of the work yourself I don't think it hurts to also post an alternative coding approach for the data step.
DATA college09ds;
	INFILE DATALINES FIRSTOBS=2 truncover;
	INPUT GenderType $ Year Total COMMA6. Retrieved $ Asofdate MMDDYY10. @;
  /* req 1 */
  Total_M=Total;
  
  /* req 4: For the source data Gender 'F' is always the 2nd tuple of data in the record */
	INPUT GenderType :$Upcase4. Year Total COMMA6. Retrieved $ Asofdate MMDDYY10. ;
  /* req 1 */
  Total_F=Total;
  /* req 2 */
  Total_MF=sum(Total_M, Total_F);
  /* req 3 */
  Grand_F + Total_F;
  Grand_M + Total_M;
  Grand_MF + Total_MF;
  /* req 5 - together with format */
  length Institution $3;
  Institution=scan(GenderType,2,'-');
  /* req 6 - together with format */
  Gender='F';
	DATALINES;
Variables: GenderType Year Total Asofdate
ihe-M 1970 5,044 Retrieved: 03/14/2012 f-IHE 1970  3,537 Retrieved: 03/14/2012
ihe-m 1980 5,874 Retrieved: 03/14/2012 f-IHE 1980  6,223 Retrieved: 03/14/2012
ihe-M 1990 6,284 Retrieved: 03/14/2012 F-IHE 1990  7,535 Retrieved: 03/14/2012
dgi-M 2000 6,722 Retrieved: 03/14/2012 f-dgi 2000  8,591 Retrieved: 03/14/2012
dgi-m 2001 6,961 Retrieved: 03/14/2012 f-dgi 2001  8,967 Retrieved: 03/14/2012
DGI-m 2002 7,202 Retrieved: 03/14/2012 f-dgi 2002  9,410 Retrieved: 03/14/2012
dgi-M 2003 7,260 Retrieved: 03/14/2012 F-DGI 2003  9,651 Retrieved: 03/14/2012
dgi-M 2004 7,387 Retrieved: 03/14/2012 F-DGI 2004  9,885 Retrieved: 03/14/2012
dgi-m 2005 7,456 Retrieved: 03/14/2012 f-DGI 2005 10,032 Retrieved: 03/14/2012
DGI-m 2006 7,575 Retrieved: 03/14/2012 f-dgi 2006 10,184 Retrieved: 03/14/2012
dgi-M 2007 7,816 Retrieved: 03/14/2012 f-dgi 2007 10,432 Retrieved: 03/14/2012
dgi-M 2008 8,189 Retrieved: 03/14/2012 F-DGI 2008 10,914 Retrieved: 03/14/2012
DGI-m 2009 8,770 Retrieved: 03/14/2012 F-dgi 2009 11,658 Retrieved: 03/14/2012
;
RUN;
PROC FORMAT;
  VALUE $inst 
    'IHE'='Institutions of Higher Education' 
    'DGI'='Degree-Granting Institutions'
    ;
  VALUE $sex 
    'M'='Male/Female' 
    'F'='Male/Female'
    ;
run;
OPTIONS NODATE NUMBER PAGENO=1 PAGESIZE=33 LINESIZE=256 ORIENTATION=LANDSCAPE;
PROC PRINT DATA=college09ds;
  ID Institution;
  BY DESCENDING Institution;
  SUM Total_MF Total_M Total_F;
  FORMAT Institution $inst. Gender $sex. Asofdate DATE9. Total_MF 
    COMMA7. Total_M COMMA7. Total_F COMMA7. Grand_MF COMMA7. Grand_M 
    COMMA7. Grand_F COMMA7.;
  VAR Gender Year Asofdate Total_MF Total_M Total_F Grand_MF Grand_M Grand_F;
  TITLE1 'Enrollment Statistics for Years 1970 through 2009';
  FOOTNOTE1 ' statistics are courtesy of National Center for Education Statistics ';
RUN;
I do hope your next exercise will have requirements which allow for a smarter way of doing things - like allowing to read the source record into two SAS observations and using a SAS Procedure like Proc Report for at least some of the totals.
@Patrick Thank you for both of your replies, extremely helpful! Your code in the first reply works perfectly and I understand some/most of what is happening but some is beyond my understanding of SAS.
Having 2 different INPUT statements? I take it that is to avoid using @@ ? The TRUNCOVER alternates between the two INPUT statements?
In the 1st INPUT you say "GenderType $" and the 2nd "GenderType :$Upcase4.", I see bringing in the variable in uppercase by why for only one INPUT statement? That loses me.
And, "Gender = 'F';", is this for all iterations of Gender or every other?
You are obviously very versed in your SAS programming! I am coming in at it as a first semester SAS student trying to use IF/THEN/ELSE logic to figure these things out. Not that I have to use if/then/else I mean sometimes the SAS switch statement is great, but what I mean is you are showing me the Best way to do this, and it is great, but I'm sure if I used your code my instructor would have some questions that I would be unable to answer. I know you aren't here to teach every SAS noob how to do their homework, so I really do appreciate all of the help you have provided!
Thank You!
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.
