hz16g22
Obsidian | Level 7

For one of my questions I was asked to create a separate dataset with only female names. The question is out of 10, and the instructor never specified which step to use. I used PROC SORT like this:

proc sort data=Names out=work.Female_names (drop=gender);
    by Name Count;
    where Gender='F';
run;

NOTE: There were 89749 observations read from the data set WORK.NAMES.
      WHERE Gender='F';
NOTE: The data set WORK.FEMALE_NAMES has 89749 observations and 2 variables.
NOTE: PROCEDURE SORT used (Total process time):
      real time           0.09 seconds
      cpu time            0.00 seconds

I got 0/10 for this question, even though my friend, who got 10/10, used a data step and got the same result I did (that being a separate dataset). Her reasoning was that it is a major issue to not use data step for something like this even though it got the same result. I asked ChatGPT if this is a major issue, and it said that it's not an issue if the code to use was never specified. Before I appeal, I would like to hear from an expert whether it truly is a major issue that I used PROC SORT instead of a data step.

 

12 REPLIES
Quentin
Super User

Definitely worth discussing with your instructor.  Your code is one way to subset the data.  If you need to SORT the data in anticipation of a later step, using PROC SORT with a WHERE statement would be more efficient than using a DATA step followed by a PROC SORT step.  There are lots of ways in SAS to subset data.
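
As a rough sketch of that comparison, assuming the poster's WORK.NAMES data set with the variables Name, Gender, and Count (the output name female_sorted is hypothetical and only used for illustration):

```sas
/* Two-step approach: one step to subset, a second step to sort */
data work.female_names;
  set work.names;
  where Gender = 'F';
  keep Name Count;
run;

proc sort data=work.female_names out=work.female_sorted;
  by Name Count;
run;

/* One-step approach: subset and sort in the same PROC SORT step */
proc sort data=work.names out=work.female_sorted(drop=Gender);
  by Name Count;
  where Gender = 'F';
run;
```

The one-step form only pays off when the sorted order is actually needed later; if no sort is required, the DATA step alone does less work.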

ballardw
Super User

I would have to see the exact instructions provided. As in letter by letter.

 

Proc Sort is a bit of overkill but without knowing the grading considerations it is hard to evaluate. 

I suspect there may have been some semi-automatic grading applied, such as comparing your resulting data set to a "standard" expected output. If that is the case, then you may have "failed" because you removed the Gender variable. Depending on the comparison method used, the change in order may also have been a factor.

 

As far as "Her reasoning was that it is a major issue to not use data step for something like this" goes: without being given a very specific reason why a separate data set is needed, there really is not any reason to create one. For any report or analysis you can subset the data with either a WHERE statement, as you used, or the similar WHERE= data set option. Proliferation of data sets is actually often a symptom of poor design.
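
A minimal sketch of subsetting without creating a new data set, assuming the poster's WORK.NAMES data set with the variables Name, Gender, and Count (the choice of PROC MEANS here is only for illustration):

```sas
/* WHERE statement: the procedure only processes the female rows */
proc means data=work.names sum;
  where Gender = 'F';
  var Count;
run;

/* WHERE= data set option: same subset, applied on the input data set */
proc means data=work.names(where=(Gender = 'F')) sum;
  var Count;
run;
```

Both steps should report the same total, and neither writes an extra data set to the WORK library.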

 

I do understand frustration with what appears to be identical results being downgraded because you use a different method than the grader expected. I have seen people get lower scores on use of programs like Excel because the test taker knew the keystroke short cuts to accomplish stuff instead of using the point-and-click through 5 sub-menus to do the same thing. Or did them in a different order than expected.

 

hz16g22
Obsidian | Level 7

These were her instructions:

Create a new data set (female_names) that only contains those names which were given to female babies. Include only the variables name and count (10)

ballardw
Super User

@hz16g22 wrote:

These were her instructions:

Create a new data set (female_names) that only contains those names which were given to female babies. Include only the variables name and count (10)


With instructions like that I would ask the instructor if Proc SQL would have been acceptable.

If you haven't been taught Proc SQL, then we get into the discussion of what the grading is based on: taught material or results. It used to be that to get A grades in some of the classes I took, you had to demonstrate going beyond what was in the lecture/class sessions (learn on your own).

hz16g22
Obsidian | Level 7
She said only the data step is acceptable, yet that was not specified. Would my appeal therefore make a good case, since my PROC SORT achieved the same result?
Reeza
Super User
Personally, I don't think zero is fair, but I wouldn't give full marks either, since given the question it's an inefficient solution.

I'm assuming your class has also been taught data set options versus the WHERE and KEEP/DROP statements.
Tom
Super User

@hz16g22 wrote:

These were her instructions:

Create a new data set (female_names) that only contains those names which were given to female babies. Include only the variables name and count (10)


There is a lot left out of there.  Perhaps it was explained just before the question?

What is the variable that can be used to indicate if baby is female?  What is the value of that variable that indicates female?

Is COUNT one of the existing variables?

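/* Option 1: DATA step with a WHERE statement */
data female_names;
  set have;
  where sex='FEMALE';
  keep name count;
run;

/* Option 2: DATA step with a subsetting IF */
data female_names;
  set have;
  if sex='FEMALE';
  keep name count;
run;

/* Option 3: DATA step with the KEEP= data set option */
data female_names(keep=name count);
  set have;
  where sex='FEMALE';
run;

/* Option 4: PROC SQL */
proc sql;
  create table female_names as
    select name,count
    from have
    where sex='FEMALE'
  ;
quit;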

Or do you also need to count something?

proc freq data=have;
  tables name / out=female_names(keep=name count);
  where sex='FEMALE';
run;
Reeza
Super User
Was this proc sort methodology taught in class? Usually the answers should reflect what you've been taught otherwise it's clear that you're not using the class material and likely using ChatGPT or a forum for answers.

Why is data step better?
Less processing of the data.
Sorting uses additional resources making this an inefficient step, so using a data step is the more efficient answer. Depending on the next steps of what is required in the processing, a data step can also accomplish other calculations making it more efficient.

If you have to sort the data for the next step, then filtering in the proc sort is the right decision.

hz16g22
Obsidian | Level 7
It was, yes. We were taught mostly PROC procedures. I didn't use ChatGPT, as it would likely use more advanced code.
Reeza
Super User
The majority of the code online is beginner level.
ballardw
Super User

@hz16g22 wrote:
it was yes, we were taught mostly proc procedures, i didnt use chatgpt as it would likely use more advanced problematic code

We have a thread where someone had used ChatGPT to generate code. It had so many errors it was hard to even figure out what the code (questioner) thought it was attempting.

mkeintz
PROC Star

Your program risked changing the original order of names.

 

This effectively presumes that either the original order is unimportant, or that there would be a way to reproduce the original order if needed.

 

The fact that it matched your co-student's results was by chance, since the original data happened to be in name/count alphabetic order.

 

I would not give full credit for this answer to the lecturer's question.
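
One way to check the ordering point, sketched under the assumption that WORK.NAMES holds the variables Name, Gender, and Count (the output names female_ds and female_sort are hypothetical):

```sas
/* DATA step subset: rows stay in the order they have in WORK.NAMES */
data work.female_ds;
  set work.names;
  where Gender = 'F';
  keep Name Count;
run;

/* PROC SORT subset: rows are reordered by Name, then Count */
proc sort data=work.names out=work.female_sort(drop=Gender);
  by Name Count;
  where Gender = 'F';
run;

/* Without an ID statement, PROC COMPARE matches observations by position,
   so a difference in row order is reported as unequal values */
proc compare base=work.female_ds compare=work.female_sort;
run;
```

If the comparison reports no unequal values, the source data was already in Name/Count order, which is the coincidence described above.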

 


Discussion stats
  • 12 replies
  • 1265 views
  • 10 likes
  • 6 in conversation