☑ This topic is solved.
ncd
Fluorite | Level 6

Hi all,

I am trying to do the practice in SAS Programming 1: Essentials, Lesson 3: Exploring and Validating Data (Dashboard > My courses > SAS Programming 1: Essentials > Lessons).

The code I wrote is:

proc sort data=PG1.np_largeparks nodupkey out=park_clean dupout=park_dups;
by _all_;
run;

and the code in the solution is:

proc sort data=pg1.np_largeparks
		  out=park_clean
		  dupout=park_dups
		  nodupkey;
    by _all_;
run;
Unfortunately, neither of them works. I pasted the log below. I can't figure out why it shows 0 observations. The solution says there should be 30 duplicates.
[Screenshot: SAS log from the PROC SORT step]

Thanks,

Cagri

Accepted Solution
ballardw
Super User

How many records were in PG1.np_largeparks when it was created during the setup of the training data sets?

I suspect that an earlier PROC SORT without OUT= sorted the data set in place (see the note in the log saying the data is already sorted) and had already deleted the duplicate records. So there is nothing left to remove now.


8 Replies

ncd
Fluorite | Level 6
Interestingly enough, the table had 123 obs from the beginning. Somehow the file with the duplicates deleted had overwritten the original file. Now I have set it up again from the beginning, and the original file has 153 obs. Thanks for the quick reply.
Cynthia_sas
SAS Super FREQ

Hi:
If you want to restore the data to its state at the start of the class, all you need to do is rerun the program that makes the data. If you rerun the program (as you did when you initially set up the data), the class files will be refreshed.
As you can see from my LOG, below:

[Screenshot: Cynthia's SAS log after creating the class data]


After the data for the class is created, you should start with 153 rows in PG1.NP_LARGEPARKS, including 30 duplicate rows. So it appears that you have already deleted the duplicates from the NP_LARGEPARKS data table.
Cynthia
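To check the starting state described above before running the practice, one option is to compare the total row count with the number of distinct rows; the difference is the number of exact duplicate rows that PROC SORT with NODUPKEY and BY _ALL_ would remove. This is a minimal, self-contained sketch: WORK.PARKS_DEMO and its values are hypothetical stand-ins, so substitute PG1.NP_LARGEPARKS to check the course table (the log later in this thread shows 153 rows read and 30 deleted on a fresh copy).

```sas
/* Hypothetical stand-in for PG1.NP_LARGEPARKS: 5 rows, 2 exact duplicates */
data work.parks_demo;
   input ParkCode $ Region $ Acres;
   datalines;
P01 East 100
P01 East 100
P02 West 250
P03 West 400
P03 West 400
;
run;

/* Count all rows, then the rows left once exact duplicates collapse */
proc sql noprint;
   select count(*) into :n_total trimmed
      from work.parks_demo;
   select count(*) into :n_distinct trimmed
      from (select distinct * from work.parks_demo);
quit;

/* Write the counts to the log */
%put NOTE: rows=&n_total distinct=&n_distinct duplicates=%eval(&n_total - &n_distinct);
```

If the total and distinct counts are already equal on the course table, the duplicates were removed earlier, and rerunning the setup program is the way to get the original rows back.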

ncd
Fluorite | Level 6

Dear Cynthia, thank you so much.

I have rerun the program that makes the data, and now I have 153 obs in the raw file. Somehow the file with the duplicates deleted overwrote the original file while I was working on it, or some other glitch occurred. Now that I have set it up from the beginning, the original file has 153 obs. Thanks for your help.

ballardw
Super User

@ncd wrote:

Dear Cynthia, thank you so much.

I have rerun the file that makes that data now I have 153 obs in the raw file. Somehow the file after duplicates are deleted was overwritten on the original file when I was working on it or some other glitch occurred. Now I set it up from the beginning and the original file has 153 obs. Thanks for your help.


When you do not use the OUT= option, PROC SORT replaces the input data set.

It is quite typical for people to use

Proc sort data=somedataset;
   by thisvar thatvar;
run;

This sorts in place, i.e., it replaces the original data set with a sorted one.

But if you use

Proc sort data=somedataset nodupkey;
   by thisvar thatvar;
run;

Then it replaces the data set with a sorted version that has the duplicates removed.

This is the designed behavior and not a "glitch".

You would not be the first person to unintentionally delete records. Ask me how I know 😳
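To make the difference concrete, here is a minimal sketch on a small hypothetical table (WORK.DEMO and its values are made up for illustration). The first PROC SORT writes its results to new tables through OUT= and DUPOUT= and leaves WORK.DEMO untouched; the second has no OUT=, so it overwrites WORK.DEMO with its sorted, de-duplicated version.

```sas
/* Hypothetical demo table: 5 rows, 2 of them exact duplicates */
data work.demo;
   input ParkCode $ Region $ Acres;
   datalines;
P01 East 100
P01 East 100
P02 West 250
P03 West 400
P03 West 400
;
run;

/* Safe: OUT= and DUPOUT= write new tables; WORK.DEMO keeps all 5 rows */
proc sort data=work.demo out=work.demo_clean dupout=work.demo_dups nodupkey;
   by _all_;
run;

/* Risky: no OUT=, so WORK.DEMO itself is replaced by the sorted, */
/* de-duplicated version and its 2 duplicate rows are gone        */
proc sort data=work.demo nodupkey;
   by _all_;
run;

/* Row counts come from NOBS= at compile time; the SETs never execute */
data _null_;
   if 0 then set work.demo nobs=n_demo;
   if 0 then set work.demo_clean nobs=n_clean;
   if 0 then set work.demo_dups nobs=n_dups;
   put n_demo= n_clean= n_dups=;   /* n_demo=3 n_clean=3 n_dups=2 */
   stop;
run;
```

After the second step, the only way back to the 5-row version is to recreate it; for the course tables, that means rerunning the setup program as Cynthia describes earlier in the thread.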

 

SASRB
Obsidian | Level 7

Hello,

Let me ask another question in this regard. Why was there neither an error nor any discrepancy in the output data when I put NODUPKEY before DUPOUT=park_dups?

How can I tell when the order of the options is strict and when I can be "creative"?

1          OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
72
73
74         proc sort data=pg1.np_largeparks out=park_clean
75         nodupkey dupout=park_dups;
76         by _all_;
77         run;

NOTE: There were 153 observations read from the data set PG1.NP_LARGEPARKS.
NOTE: 30 observations with duplicate key values were deleted.
NOTE: The data set WORK.PARK_CLEAN has 123 observations and 5 variables.
NOTE: The data set WORK.PARK_DUPS has 30 observations and 5 variables.
NOTE: PROCEDURE SORT used (Total process time):
      real time           0.00 seconds
      user cpu time       0.01 seconds

Thank you.

Cynthia_sas
SAS Super FREQ

Hi:

We recommend that you refer to the documentation to find out whether an option for a procedure must be specified in a certain way. Here are 3 different invocations of PROC SORT. Note that all 3 invocations work, even though the options DATA=, OUT=, DUPOUT=, and NODUPKEY are listed in a different order each time:

[Screenshot: three PROC SORT invocations with DATA=, OUT=, DUPOUT=, and NODUPKEY in different orders]

Generally, after the keyword PROC you must list the procedure name; after that, the other options can usually be specified in any order. As a best practice, I always put the DATA= and OUT= options first when I code PROC SORT, but even DATA= is optional: if you omit it, SAS uses the data set named by the _LAST_= system option, which is the most recently created data set.

Cynthia
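A sketch in the spirit of the screenshot above (not a copy of it): the same four PROC SORT options listed in three different orders, followed by PROC COMPARE to check that the outputs match. WORK.DEMO and its values are hypothetical.

```sas
/* Hypothetical demo table: 5 rows, 2 exact duplicates */
data work.demo;
   input ParkCode $ Region $ Acres;
   datalines;
P01 East 100
P01 East 100
P02 West 250
P03 West 400
P03 West 400
;
run;

/* Same options, three different orders on the PROC SORT statement */
proc sort data=work.demo out=work.clean1 dupout=work.dups1 nodupkey;
   by _all_;
run;

proc sort nodupkey dupout=work.dups2 out=work.clean2 data=work.demo;
   by _all_;
run;

proc sort out=work.clean3 nodupkey data=work.demo dupout=work.dups3;
   by _all_;
run;

/* PROC COMPARE reports any differences between the de-duplicated results */
proc compare base=work.clean1 compare=work.clean2;
run;

proc compare base=work.clean1 compare=work.clean3;
run;
```

The order that does matter here is between statements rather than within the option list: the PROC SORT statement comes first, and the BY statement follows it inside the step.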

SASRB
Obsidian | Level 7
Thank you for your clarification.
It's good to know that the same outcome can be reached in slightly different ways.
