Hi all,
I am trying to do the practice on Dashboard/ My courses/ SAS Programming 1: Essentials/ Lessons/ Lesson 3: Exploring and Validating Data.
The code I wrote is:
proc sort data=PG1.np_largeparks nodupkey out=park_clean dupout=park_dups;
by _all_;
run;
and the code solution says:
proc sort data=pg1.np_largeparks out=park_clean dupout=park_dups nodupkey; by _all_; run;
Thanks,
Cagri
How many records were in PG1.np_largeparks when it was created at the set up of the training data sets?
I suspect that an earlier PROC SORT without the OUT= option sorted the data set in place (do you see a note in the log saying the data set is already sorted?) and already deleted the duplicate records. So there is nothing left to remove now.
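One quick way to answer the row-count question is to count the rows currently stored in the table. This is only a sketch: it assumes the PG1 libref from the course setup is already assigned, and the column alias n_rows is just an illustrative name.

```sas
/* Count the rows currently stored in the course table.
   Assumes the PG1 libref from the course setup is assigned. */
proc sql;
    select count(*) as n_rows
    from pg1.np_largeparks;
quit;
```

If the count is lower than the number of rows the setup program originally created, an earlier step has probably overwritten the table.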
Hi:
If you want to restore the data back to the start point of class, all you need to do is rerun the program that makes the data. If you rerun the program (as you did when you initially set up the data), the class files will be refreshed.
In my log, after I make the data for class, PG1.NP_LARGEPARKS starts with 153 rows, 30 of which are duplicates. So it appears that you've already deleted the duplicates from the NP_LARGEPARKS data table.
Cynthia
Dear Cynthia, thank you so much.
I have rerun the program that makes the data, and now I have 153 obs in the raw file. Somehow the de-duplicated file overwrote the original file while I was working on it, or some other glitch occurred. Now that I have set it up from the beginning, the original file has 153 obs again. Thanks for your help.
When you do not use the OUT= option, PROC SORT replaces the input data set.
It is quite typical for people to use
Proc sort data=somedataset; by thisvar thatvar; run;
which sorts in place, i.e. replaces the original data set with the sorted one.
But if you use
Proc sort data=somedataset nodupkey; by thisvar thatvar; run;
then it replaces the data set with one that is sorted and has the duplicates removed.
This is the designed behavior and not a "glitch".
You would not be the first person to unintentionally delete records. Ask me how I know 😳
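To make the difference concrete, here is a minimal sketch on a made-up table. The names WORK.DEMO and WORK.DEMO_CLEAN and the data values are hypothetical, not part of the course data.

```sas
/* Hypothetical toy table: 5 rows, 2 of them exact duplicates. */
data work.demo;
    input park $ visitors;
    datalines;
Acadia 10
Zion 20
Acadia 10
Zion 20
Yosemite 30
;
run;

/* Safe: OUT= writes the de-duplicated rows to a new table,
   so WORK.DEMO still has all 5 rows afterwards. */
proc sort data=work.demo out=work.demo_clean nodupkey;
    by _all_;
run;

/* Destructive: no OUT=, so WORK.DEMO itself is replaced by the
   sorted, de-duplicated version and the 2 duplicate rows are gone. */
proc sort data=work.demo nodupkey;
    by _all_;
run;
```

After the second step the only way to get the duplicates back is to recreate the table, which is why adding OUT= is a good habit whenever NODUPKEY is involved.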
Hello,
Let me ask another question in this regard. Why was there neither an error nor a discrepancy in the output data when I put NODUPKEY before DUPOUT=park_dups?
How can I tell when the order of options is strict and when I can be "creative"?
1    OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
72
73
74   proc sort data=pg1.np_largeparks out=park_clean
75      nodupkey dupout=park_dups;
76      by _all_;
77   run;

NOTE: There were 153 observations read from the data set PG1.NP_LARGEPARKS.
NOTE: 30 observations with duplicate key values were deleted.
NOTE: The data set WORK.PARK_CLEAN has 123 observations and 5 variables.
NOTE: The data set WORK.PARK_DUPS has 30 observations and 5 variables.
NOTE: PROCEDURE SORT used (Total process time):
      real time           0.00 seconds
      user cpu time       0.01 seconds
Thank you.
Hi:
We recommend that you refer to the documentation to find whether an option for a procedure is required to be specified a certain way. Here are 3 different invocations of PROC SORT. Note that all 3 invocations work, even if the options like DATA=, OUT=, DUPOUT= and NODUPKEY are listed in a different order each time:
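The original screenshots are not reproduced here, so the following is only a sketch of what three such invocations might look like, not the exact code from the post. It assumes the PG1 libref from the course setup is assigned; the output table names (clean1, dups1, and so on) are hypothetical.

```sas
/* Assumes the PG1 libref from the course setup is assigned.
   Output table names are hypothetical. */

/* Order 1: DATA=, OUT=, DUPOUT=, NODUPKEY */
proc sort data=pg1.np_largeparks out=work.clean1 dupout=work.dups1 nodupkey;
    by _all_;
run;

/* Order 2: NODUPKEY first, OUT= last */
proc sort nodupkey data=pg1.np_largeparks dupout=work.dups2 out=work.clean2;
    by _all_;
run;

/* Order 3: DATA= last */
proc sort dupout=work.dups3 out=work.clean3 nodupkey data=pg1.np_largeparks;
    by _all_;
run;
```

Each step should write the same rows to its OUT= and DUPOUT= tables, because the options on the PROC SORT statement are not positional. Every step uses OUT=, so PG1.NP_LARGEPARKS itself is never modified.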
Generally, after the keyword PROC you must list the procedure name, and then the other options can usually be specified in any order. As a best practice, I always put the DATA= option and the OUT= option first when I code my PROC SORT, but even DATA= is optional: if you omit it, SAS uses the most recently created data set (the value of the _LAST_= system option).
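As a small illustration of omitting DATA=, here is a sketch on a made-up table. The names WORK.SCORES and WORK.SCORES_SORTED and the values are hypothetical.

```sas
/* Hypothetical toy table; after this step it is the most
   recently created data set. */
data work.scores;
    input name $ score;
    datalines;
Cy 3
Al 1
Bo 2
;
run;

/* No DATA= option: PROC SORT reads the most recently created
   data set (WORK.SCORES here). OUT= leaves the original untouched. */
proc sort out=work.scores_sorted;
    by score;
run;
```

Relying on the default is easy to get wrong once a program grows, since any intervening step that creates a table changes which data set is read, so spelling out DATA= is usually the safer choice.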
Cynthia