mlensing
Obsidian | Level 7

Hi everyone, 

As I'm still a bit new to SAS, I wanted to ask for guidance on sorting my dataset. I have created a master dataset from 3 datasets: 2 have values for SSN (a character variable), while the third has no values for SSN. I want to sort my master dataset by SSN so that the missing values appear last, at the end of the dataset. Is this possible, and is there a straightforward way to do it? Thank you in advance!

 

DATA HypTabs.Contact;
    LENGTH SSN $11.
           Inits $3.
           City $20.
           StateCd $2.
           ZipCd $5.;
    SET WORK.Contact_IA
        WORK.Contact_MS
        WORK.Contact_UT;
    LABEL SSN= 'Social Security Number'
          Inits= 'Subject Initials'
          City= 'City'
          StateCd= 'State Code'
          ZipCd= 'Zip Code';
RUN;
 
PROC SORT DATA = HypTabs.Contact;
BY SSN;
RUN;
 
(Note: When I do PROC SORT by SSN, observations 1-195 are blank/missing which corresponds to the dataset in which those values are missing.)

ACCEPTED SOLUTION

FreelanceReinh
Jade | Level 19

Hi @mlensing,

 

There are various ways to achieve what you want. draycut's suggestion is short and elegant. To sort the non-missing SSN values first in ascending order, followed by the missing values, you could create an additional sort key in your DATA step:

...
set work.Contact_IA
    work.Contact_MS
    work.Contact_UT(in=UT);
nossn=UT;
...

The IN= dataset option creates a temporary 0-1 flag so that UT=1 characterizes observations coming from work.Contact_UT (assuming that these are the records with missing SSN). The subsequent assignment statement makes this flag permanent, now named nossn. Adding variable nossn as the first sort key in the BY statement of your PROC SORT step ensures that observations with nossn=0, i.e., the observations from Contact_IA or Contact_MS, are sorted first, followed by those with nossn=1 from Contact_UT. You may want to drop variable nossn from the final dataset (commented out below):

proc sort data=HypTabs.Contact /* out=HypTabs.Contact(drop=nossn) */;
by nossn ssn;
run;
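
For anyone who wants to try the flag approach end to end, here is a minimal sketch on toy data. The rows, the dataset names work.Contact_demo and work.Contact_sorted, and the trimmed-down variable list are made up for illustration; only the IN= flag and the BY nossn ssn sort come from the steps above.

```sas
/* Hypothetical toy data standing in for the three source datasets */
data work.Contact_IA;
    length SSN $11 Inits $3;
    input SSN $ Inits $;
    datalines;
333-33-3333 ABC
111-11-1111 DEF
;

data work.Contact_MS;
    length SSN $11 Inits $3;
    input SSN $ Inits $;
    datalines;
222-22-2222 GHI
;

/* SSN is declared but never read, so it is blank on every row */
data work.Contact_UT;
    length SSN $11 Inits $3;
    input Inits $;
    datalines;
JKL
MNO
;

/* Stack the datasets and keep the IN= flag as a real variable */
data work.Contact_demo;
    set work.Contact_IA
        work.Contact_MS
        work.Contact_UT(in=UT);
    nossn = UT;   /* 1 for rows from Contact_UT, 0 otherwise */
run;

/* Sort by the flag first, then by SSN; drop the flag on output */
proc sort data=work.Contact_demo out=work.Contact_sorted(drop=nossn);
    by nossn SSN;
run;

proc print data=work.Contact_sorted;
run;
```

In the printed result the three non-missing SSNs should appear in ascending order, followed by the two rows with blank SSN. Writing to a separate OUT= dataset here is just a choice for the demo so the unsorted input stays available for comparison.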

 

Alternatively, you can take advantage of the flexibility of an ORDER BY clause in PROC SQL. There you can create an additional sort key "on the fly," i.e., you don't need to modify your DATA step:

proc sql;
create table want as
select * from HypTabs.Contact
order by missing(ssn), ssn;
quit;

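Observations with missing SSN have missing(ssn)=1, otherwise missing(ssn)=0. Within each of these two subsets the rows are ordered by ssn, but PROC SQL does not guarantee the order of rows with equal sort keys (for example, the observations with missing SSN), so you may want to add more sort keys to define it.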
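
As a rough sketch of what adding more sort keys could look like, the example below appends Inits and ZipCd as tie-breakers. The toy table work.demo and its rows are made up for illustration, and the choice of Inits and ZipCd as tie-breakers is an assumption; use whichever variables define a meaningful order in your data.

```sas
/* Hypothetical toy data: two rows have a missing SSN (empty first field) */
data work.demo;
    length SSN $11 Inits $3 ZipCd $5;
    infile datalines dsd;   /* comma-delimited, empty field = missing */
    input SSN $ Inits $ ZipCd $;
    datalines;
222-22-2222,GHI,50010
,MNO,84101
111-11-1111,DEF,39201
,JKL,84102
;

proc sql;
    create table work.want as
    select *
    from work.demo
    /* missing(SSN) is 0 for filled SSNs and 1 for blank ones, so blanks go last; */
    /* Inits and ZipCd (assumed tie-breakers) order rows that share the same SSN  */
    order by missing(SSN), SSN, Inits, ZipCd;
quit;
```

With these rows, the two filled SSNs come first in ascending order, and the two missing-SSN rows follow, ordered by Inits.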

6 REPLIES
PeterClemmensen
Tourmaline | Level 20

You could simply sort by Descending SSN. 

 

PROC SORT DATA = HypTabs.Contact;
BY descending SSN;
RUN;

 

Is it a requirement that, apart from the missing values, the SSNs are sorted in ascending order?

mlensing
Obsidian | Level 7

Hi, yes it is a requirement to sort SSN ascending. Thank you for clarifying.

mlensing
Obsidian | Level 7
Thank you so much, this worked perfectly! I really appreciate your help and thorough response!
