Hi everyone,
As I'm still a bit new to SAS, I wanted to reach out for guidance on sorting my dataset. I have created a master dataset from 3 datasets, 2 of which have values for SSN (a character variable) while the third has no values for SSN. I want to sort my master dataset by SSN so that the missing values appear last, at the end of the dataset. Is this possible, and is there a straightforward way to do this? Thank you in advance!
Hi @mlensing,
There are various ways to achieve what you want. draycut's suggestion is short and elegant. To sort the non-missing SSN values first in ascending order, followed by the missing values, you could create an additional sort key in your DATA step:
...
set work.Contact_IA
    work.Contact_MS
    work.Contact_UT(in=UT);
nossn=UT;
...
The IN= dataset option creates a temporary 0-1 flag so that UT=1 characterizes observations coming from work.Contact_UT (assuming that these are the records with missing SSN). The subsequent assignment statement makes this flag permanent, now named nossn. Adding variable nossn as the first sort key in the BY statement of your PROC SORT step ensures that observations with nossn=0, i.e., the observations from Contact_IA or Contact_MS, are sorted first, followed by those with nossn=1 from Contact_UT. You may want to drop variable nossn from the final dataset (commented out below):
proc sort data=HypTabs.Contact /* out=HypTabs.Contact(drop=nossn) */;
  by nossn ssn;
run;
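For anyone who wants to try the flag approach end to end, here is a minimal sketch. The three toy input datasets, the variable name, the SSN values, and the output dataset work.Contact_sorted are made up for illustration; only the SET statement with IN= and the BY statement follow the steps described above.

```sas
/* Hypothetical toy inputs: two sources with SSN, one without */
data work.Contact_IA;
  length name $8 ssn $11;
  input name $ ssn $;
  datalines;
Ann 333-33-3333
Bob 111-11-1111
;

data work.Contact_MS;
  length name $8 ssn $11;
  input name $ ssn $;
  datalines;
Cy 222-22-2222
;

data work.Contact_UT;
  length name $8 ssn $11;
  input name $;
  ssn = ' ';              /* SSN is missing for every record here */
  datalines;
Dee
Eve
;

/* Concatenate and keep the IN= flag as a permanent variable */
data work.Contact;
  set work.Contact_IA
      work.Contact_MS
      work.Contact_UT(in=UT);
  nossn = UT;             /* 1 for rows from Contact_UT, 0 otherwise */
run;

/* nossn=0 rows come first (ascending SSN), nossn=1 rows follow */
proc sort data=work.Contact out=work.Contact_sorted(drop=nossn);
  by nossn ssn;
run;
```

With this toy data, the rows with SSN come out in ascending SSN order, followed by the two rows without SSN. If missing SSNs could also occur in the other input datasets, assigning nossn = missing(ssn) instead would flag by value rather than by source; that is a variation, not part of the original suggestion.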
Alternatively, you can take advantage of the flexibility of an ORDER BY clause in PROC SQL. There you can create an additional sort key "on the fly," i.e., you don't need to modify your DATA step:
proc sql;
  create table want as
  select * from HypTabs.Contact
  order by missing(ssn), ssn;
quit;
Observations with missing SSN have missing(ssn)=1; all others have missing(ssn)=0. PROC SQL does not guarantee the order of observations that tie on all sort keys (for example, all observations with missing SSN), though, so you may want to add more sort keys to define it.
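As a sketch of what adding another sort key could look like: the toy dataset below and the variable name are assumptions for illustration, and name is used only as an example tie-breaker. Any variable that distinguishes the tied rows would do.

```sas
/* Hypothetical toy data; "name" is an assumed extra variable */
data work.Contact;
  infile datalines dsd truncover;
  length name $8 ssn $11;
  input name $ ssn $;
  datalines;
Eve,
Ann,333-33-3333
Dee,
Bob,111-11-1111
;

proc sql;
  create table work.want as
  select *
  from work.Contact
  /* 1st key: 0 = has SSN, 1 = missing SSN                 */
  /* 2nd key: ascending SSN                                */
  /* 3rd key: tie-breaker, orders the rows without an SSN  */
  order by missing(ssn), ssn, name;
quit;
```

Here the rows with an SSN are ordered by SSN, and the rows without one are ordered by name instead of being left in an unspecified order.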
You could simply sort by Descending SSN.
PROC SORT DATA = HypTabs.Contact;
BY descending SSN;
RUN;
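To see what this does to the rest of the data: a blank character value sorts lower than any nonblank value, so a descending sort moves it to the end, but it also reverses the order of the non-missing values. A small sketch with made-up SSNs (work.demo and work.demo_desc are hypothetical names):

```sas
/* Hypothetical toy data: three SSNs and one missing value */
data work.demo;
  length ssn $11;
  ssn = '222-22-2222'; output;
  ssn = ' ';           output;
  ssn = '111-11-1111'; output;
  ssn = '333-33-3333'; output;
run;

/* Descending sort: highest SSN first, blank SSN last */
proc sort data=work.demo out=work.demo_desc;
  by descending ssn;
run;
```

The sorted dataset lists 333-33-3333, 222-22-2222, 111-11-1111, and then the blank value.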
Is it a requirement that, apart from the missing values, the SSNs are sorted in ascending order?
Hi, yes it is a requirement to sort SSN ascending. Thank you for clarifying.
@mlensing, then @FreelanceReinh's answer is the way to go 🙂
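/* Step 1: sort only the observations with a non-missing SSN */
proc sort
data=HypTabs.Contact (
where=(SSN ne "")
)
out=want
;
by SSN;
run;
/* Step 2: append the observations with a missing SSN at the end */
proc append
base=want
data=HypTabs.Contact (
where=(SSN = "")
)
;
run;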