Solved: Re: Sort missing character variable values so that missing values appear last in dataset

mlensing · Posted 11-08-2020 01:19 AM

Hi everyone,

As I'm still a bit new to SAS, I wanted to reach out for guidance on sorting my dataset. I have created a master dataset from 3 datasets: 2 of them have values for SSN (a character variable), while the last has no SSN values at all. I want to sort my master dataset by SSN so that the missing values appear last, at the end of the dataset. Is this possible, and is there a straightforward way to do it? Thank you in advance!

DATA HypTabs.Contact;
    LENGTH SSN     $11.
           Inits   $3.
           City    $20.
           StateCd $2.
           ZipCd   $5.;
    SET WORK.Contact_IA
        WORK.Contact_MS
        WORK.Contact_UT;
    LABEL SSN     = 'Social Security Number'
          Inits   = 'Subject Initials'
          City    = 'City'
          StateCd = 'State Code'
          ZipCd   = 'Zip Code';
RUN;

PROC SORT DATA = HypTabs.Contact;

BY SSN;

RUN;

(Note: When I run PROC SORT by SSN, observations 1-195 have a blank/missing SSN. These are the observations from the dataset in which SSN is missing.)

FreelanceReinh · Posted 11-08-2020 06:29 AM

Hi @mlensing,

There are various ways to achieve what you want. draycut's suggestion is short and elegant. To sort the non-missing SSN values first in ascending order, followed by the missing values, you could create an additional sort key in your DATA step:

...
set work.Contact_IA
    work.Contact_MS
    work.Contact_UT(in=UT);
nossn=UT;
...

The IN= dataset option creates a temporary 0-1 flag so that UT=1 characterizes observations coming from work.Contact_UT (assuming that these are the records with missing SSN). The subsequent assignment statement makes this flag permanent, now named nossn. Adding variable nossn as the first sort key in the BY statement of your PROC SORT step ensures that observations with nossn=0, i.e., the observations from Contact_IA or Contact_MS, are sorted first, followed by those with nossn=1 from Contact_UT. You may want to drop variable nossn from the final dataset (commented out below):

proc sort data=HypTabs.Contact /* out=HypTabs.Contact(drop=nossn) */;
by nossn ssn;
run;
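
For anyone who wants to try the flag approach end to end, here is a minimal self-contained sketch. Only the dataset and variable names come from this thread; the input rows are made up for illustration, the datasets are cut down to two variables, and the output name Contact_sorted is hypothetical.

```sas
/* Hypothetical test data: two sources with SSN, one without */
data work.Contact_IA;
  length SSN $11 Inits $3;
  input SSN $ Inits $;
  datalines;
333-33-3333 ABC
111-11-1111 DEF
;

data work.Contact_MS;
  length SSN $11 Inits $3;
  input SSN $ Inits $;
  datalines;
222-22-2222 GHI
;

data work.Contact_UT;
  length SSN $11 Inits $3;
  SSN = ' ';            /* SSN is missing for every row of this source */
  input Inits $;
  datalines;
JKL
MNO
;

/* Concatenate and turn the temporary IN= flag into a real variable */
data work.Contact;
  set work.Contact_IA
      work.Contact_MS
      work.Contact_UT(in=UT);
  nossn = UT;           /* 1 for rows from Contact_UT, 0 otherwise */
run;

/* Sort by the flag first, then by SSN; drop the helper on output */
proc sort data=work.Contact out=work.Contact_sorted(drop=nossn);
  by nossn SSN;
run;

proc print data=work.Contact_sorted;
run;
```

With this toy data, the printed dataset should list the three non-missing SSNs in ascending order, followed by the two rows with blank SSN.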

Alternatively, you can take advantage of the flexibility of an ORDER BY clause in PROC SQL. There you can create an additional sort key "on the fly," i.e., you don't need to modify your DATA step:

proc sql;
create table want as
select * from HypTabs.Contact
order by missing(ssn), ssn;
quit;

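Observations with missing SSN have missing(ssn)=1; all others have missing(ssn)=0, so they sort first, in ascending order of SSN. PROC SQL does not guarantee the relative order of observations that tie on both keys (in particular, all observations with missing SSN), though, so you may want to add more sort keys to define it.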
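
As a sketch of what "more sort keys" could look like, the query below adds Inits and ZipCd as tiebreakers. Choosing these two variables is only an assumption for illustration; any variables that make the order unambiguous for your data would do.

```sas
proc sql;
  create table want as
  select *
  from HypTabs.Contact
  /* 0/1 missing flag first, then SSN, then the assumed tiebreakers */
  order by missing(SSN), SSN, Inits, ZipCd;
quit;
```

The tiebreakers matter mostly for the block of observations with missing SSN, since those all share the same values for the first two keys.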


PeterClemmensen · Posted 11-08-2020 02:08 AM

You could simply sort by Descending SSN.

PROC SORT DATA = HypTabs.Contact;
BY descending SSN;
RUN;

Is it a requirement that, apart from the missing values, the SSNs are sorted in ascending order?

The DATA to DATA Step Macro
Blog: SASnrd

mlensing · Posted 11-08-2020 11:38 AM

Hi, yes it is a requirement to sort SSN ascending. Thank you for clarifying.

PeterClemmensen · Posted 11-08-2020 11:40 AM

@mlensing, then @FreelanceReinh's answer is the way to go 🙂


mlensing · Posted 11-08-2020 10:21 PM

Thank you so much, this worked perfectly! I really appreciate your help and thorough response!

Kurt_Bremser · Posted 11-08-2020 06:36 AM

proc sort
  data=HypTabs.Contact (
    where=(SSN ne "")
  )
  out=want
;
by SSN;
run;

proc append
  base=want
  data=HypTabs.Contact (
    where=(SSN = "")
  )
;
run;

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX
