Solved: Using proc sort to seperate table based on certain conditions

ifb10 · Posted 02-14-2022 05:25 AM

Dear all,

I have recently started SAS programming and wanted to sort out missing values from a table which I detected using:

proc freq data=cr.employee_raw order=freq nlevels ;
tables EmpID Country Department;
run;

In order to have a sorted table containing values fulfilling certain requirements, I written the following code:

requirements:

Values in the EmpID column must be unique.
Values in the Country column should be either US or AU.
There are 17 unique department names.
If TermDate has a known value, it should be after HireDate.

proc step:

proc sort data=cr.employee_raw out=cr.employee_sort nodup dupout=cr.employee_dup;
by EmpID;
where Country in ("US", "AU") and TermDate is not missing and TermDate > HireDate;
run;

However, only the correct rows were read from the employee_raw data not giving me the chance to generate the out dups table.

NOTE: There were 113 observations read from the data set CR.EMPLOYEE_RAW.

WHERE Country in ('AU', 'US') and (TermDate is not null) and (TermDate>HireDate);

How could I modify the syntax to have both tables containing the "right" and "wrong" values, and why was the where statement operating like this? Has it something to do with where being used in the compilation phase of the PDV setup?

Thank you and best regards!

Kurt_Bremser · Posted 02-15-2022 04:24 AM

I would try this:

proc sort data=have;
by empid;
run;

data
  right
  wrong
;
set have;
by empid;
if not first.empid or country not in ('AU','US') or term_date lt hire_date
then output wrong;
else output right;
run;

Since a missing value is lower than any other value, the missing value automatically fulfills the last condition.

Code is untested; for tested code, please supply example data as a data step with datalines, not as Excel files that

are unsafe because of possible malware, so lots of us are wary to download them, and some have them even blocked by their corporate firewalls
do not have fixed attributes for columns and therefore do not tell us important information about SAS datasets
need to be imported into SAS by unreliable means which cause extra work (download, upload to SAS server, process)

A data step only needs a copy/paste and submit and presents unambiguous data. Help us to help you.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

View solution in original post

andreas_lds · Posted 02-14-2022 06:26 AM

I don't think that i understand what you are trying to achieve fully. Posting the data you have (a small excerpt should sufficient) and showing what you expect as result, will help us to help you.

The where statement always removes obs not fulfilling it, so that only "valid" obs are processed at all.

Patrick · Posted 02-14-2022 06:36 AM

The where statement will get applied on row level when reading your data and prior to the sort operation.

NODUPKEY (use this instead of NODUP) will drop any record with a duplicate key as per your by group in the sort.

DUPOUT will then have all the data dropped by NODUPKEY but it won't have the data that got already removed by the where clause when reading the data and before any sorting.

ifb10 · Posted 02-15-2022 02:07 AM

Thank you for the replies! In this case, what other statements could I use so that the data is not removed prior to operation and is sorted into the respective tables? I learned that an if-statement could do this for a data step, but I could not apply it to the sorting.

Kurt_Bremser · Posted 02-15-2022 04:24 AM

I would try this:

proc sort data=have;
by empid;
run;

data
  right
  wrong
;
set have;
by empid;
if not first.empid or country not in ('AU','US') or term_date lt hire_date
then output wrong;
else output right;
run;

Since a missing value is lower than any other value, the missing value automatically fulfills the last condition.

Code is untested; for tested code, please supply example data as a data step with datalines, not as Excel files that

are unsafe because of possible malware, so lots of us are wary to download them, and some have them even blocked by their corporate firewalls
do not have fixed attributes for columns and therefore do not tell us important information about SAS datasets
need to be imported into SAS by unreliable means which cause extra work (download, upload to SAS server, process)

A data step only needs a copy/paste and submit and presents unambiguous data. Help us to help you.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Using proc sort to seperate table based on certain conditions

Re: Using proc sort to seperate table based on certain conditions

Re: Using proc sort to seperate table based on certain conditions

Re: Using proc sort to seperate table based on certain conditions

Re: Using proc sort to seperate table based on certain conditions

Re: Using proc sort to seperate table based on certain conditions

Using proc sort to seperate table based on certain conditions

Re: Using proc sort to seperate table based on certain conditions

Re: Using proc sort to seperate table based on certain conditions

Re: Using proc sort to seperate table based on certain conditions

Re: Using proc sort to seperate table based on certain conditions

Re: Using proc sort to seperate table based on certain conditions

Click image to register for webinar

Classroom Training Available!