BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
ifb10
Obsidian | Level 7

Dear all,

 

I have recently started SAS programming and wanted to sort out missing values from a table which I detected using:

 

proc freq data=cr.employee_raw order=freq nlevels ;
tables EmpID Country Department;
run;

 

In order to have a sorted table containing values fulfilling certain requirements, I written the following code:

 

requirements:

  • Values in the EmpID column must be unique.
  • Values in the Country column should be either US or AU.
  • There are 17 unique department names.
  • If TermDate has a known value, it should be after HireDate.

proc step:

proc sort data=cr.employee_raw out=cr.employee_sort nodup dupout=cr.employee_dup;
by EmpID;
where Country in ("US", "AU") and TermDate is not missing and TermDate > HireDate;
run;

 

However, only the correct rows were read from the employee_raw data not giving me the chance to generate the out dups table.

 

NOTE: There were 113 observations read from the data set CR.EMPLOYEE_RAW.
WHERE Country in ('AU', 'US') and (TermDate is not null) and (TermDate>HireDate);

 

How could I modify the syntax to have both tables containing the "right" and "wrong" values, and why was the where statement operating like this? Has it something to do with where being used in the compilation phase of the PDV setup?

 

Thank you and best regards!

 

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

I would try this:

proc sort data=have;
by empid;
run;

data
  right
  wrong
;
set have;
by empid;
if not first.empid or country not in ('AU','US') or term_date lt hire_date
then output wrong;
else output right;
run;

Since a missing value is lower than any other value, the missing value automatically fulfills the last condition.

 

Code is untested; for tested code, please supply example data as a data step with datalines, not as Excel files that

  • are unsafe because of possible malware, so lots of us are wary to download them, and some have them even blocked by their corporate firewalls
  • do not have fixed attributes for columns and therefore do not tell us important information about SAS datasets
  • need to be imported into SAS by unreliable means which cause extra work (download, upload to SAS server, process)

A data step only needs a copy/paste and submit and presents unambiguous data. Help us to help you.

View solution in original post

4 REPLIES 4
andreas_lds
Jade | Level 19

I don't think that i understand what you are trying to achieve fully. Posting the data you have (a small excerpt should sufficient) and showing what you expect as result, will help us to help you.

The where statement always removes obs not fulfilling it, so that only "valid" obs are processed at all.

 

Patrick
Opal | Level 21

The where statement will get applied on row level when reading your data and prior to the sort operation. 

 

NODUPKEY (use this instead of NODUP) will drop any record with a duplicate key as per your by group in the sort.

DUPOUT will then have all the data dropped by NODUPKEY but it won't have the data that got already removed by the where clause when reading the data and before any sorting.

 

 

ifb10
Obsidian | Level 7

Thank you for the replies! In this case, what other statements could I use so that the data is not removed prior to operation and is sorted into the respective tables? I learned that an if-statement could do this for a data step, but I could not apply it to the sorting.

Kurt_Bremser
Super User

I would try this:

proc sort data=have;
by empid;
run;

data
  right
  wrong
;
set have;
by empid;
if not first.empid or country not in ('AU','US') or term_date lt hire_date
then output wrong;
else output right;
run;

Since a missing value is lower than any other value, the missing value automatically fulfills the last condition.

 

Code is untested; for tested code, please supply example data as a data step with datalines, not as Excel files that

  • are unsafe because of possible malware, so lots of us are wary to download them, and some have them even blocked by their corporate firewalls
  • do not have fixed attributes for columns and therefore do not tell us important information about SAS datasets
  • need to be imported into SAS by unreliable means which cause extra work (download, upload to SAS server, process)

A data step only needs a copy/paste and submit and presents unambiguous data. Help us to help you.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 955 views
  • 6 likes
  • 4 in conversation