Dear all,
I have recently started SAS programming and wanted to sort out missing values from a table which I detected using:
proc freq data=cr.employee_raw order=freq nlevels ;
tables EmpID Country Department;
run;
In order to have a sorted table containing values fulfilling certain requirements, I written the following code:
requirements:
proc step:
proc sort data=cr.employee_raw out=cr.employee_sort nodup dupout=cr.employee_dup;
by EmpID;
where Country in ("US", "AU") and TermDate is not missing and TermDate > HireDate;
run;
However, only the correct rows were read from the employee_raw data not giving me the chance to generate the out dups table.
How could I modify the syntax to have both tables containing the "right" and "wrong" values, and why was the where statement operating like this? Has it something to do with where being used in the compilation phase of the PDV setup?
Thank you and best regards!
I would try this:
proc sort data=have;
by empid;
run;
data
right
wrong
;
set have;
by empid;
if not first.empid or country not in ('AU','US') or term_date lt hire_date
then output wrong;
else output right;
run;
Since a missing value is lower than any other value, the missing value automatically fulfills the last condition.
Code is untested; for tested code, please supply example data as a data step with datalines, not as Excel files that
A data step only needs a copy/paste and submit and presents unambiguous data. Help us to help you.
I don't think that i understand what you are trying to achieve fully. Posting the data you have (a small excerpt should sufficient) and showing what you expect as result, will help us to help you.
The where statement always removes obs not fulfilling it, so that only "valid" obs are processed at all.
The where statement will get applied on row level when reading your data and prior to the sort operation.
NODUPKEY (use this instead of NODUP) will drop any record with a duplicate key as per your by group in the sort.
DUPOUT will then have all the data dropped by NODUPKEY but it won't have the data that got already removed by the where clause when reading the data and before any sorting.
Thank you for the replies! In this case, what other statements could I use so that the data is not removed prior to operation and is sorted into the respective tables? I learned that an if-statement could do this for a data step, but I could not apply it to the sorting.
I would try this:
proc sort data=have;
by empid;
run;
data
right
wrong
;
set have;
by empid;
if not first.empid or country not in ('AU','US') or term_date lt hire_date
then output wrong;
else output right;
run;
Since a missing value is lower than any other value, the missing value automatically fulfills the last condition.
Code is untested; for tested code, please supply example data as a data step with datalines, not as Excel files that
A data step only needs a copy/paste and submit and presents unambiguous data. Help us to help you.
Save $250 on SAS Innovate and get a free advance copy of the new SAS For Dummies book! Use the code "SASforDummies" to register. Don't miss out, May 6-9, in Orlando, Florida.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.