Solved: Re: PROC SORT NODUP

Aves9019 · Posted 10-28-2015 12:07 PM

Hello SAS community I need help with the following. I am attempting to sort a data set of applications by 2 variables. One variable being the application status and the other being the application id. I would like to drop the observations containing a status of completed which fall under the same id. I have attempted using a proc sort with a nodup option but end up with a blank table for my output. Below is what I have so far. Thank you for your time!

proc sort data=set out=set1 nodup;

by id;

where status='completed'

run;

SASKiwi · Posted 10-28-2015 04:02 PM

proc sort data = example
               out = completed
              nodupkey;
  by id status;
  where status = 'C';
run;
 
proc sort data = example
               out = aborted
              ;
  by id status;
  where status = 'A';
run;
 
data want;
  merge completed aborted;
   by id status;
run;

View solution in original post

MaikH_Schutze · Posted 10-28-2015 12:16 PM

Hi,

Your criteria are a little more complex than what your code accomplishes.

To start, you need to add a semicolon to end the where statement. You can also add the second variable, status, to your BY statement.

The WHERE statement exlcudes all records where the status is equal to 'completed', not sure if that's what you want because it doesn't guarantee that there is another record for this ID where the status is something other than 'completed'. And your criteria state you would like to "drop the observations containing a status of completed which fall under the same id", so that maybe something to look into, if there are IDs with two or more records where one is complete.

pablo_joerger · Posted 10-28-2015 12:27 PM

maybe this helps:

proc sort data=set (where=(status='completed')) out=set1 nodup;

by id;

run ;

Aves9019 · Posted 10-28-2015 12:40 PM

This returned a blank dataset also. I am not sure why

MikeZdeb · Posted 10-28-2015 04:43 PM

Hi. These are identical since the WHERE statement also only applies to the input data set, not the output ...

proc sort data=x out=y nodup;
by id;
where status eq 'completed';
run;

proc sort data=x (where=(status eq 'completed')) out=y nodup;
by id;
where status eq 'completed';
run;

Aves9019 · Posted 10-28-2015 12:28 PM

Thank you for your response. I would like to keep all observations and duplicates except for the observations that are duplicated within the status of completed. So if I have two completed applications by the same id i want to delete those.

ballardw · Posted 10-28-2015 01:52 PM

Please provide an example of an input dataset, preferably in the form of a data step that we can all execute, and wha the desired result for that example data is. The data should provide examples of all of the issues you are addressing.

Aves9019 · Posted 10-28-2015 02:30 PM

DATA EXAMPLE;

INPUT ID STATUS;

DATALINES;

100 C

200 A

300 C

400 C

500 C

400 C

300 A

200 A

100 C

500 A

This is an example in a data step. I want to remove any duplicate statuses of completed. For instance id number 100 and 400 have a duplicate status of C(completed) I would like the new data set to present 100 and 400 once with the status completed. Id number 200 is duplicated but has a status of A(aborted). I would like to keep both of these obeservations. With the correct code the new data set would present as follows:

100 C

200 A

300 C

400 C

500 C

300 A

200 A

500 A

SASKiwi · Posted 10-28-2015 04:02 PM

proc sort data = example
               out = completed
              nodupkey;
  by id status;
  where status = 'C';
run;
 
proc sort data = example
               out = aborted
              ;
  by id status;
  where status = 'A';
run;
 
data want;
  merge completed aborted;
   by id status;
run;

MikeZdeb · Posted 10-28-2015 05:02 PM

Hi. Would you accept SORT + a DATA STEP as a solution ...

data example;
input id status :$1. @@;
datalines;
100 C 200 A 300 C 400 C 500 C 400 C 300 A 200 A 100 C 500 A
;

proc sort data=example;
by id;
run;

data want (drop=complete);
do until (last.id);
   set example;
   by id;
   complete + (status eq 'C');
   if status eq 'A' or complete eq 1 then output;
end;
complete = 0;
run;

data set WANT ...

Obs id status

1     100      C
2     200      A
3     200      A
4     300      C
5     300      A
6     400      C
7     500      C
8     500      A

MikeZdeb · Posted 10-29-2015 08:45 AM

Hi. On further thought, these two also work ...

data example;
input id status :$1. @@;
datalines;
100 C 200 A 300 C 400 C 500 C 400 C 300 A 200 A 100 C 500 A
;

* use SQL;

proc sql;
create table want as
select distinct * from example where status = 'C'
union all
select * from example where status = 'A'
order id, status desc;
quit;

* or, use SORT plus a simple data step;

proc sort data=example;
by id descending status;
run;

data want;
set example;
by id;
if (first.id and status eq 'C') or status eq 'A';
run;

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away