BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Aves9019
Obsidian | Level 7

Hello SAS community I need help with the following. I am attempting to sort a data set of applications by 2 variables. One variable being the application status and the other being the application id. I would like to drop the observations containing a status of completed which fall under the same id. I have attempted using a proc sort with a nodup option but end up with a blank table for my output. Below is what I have so far. Thank you for your time! 

 

proc sort data=set out=set1 nodup;

by id;

where status='completed'

run;

 

1 ACCEPTED SOLUTION

Accepted Solutions
SASKiwi
PROC Star

 

proc sort data = example
               out = completed
              nodupkey;
  by id status;
  where status = 'C';
run;
 
proc sort data = example
               out = aborted
              ;
  by id status;
  where status = 'A';
run;
 
data want;
  merge completed aborted;
   by id status;
run;

 

View solution in original post

10 REPLIES 10
MaikH_Schutze
Quartz | Level 8

Hi,

 

Your criteria are a little more complex than what your code accomplishes.

 

To start, you need to add a semicolon to end the where statement. You can also add the second variable, status, to your BY statement.

 

The WHERE statement exlcudes all records where the status is equal to 'completed', not sure if that's what you want because it doesn't guarantee that there is another record for this ID where the status is  something other than 'completed'. And your criteria state you would like to "drop the observations containing a status of completed which fall under the same id", so that maybe something to look into, if there are IDs with two or more records where one is complete.

pablo_joerger
Calcite | Level 5

maybe this helps:

 

proc sort data=set (where=(status='completed')) out=set1 nodup;

by id;

run ;

Aves9019
Obsidian | Level 7

This returned a blank dataset also. I am not sure why

MikeZdeb
Rhodochrosite | Level 12

Hi. These are identical since the WHERE statement also only applies to the input data set, not the output ...

 

proc sort data=x out=y nodup;
by id;
where status eq 'completed';
run;

 

proc sort data=x (where=(status eq 'completed')) out=y nodup;
by id;
where status eq 'completed';
run;

Aves9019
Obsidian | Level 7
Thank you for your response. I would like to keep all observations and duplicates except for the observations that are duplicated within the status of completed. So if I have two completed applications by the same id i want to delete those.
ballardw
Super User

Please provide an example of an input dataset, preferably in the form of a data step that we can all execute, and wha the desired result for that example data is. The data should provide examples of all of the issues you are addressing.

Aves9019
Obsidian | Level 7

DATA EXAMPLE;

INPUT ID STATUS;

DATALINES;

100 C

200 A

300 C

400 C

500 C

400 C

300 A

200 A

100 C

500 A

This is an example in a data step. I want to remove any duplicate statuses of completed. For instance id number 100 and 400 have a duplicate status of C(completed) I would like the new data set to present 100 and 400 once with the status completed. Id number 200 is duplicated but has a status of A(aborted). I would like to keep both of these obeservations. With the correct code the new data set would present as follows: 

100 C

200 A

300 C

400 C

500 C

300 A

200 A

500 A

 

 

SASKiwi
PROC Star

 

proc sort data = example
               out = completed
              nodupkey;
  by id status;
  where status = 'C';
run;
 
proc sort data = example
               out = aborted
              ;
  by id status;
  where status = 'A';
run;
 
data want;
  merge completed aborted;
   by id status;
run;

 

MikeZdeb
Rhodochrosite | Level 12

Hi. Would you accept SORT + a DATA STEP as a solution ...


data example;
input id status :$1. @@;
datalines;
100 C 200 A 300 C 400 C 500 C 400 C 300 A 200 A 100 C 500 A
;

 

proc sort data=example;
by id;
run;

 

data want (drop=complete);
do until (last.id);
   set example;
   by id;
   complete + (status eq 'C');
   if status eq 'A' or complete eq 1 then output;
end;
complete = 0;
run;

 

data set WANT ...

Obs     id    status

 1     100      C
 2     200      A
 3     200      A
 4     300      C
 5     300      A
 6     400      C
 7     500      C
 8     500      A

 

MikeZdeb
Rhodochrosite | Level 12

Hi. On further thought, these two also work ...

 

data example;
input id status :$1. @@;
datalines;
100 C 200 A 300 C 400 C 500 C 400 C 300 A 200 A 100 C 500 A
;

 

* use SQL;

proc sql;
create table want as
select distinct * from example where status = 'C'
union all
select * from example where status = 'A'
order id, status desc;
quit;

 

* or, use SORT plus a simple data step;

proc sort data=example;
by id descending status;
run;


data want;
set example;
by id;
if (first.id and status eq 'C') or status eq 'A';
run;

 

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 3813 views
  • 0 likes
  • 6 in conversation