BookmarkSubscribeRSS Feed
amarikow57
Obsidian | Level 7

I have data of something like this:

 

ID EVENT SCORE

1   1           3

1   2           3

2   1           7

2   2           7

3   1           8

3   2           6

4   1           .

4   2           5

 

I care about scores not matching for an ID. That is IDs 3 and 4 would be of interest. Is there a way to view only those IDs keeping in mind that my actual dataset has many more IDs than this?

 

Thanks in advance.

4 REPLIES 4
ballardw
Super User

The important part may be more in the realm of how many events?

Is your Score variable numeric or character? I have to ask because there are different approaches available for different types of data.

 

Do you just want the ID's of interest or are there other variables that need to be in the output?

It is always a good idea to show the desired output should look like.

amarikow57
Obsidian | Level 7

I have around 8 events in total, but only 2 of the events contain the scores I am interested in. These are stored as numeric.

Ideally I want my dataset to contain the single score available and the matching scores. To decide what to do with the scores that are different, I want to be able to view those ID observations and decide from there if they are something to keep.

ballardw
Super User

@amarikow57 wrote:

I have around 8 events in total, but only 2 of the events contain the scores I am interested in. These are stored as numeric.

Ideally I want my dataset to contain the single score available and the matching scores. To decide what to do with the scores that are different, I want to be able to view those ID observations and decide from there if they are something to keep.


Doesn't quite tell me how you expect the output, and brings up the need for an example with up to 8 events, such as how do we know which scores for which events are of interest.

 

I would be tempted to start with something like this to identify the mismatches:

 

proc summary data=have nway;
   class id;
   var score;
   output out=summary(drop=_: where=(score_range ne 0 or score_nmiss>0) )
      min= max= range= nmiss= /autoname;
   ;
run;

Range is the largest value - lowest, so if not equal to 0 then there is a difference for a numeric variable.

However when you have missing the Range only uses non-missing values so wouldn't catch the ones with missing. So this also counts the number of Nmiss, missing , values.

 

Ksharp
Super User
data have;
input ID EVENT SCORE;
cards;
1   1           3
1   2           3
2   1           7
2   2           7
3   1           8
3   2           6
4   1           .
4   2           5
;
proc sql;
create table want as
select *
 from have
  group by id
   having count(distinct score) > 1 or ( n(score) and nmiss(score))
;
quit;

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

SAS Enterprise Guide vs. SAS Studio

What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 527 views
  • 0 likes
  • 3 in conversation