BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Anita_n
Pyrite | Level 9

 @Tom  @Quentin   @Astounding  @ballardw  Thankyou all for your contribution to find a solution to this problem

After consultation with the customer. Here is a detailed explanation of his intentions with the data. I have adjusted (added the variable that contains the person who entered the data) the data a little.

 

data have;
input id $ pnr pid time cn editor var1 var2 var3 var4 var5;
datalines;
1 12 2 1 1 1 5 6 7 3 4
1 12 2 1 2 2 5 7 9 3 5
1 12 2 2 2 3 3 8 5 1 5
1 12 2 2 2 4 3 8 7 1 6

15 6 3 1 1 1 3 5 8 3 4
15 6 3 2 2 2 3 6 8 3 5
15 6 3 2 2 3 1 7 5 1 5

25 7 3 1 1 1 3 5 8 3 4
25 7 3 2 2 2 3 6 8 3 5
25 7 3 1 2 3 1 7 5 1 5
25 7 3 1 2 4 3 8 7 1 6

;
run;

a) Identify the matching pairs of the 1st and 2nd input 
       - there should always be 2 observations that match in the variables with match in the id, pnr and pid  
       - these observations differ by the number of the control input in the variable 'control' (1 or 2)

 

b) Check all data records to see whether they already have a control entry or not; these two groups should be outputted                   separately
      - a group with observations that do not yet have a 2nd input
       - a group with observations for which there is a 2nd input

 

c) Compare the observations that exist twice with each other, prerequisite as described above with match in id, pnr, pid
     - then compare whether the var1-var5 match (by rows) or not,
      - observations that do not match should be output so that we know what needs to be corrected

 

So these are the notes I made. I hope someone can give me a helping hand. Even if it's just the start because am little bit confused of where to start from. Thanks

ballardw
Super User

Editor values of 1,2,3, 4 sure don't look like two people entering data which was part of the prior description.

 

Please restrict use of "data set" to discussion about SAS data sets. You are apparently talking about rows of values in the text file as a data set and that doesn't make much sense (at least not to me) and confuses where a data set may actually be needed.

 

You may need to provide a bit more detail about this

b) Check all data records to see whether they already have a control entry or not; these two groups should be outputted                   separately
      - a group with data records that do not yet have a 2nd input
       - a group with data records for which there is a 2nd input

As in which records in your example meet each requirement and it would be nice for consistency sake to provide something like the name of an output data set to hold which "separate" outputs

 

Anita_n
Pyrite | Level 9

@ballardw  Thanks for your remarks:

Q: Editor values of 1,2,3, 4 sure don't look like two people entering data which was part of the prior description.

A: I did not want to use 1, 2 for that this doesn't confuse with time and control

 

Q: Please restrict use of "data set" to discussion about SAS data sets. You are apparently talking about rows of values in the text file as a data set and that doesn't make much sense (at least not to me) and confuses where a data set may actually be needed.

A: Sorry I have editted this above

 

Q: 

You may need to provide a bit more detail about this.

b) Check all data records to see whether they already have a control entry or not; these two groups should be outputted                   separately
      - a group with data records that do not yet have a 2nd input
       - a group with data records for which there is a 2nd input

As in which records in your example meet each requirement and it would be nice for consistency sake to provide something like the name of an output data set to hold which "separate" outputs

 

A:  Identify the matching pairs of the 1st and 2nd input 
       - there should always be 2 observations that match in the variables with match in the id, pnr, pid and time 

       - The values of id, pnr, pid  should always be the same to become a matching pair. In variable "control" there should be a first and second entry to see that two different people entered the data

       

     - Like in observation one and two  id=1 , since id, pnr, pid match they are pairs and there exist control 1 and 2 for the values of var1 to var5 could be checked to if if they are the same or not. If they are the same I output them in a dataset  "all_values_match". If they do not match I output them in a data set "values_doesnt_match"

 

     -Then I check for the next pair with the same id,  pnr, pid but I have only "control"=2 so  I know control one is definitely missing.  I then check if these are duplicates if not I output them in dataset "missing_entry" etc

 

I hope I could answer your questions

Tom
Super User Tom
Super User

So trying to pull key information from your post it looks like groups in the data is identified by the values of three variables:  id pnr pid

 

Each group could appear once or twice. (Are you sure there is never a third or fourth observation in a group?)

 

So split the data into three files.  The singletons.  The first and the second.

data single first second;
  set have;
  by id pnr pid;
  if first.pid and last.pid then output single;
  else if first.pid then output first;
  else output second;
run;

So the dataset SINGLE is the answer to which ones have not had what we used to call second pass data entry.

 

You can then PROC COMPARE with the other two datasets to see which records had differences.

 

Quentin
Super User

Let's make a simpler example dataset.  The data has one row per PatientID.  The data for each PatienID was entered by two different people (EntryID):

 

data have;
input ptid entryid var1 var2 var3;
datalines;
1 1 2 4 6
1 2 2 5 6
2 1 10 12 14
2 2 10 12 14
3 1 2 4 6
;
run;

So that is basically your data, but with fewer ID variables.  When asking questions, it's helpful to make the HAVE data as simple as possible.  Anything that works for one ID variable can work for multiple.

 

With that, I would use PROC COMPARE:

proc compare base=have(where=(entryid=1)) compare=have(where=(entryid=2)) listobs;
  var var1-var3 ;
  id ptid ;
run ;

That will show you where are are differences in the value of var1-var3.  It will also show you when data for a patient was only entered once.  If data for a patient was entered 3 times, you'll get a warning in your log about duplicate data. It will work for numeric data and character data.  You don't even have to list all of the variables.  PROC COMPARE is fabulous!

If you want output data instead of a report, you can explore the options in PROC COMPARE to create output datasets.

The Boston Area SAS Users Group (BASUG) is hosting our in person SAS Blowout on Oct 18!
This full-day event in Cambridge, Mass features four presenters from SAS, presenting on a range of SAS 9 programming topics. Pre-registration by Oct 15 is required.
Full details and registration info at https://www.basug.org/events.
Anita_n
Pyrite | Level 9

@Tom @Quentin @ballardw @Astounding  You guys are really great. Thanks a lot for the great job. It worked, after a little changes

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 20 replies
  • 1601 views
  • 10 likes
  • 5 in conversation