SAS Programming

DATA Step, Macro, Functions and more
BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Anita_n
Pyrite | Level 9

 @Tom  @Quentin   @Astounding  @ballardw  Thankyou all for your contribution to find a solution to this problem

After consultation with the customer. Here is a detailed explanation of his intentions with the data. I have adjusted (added the variable that contains the person who entered the data) the data a little.

 

data have;
input id $ pnr pid time cn editor var1 var2 var3 var4 var5;
datalines;
1 12 2 1 1 1 5 6 7 3 4
1 12 2 1 2 2 5 7 9 3 5
1 12 2 2 2 3 3 8 5 1 5
1 12 2 2 2 4 3 8 7 1 6

15 6 3 1 1 1 3 5 8 3 4
15 6 3 2 2 2 3 6 8 3 5
15 6 3 2 2 3 1 7 5 1 5

25 7 3 1 1 1 3 5 8 3 4
25 7 3 2 2 2 3 6 8 3 5
25 7 3 1 2 3 1 7 5 1 5
25 7 3 1 2 4 3 8 7 1 6

;
run;

a) Identify the matching pairs of the 1st and 2nd input 
       - there should always be 2 observations that match in the variables with match in the id, pnr and pid  
       - these observations differ by the number of the control input in the variable 'control' (1 or 2)

 

b) Check all data records to see whether they already have a control entry or not; these two groups should be outputted                   separately
      - a group with observations that do not yet have a 2nd input
       - a group with observations for which there is a 2nd input

 

c) Compare the observations that exist twice with each other, prerequisite as described above with match in id, pnr, pid
     - then compare whether the var1-var5 match (by rows) or not,
      - observations that do not match should be output so that we know what needs to be corrected

 

So these are the notes I made. I hope someone can give me a helping hand. Even if it's just the start because am little bit confused of where to start from. Thanks

ballardw
Super User

Editor values of 1,2,3, 4 sure don't look like two people entering data which was part of the prior description.

 

Please restrict use of "data set" to discussion about SAS data sets. You are apparently talking about rows of values in the text file as a data set and that doesn't make much sense (at least not to me) and confuses where a data set may actually be needed.

 

You may need to provide a bit more detail about this

b) Check all data records to see whether they already have a control entry or not; these two groups should be outputted                   separately
      - a group with data records that do not yet have a 2nd input
       - a group with data records for which there is a 2nd input

As in which records in your example meet each requirement and it would be nice for consistency sake to provide something like the name of an output data set to hold which "separate" outputs

 

Anita_n
Pyrite | Level 9

@ballardw  Thanks for your remarks:

Q: Editor values of 1,2,3, 4 sure don't look like two people entering data which was part of the prior description.

A: I did not want to use 1, 2 for that this doesn't confuse with time and control

 

Q: Please restrict use of "data set" to discussion about SAS data sets. You are apparently talking about rows of values in the text file as a data set and that doesn't make much sense (at least not to me) and confuses where a data set may actually be needed.

A: Sorry I have editted this above

 

Q: 

You may need to provide a bit more detail about this.

b) Check all data records to see whether they already have a control entry or not; these two groups should be outputted                   separately
      - a group with data records that do not yet have a 2nd input
       - a group with data records for which there is a 2nd input

As in which records in your example meet each requirement and it would be nice for consistency sake to provide something like the name of an output data set to hold which "separate" outputs

 

A:  Identify the matching pairs of the 1st and 2nd input 
       - there should always be 2 observations that match in the variables with match in the id, pnr, pid and time 

       - The values of id, pnr, pid  should always be the same to become a matching pair. In variable "control" there should be a first and second entry to see that two different people entered the data

       

     - Like in observation one and two  id=1 , since id, pnr, pid match they are pairs and there exist control 1 and 2 for the values of var1 to var5 could be checked to if if they are the same or not. If they are the same I output them in a dataset  "all_values_match". If they do not match I output them in a data set "values_doesnt_match"

 

     -Then I check for the next pair with the same id,  pnr, pid but I have only "control"=2 so  I know control one is definitely missing.  I then check if these are duplicates if not I output them in dataset "missing_entry" etc

 

I hope I could answer your questions

Tom
Super User Tom
Super User

So trying to pull key information from your post it looks like groups in the data is identified by the values of three variables:  id pnr pid

 

Each group could appear once or twice. (Are you sure there is never a third or fourth observation in a group?)

 

So split the data into three files.  The singletons.  The first and the second.

data single first second;
  set have;
  by id pnr pid;
  if first.pid and last.pid then output single;
  else if first.pid then output first;
  else output second;
run;

So the dataset SINGLE is the answer to which ones have not had what we used to call second pass data entry.

 

You can then PROC COMPARE with the other two datasets to see which records had differences.

 

Quentin
Super User

Let's make a simpler example dataset.  The data has one row per PatientID.  The data for each PatienID was entered by two different people (EntryID):

 

data have;
input ptid entryid var1 var2 var3;
datalines;
1 1 2 4 6
1 2 2 5 6
2 1 10 12 14
2 2 10 12 14
3 1 2 4 6
;
run;

So that is basically your data, but with fewer ID variables.  When asking questions, it's helpful to make the HAVE data as simple as possible.  Anything that works for one ID variable can work for multiple.

 

With that, I would use PROC COMPARE:

proc compare base=have(where=(entryid=1)) compare=have(where=(entryid=2)) listobs;
  var var1-var3 ;
  id ptid ;
run ;

That will show you where are are differences in the value of var1-var3.  It will also show you when data for a patient was only entered once.  If data for a patient was entered 3 times, you'll get a warning in your log about duplicate data. It will work for numeric data and character data.  You don't even have to list all of the variables.  PROC COMPARE is fabulous!

If you want output data instead of a report, you can explore the options in PROC COMPARE to create output datasets.

The Boston Area SAS Users Group is hosting free webinars!
Next webinar will be in January 2025. Until then, check out our archives: https://www.basug.org/videos. And be sure to subscribe to our our email list.
Anita_n
Pyrite | Level 9

@Tom @Quentin @ballardw @Astounding  You guys are really great. Thanks a lot for the great job. It worked, after a little changes

sas-innovate-wordmark-2025-midnight.png

Register Today!

Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.


Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 20 replies
  • 2316 views
  • 10 likes
  • 5 in conversation