10-31-2015 05:07 PM
I have a questions here:
You have just written code that assigns each patient in your data set to one of four mutually exclusive subsets. How would you check to make sure that every patient has been assigned to a subset and none are in more than one? Knowing that every patient should fall into one of these categories, how would you handle those that don't?
Anyone has know how to solve this? Even some idea are okay.
10-31-2015 08:28 PM
One simple way is to look at the log to find out if the counts from the parent dataset match with the sum of the subset datasets, that is if the subsets themselves are datasets.
Another easy was is that if you are using a variable that identifies where the subset records are written to, you may do a frequency of that variable and tally them up with the subsets.
Third way I can think of is that if there is a key, you can write a merge to see where each record has ended up by merging with the key.
You could possibly write a proc compare but I would guess one of the above options would have already displayed the information you are looking for...
10-31-2015 10:57 PM
To get a list of the potentially problematic cases, try
proc sql; select patientID, Subset from myData group by patientID having count(distinct Subset) ne 1; quit;
How to correct those cases will depend on the meaning of those subsets. You could do it by hand if the number is small or have to refine your assignment procedure if there are too many.
10-31-2015 11:38 PM
10-31-2015 11:54 PM
My code supposes that your dataset myData contains two variables. The variable patientID identifies every patient uniquely. The variable Subset would take values A, B, C, or D. The SQL query will print the list of records of patients and their subsets only for patients with more than one subset, or no subset at all (missing subset value).
Instead of printing the records, you may create a new dataset with the troublesome records with:
proc sql; create table myTroubleData as select * from myData group by patientID having count(distinct subset) ne 1; quit;
11-01-2015 06:37 AM
if (condition one) then output dataset1; else if (condition two) then output dataset2; else if (condition three) then output dataset3; else if (condition four) then output dataset4; else abort;
11-02-2015 10:45 AM
Not effecient by any means but relatively easy to understand:
proc freq data= have;
tables id * CodedVariable / norow nocol nopercent;
Table with any ID and 2 or more values for the recode are problematic.
proc print data=have;
where id in (<list the id values from above here>);
var id <and list of variables used in the recode> CodedVariable;
That should give an idea of which rule(s) may not have been applied correctly.