Make sure every observations has been assigned to a subset

echo991 · Posted 10-31-2015 05:07 PM

Hi,

I have a questions here:

You have just written code that assigns each patient in your data set to one of four mutually exclusive subsets. How would you check to make sure that every patient has been assigned to a subset and none are in more than one? Knowing that every patient should fall into one of these categories, how would you handle those that don't?

Anyone has know how to solve this? Even some idea are okay.

Many thanks,

Chen

kannand · Posted 10-31-2015 08:28 PM

One simple way is to look at the log to find out if the counts from the parent dataset match with the sum of the subset datasets, that is if the subsets themselves are datasets.

Another easy was is that if you are using a variable that identifies where the subset records are written to, you may do a frequency of that variable and tally them up with the subsets.

Third way I can think of is that if there is a key, you can write a merge to see where each record has ended up by merging with the key.

You could possibly write a proc compare but I would guess one of the above options would have already displayed the information you are looking for...

Good Luck...!!!

Kannan Deivasigamani

PGStats · Posted 10-31-2015 10:57 PM

To get a list of the potentially problematic cases, try

proc sql;
select patientID, Subset
from myData
group by patientID
having count(distinct Subset) ne 1;
quit;

How to correct those cases will depend on the meaning of those subsets. You could do it by hand if the number is small or have to refine your assignment procedure if there are too many.

PG

echo991 · Posted 10-31-2015 11:38 PM

Hi PG,
I can't understand your writing totally, what does select subset mean?
If I have a dataset named myData, and all patients id in myData has set in to one of four mutually exclusive subsets (A, B, C, D), how could I make sure every id has in one of four subsets, no duplicate and no obsence?
Could you help me more, please?

Many thanks,
Chen

PGStats · Posted 10-31-2015 11:54 PM

My code supposes that your dataset myData contains two variables. The variable patientID identifies every patient uniquely. The variable Subset would take values A, B, C, or D. The SQL query will print the list of records of patients and their subsets only for patients with more than one subset, or no subset at all (missing subset value).

Instead of printing the records, you may create a new dataset with the troublesome records with:

proc sql;
create table myTroubleData as
select *
from myData
group by patientID
having count(distinct subset) ne 1;
quit;

PG

Kurt_Bremser · Posted 11-01-2015 06:37 AM

if (condition one) then output dataset1;
else if (condition two) then output dataset2;
else if (condition three) then output dataset3;
else if (condition four) then output dataset4;
else abort;

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

ballardw · Posted 11-02-2015 10:45 AM

Not effecient by any means but relatively easy to understand:

proc freq data= have;

tables id * CodedVariable / norow nocol nopercent;

run;

Table with any ID and 2 or more values for the recode are problematic.

Then

proc print data=have;

where id in (<list the id values from above here>);

var id <and list of variables used in the recode> CodedVariable;

run;

That should give an idea of which rule(s) may not have been applied correctly.

Make sure every observations has been assigned to a subset

Re: Make sure every observations has been assigned to a subset

Re: Make sure every observations has been assigned to a subset

Re: Make sure every observations has been assigned to a subset

Re: Make sure every observations has been assigned to a subset

Re: Make sure every observations has been assigned to a subset

Re: Make sure every observations has been assigned to a subset

Make sure every observations has been assigned to a subset

Re: Make sure every observations has been assigned to a subset

Re: Make sure every observations has been assigned to a subset

Re: Make sure every observations has been assigned to a subset

Re: Make sure every observations has been assigned to a subset

Re: Make sure every observations has been assigned to a subset

Re: Make sure every observations has been assigned to a subset

Ready to join fellow brilliant minds for the SAS Hackathon?