BookmarkSubscribeRSS Feed
echo991
Fluorite | Level 6

Hi,

I have a questions here:

 

You have just written code that assigns each patient in your data set to one of four mutually exclusive subsetsHow would you check to make sure that every patient has been assigned to a subset and none are in more than one?  Knowing that every patient should fall into one of these categories, how would you handle those that don't?

 

Anyone has know how to solve this? Even some idea are okay. 

Many thanks,

Chen

6 REPLIES 6
kannand
Lapis Lazuli | Level 10

One simple way is to look at the log to find out if the counts from the parent dataset match with the sum of the subset datasets, that is if the subsets themselves are datasets.

 

Another easy was is that if you are using a variable that identifies where the subset records are written to, you may do a frequency of that variable and tally them up with the subsets.

 

Third way I can think of is that if there is a key, you can write a merge to see where each record has ended up by merging with the key.

 

You could possibly write a proc compare but I would guess one of the above options would have already displayed the information you are looking for...

 

Good Luck...!!!

 

Kannan Deivasigamani
PGStats
Opal | Level 21

To get a list of the potentially problematic cases, try

 

proc sql;
select patientID, Subset
from myData
group by patientID
having count(distinct Subset) ne 1;
quit;

 How to correct those cases will depend on the meaning of those subsets. You could do it by hand if the number is small or have to refine your assignment procedure if there are too many.

PG
echo991
Fluorite | Level 6
Hi PG,
I can't understand your writing totally, what does select subset mean?
If I have a dataset named myData, and all patients id in myData has set in to one of four mutually exclusive subsets (A, B, C, D), how could I make sure every id has in one of four subsets, no duplicate and no obsence?
Could you help me more, please?

Many thanks,
Chen
PGStats
Opal | Level 21

My code supposes that your dataset myData contains two variables. The variable patientID identifies every patient uniquely. The variable Subset would take values A, B, C, or D. The SQL query will print the list of records of patients and their subsets only for patients with more than one subset, or no subset at all (missing subset value).

Instead of printing the records, you may create a new dataset with the troublesome records with:

 

proc sql;
create table myTroubleData as
select *
from myData
group by patientID
having count(distinct subset) ne 1;
quit;
PG
Kurt_Bremser
Super User
if (condition one) then output dataset1;
else if (condition two) then output dataset2;
else if (condition three) then output dataset3;
else if (condition four) then output dataset4;
else abort;
ballardw
Super User

Not effecient by any means but relatively easy to understand:

 

proc freq data= have;

tables id * CodedVariable / norow nocol nopercent;

run;

 

Table with any ID and 2 or more values for the recode are problematic.

Then

proc print data=have;

where id in (<list the id values from above here>);

var id <and list of variables used in the recode> CodedVariable;

run;

That should give an idea of which rule(s) may not have been applied correctly.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 6 replies
  • 1243 views
  • 0 likes
  • 5 in conversation