07-22-2016 09:42 AM
I have a dataset, with 130000 obs.
I need to check whethr those match any of my reference dataset, if any one of these five variables, the value matchs one of the values of the reference dataset (around 5000 obs), which has three categories linked to those values, "S', "M', or "l', if not matched, it is "normal"
I can use merge one by one, so I need to do five merge
any other option to do that?
07-22-2016 10:09 AM
there are five diagnosis variables, that I need to check those values of these five var, whether the value matches the reference dataset, which includes 5000 obs, and 3 categories
I can do five matches, and see whether any of them matches, if match, they will be considered as there 3 categories
if none of these 5 matches, they will be considered as another category
07-22-2016 10:17 AM
Does the following answers your problem (assuming the reference dataset contains a column "category")
data want; merge mydataset (in=a) reference (in=b); by var1 var2 var3 var4 var5; if a and not b then category="other"; if a; run;
07-22-2016 10:34 AM
07-22-2016 10:26 AM
There are two common methods, that would first process the 5,000 records.
1. Turn the 5,000 records into a format.
2. Turn the 5,000 records into a hash table.
Either way, you would then go through the 5 variables in an array, to see if they match the new form of the 5,000 records.
Are you comfortable with either technique (creating a format from a SAS data set, or creating a hash table)?
07-22-2016 10:28 AM
Create a format using your 5000 dataset, make sure to include an Other category.
Then in apply the format and check for the Other category.
A merge wont work as is because you want to check each value against all other values.
Another method is is to make your wide dataset long and then use something like the following:
create table matches as
select id, diag
from long where diag not in (select diag from lookup);
You can merge the matches dataset with your original wide dataset if you wanted.
07-22-2016 10:32 AM
Another option is temporary arrays.
Edit: The long method proposed wouldn't work. I missed part about 3 categories. You can merge the long data with the lookup data to find the category. Do you have the possibility of multiple matches? Ie can a record belong to multiple categories? I'd recommend a format though. It's a common method in analyzing clinical data and its come up as a solution to several of your recent questions so probably worth learning.
07-22-2016 10:52 AM
This paper may be helpful (not for this question though)
The link at the end includes the code.