BookmarkSubscribeRSS Feed
Bal23
Lapis Lazuli | Level 10

I have a dataset, with 130000 obs.

I need to check whethr those match any of my reference dataset, if any one of these five variables, the value matchs one of the values of the reference dataset (around 5000 obs), which has three categories linked to those values, "S', "M', or "l', if not matched, it is "normal"

I can use merge one by one, so I need to do five merge

any other option to do that?

thanks.

8 REPLIES 8
Reeza
Super User

Your question is unclear. 

Bal23
Lapis Lazuli | Level 10

there are five diagnosis variables, that I need to check those values of these five var, whether the value matches the reference dataset, which includes 5000 obs, and 3 categories

 

I can do five matches, and see whether any of them matches, if match, they will be considered as there 3 categories

if none of these 5 matches, they will be considered as another category

gamotte
Rhodochrosite | Level 12

Hello,

 

Does the following answers your problem (assuming the reference dataset contains a column "category")

 

data want;
merge mydataset (in=a) reference (in=b);
by var1 var2 var3 var4 var5;
if a and not b then category="other";
if a;
run;
gamotte
Rhodochrosite | Level 12
I did not read correctly your description. I thought there had to be a match for all 5 variables. So this won't do what you want.
Astounding
PROC Star

There are two common methods, that would first process the 5,000 records.

 

1. Turn the 5,000 records into a format.  

 

2. Turn the 5,000 records into a hash table.

 

Either way, you would then go through the 5 variables in an array, to see if they match the new form of the 5,000 records.

 

Are you comfortable with either technique (creating a format from a SAS data set, or creating a hash table)?

Reeza
Super User

Create a format using your 5000 dataset, make sure to include an Other category. 

 

Then in apply the format and check for the Other category. 

 

A merge wont work as is because you want to check each value against all other values. 

 

Another method is is to make your wide dataset long and then use something like the following:

 

proc sql;

create table matches as

select id, diag

from long where diag not in (select diag from lookup);

quit. 

 

You can merge the matches dataset with your original wide dataset if you wanted. 

Reeza
Super User

Another option is temporary arrays. 

 

https://gist.github.com/statgeek/f052b5223fecca066b1f

 

Edit: The long method proposed wouldn't work. I missed part about 3 categories. You can merge the long data with the lookup data to find the category. Do you have the possibility of multiple matches? Ie can a record belong to multiple categories? I'd recommend a format though. It's a common method in analyzing clinical data and its come up as a solution to several of your recent questions so probably worth learning. 

Reeza
Super User

This paper may be helpful (not for this question though)

 

http://www.lexjansen.com/wuss/2014/71_Final_Paper_PDF.pdf

 

The link at the end includes the code. 

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 2321 views
  • 0 likes
  • 4 in conversation