BookmarkSubscribeRSS Feed
deleted_user
Not applicable
Hi,

I'm a Sas newbie, and I need some help figuring out how to do something.

I have 2 data sets, with a numeric key variable. I want to do a merge or lookup in in which I take they key variable in data set A, and cross check it against data set B. If the cross check meets 2 specific conditions (the variable in A and B must be within 20,000 of each other, and another variable in data set B must be a positive number), then I will take a value from Data Set B and place it in a new column in Data Set A.

The main issue I am having in doing this is, I don't know how to code an if then statement that looks in Data Set B if I have initialized the step for Data Set A. Can anyone point me in the right direction? How do you point a data step to look for a variable in a different data set than the one you are working in?
7 REPLIES 7
Olivier
Pyrite | Level 9
Hi.
Your problem sounds so much as a SQL query that my first impulse would be to advise you to the SQL procedure... Something that would say :
[pre]
PROC SQL ;
CREATE TABLE yourLib.yourResult AS
SELECT *
FROM yourLib.A, yourLib.B
WHERE ABS(a.x1 - b.x2) < 20000
AND b.z > 0
;
QUIT ;
[/pre]
However, if you feel more confortable with the Data step logic, your problem could be solved using two SET statements in the same Data step. The first one is for reading A one row at a time, and the second one for reading the whole B dataset each time a new observation is loaded from A. This can be done by controlling the second SET statement with a DO loop and a POINT option. POINT directs the SET statement to read only a specified record, instead of the implicit loop that it usually does.
Since, reading my explanations, I hardly understand myself, I think I'd best proceed to an example...
[pre]
DATA yourLib.yourResult ;
SET yourLib.A ;
DO B_row=1 TO B_size ;
SET yourLib.B POINT = B_row NOBS = B_size ;
IF ABS(x1 - x2) < 20000
AND z > 0 THEN OUTPUT ;
END ;
RUN ;
[/pre]
I still think that this solution looks less simple than the first one, especially if variable have the same names in datasets A and B (your have to rename them in the second solution, when SQL provides dataset.variable notation to avoid ambiguous references).
Besides, you could be advised to use more tricky -- yet efficent -- solutions as hash tables. Have a look to the HASH object documentation if you feel sane enough to code that kind of things...

Regards.
Olivier
deleted_user
Not applicable
Thanks Olivier! I'll try this out, the SQL actually seems to make more sense to me. I'll be back with questions if I have any problems! Message was edited by: Amr@BroadInstitute
deleted_user
Not applicable
The SQL worked great, thanks! Just a couple more questions if you don't mind.

With this program, many of the observations in data set B match the conditions set in the program, and it returns all of them. If I want to limit the number of matches arbitrarily, is there a function for that? So, instead of returning each observation in data set B that matches the conditions for each obs in data set A, I want it to return X number of matches from Data Set B for each obs in Data set A.

Second, is there a way to have SAS ask for user input? I do a little perl programming, if you r familiar with it, there are ARGV statements that allow user-defined values for processing in the program. It'd be great if I can do this in SAS because than this program I am writing could apply to other similar tasks and not just this specific one.
Olivier
Pyrite | Level 9
There is no direct way to limit the number of matches in SQL (or at least, not any I can think of). What you can do, if you have a variable with unique values in data set A, is sort the result of the SQL join by this variable (with an extra ORDER BY clause in the SQL query), and then, if you only want ten matches per value :
[pre]
DATA myJoinLtd ;
SET myJoin ;
BY uniqueAvariable ;
IF FIRST.uniqueAvariable THEN howManyMatches = 0 ;
howManyMatches + 1 ; /* increase the counter */
IF howManyMatches < 11 THEN OUTPUT ;
RUN ;
Olivier
Pyrite | Level 9
For the user input, you have several options :
1) if you are using traditionnal SAS environnement, you can check out the %WINDOWS and %DISPLAY syntax to display (ugly) prompt windows and save the values entered in macro variables
2) if you are using SAS Enterprise Guide, you can define parameters (see the OPTIONS menu) that can be used throughout the project, and for which a (very nice) prompt window will be displayed each time the attached task, or query, or code, will be run.
deleted_user
Not applicable
Olivier,

You were so helpful to the last 'newbie' I can't resist asking for some help too. I have 2 datasets: Dataset A has a list of names and dobs from a membership registry. Dataset B has a list of names and dobs from a testing registry (lab results). We want to know how many (not who) of those listed in dataset A are found in Dataset B. (Hopefully no output file that links). Is there a way to just get counts of how many from A were found in B?

cheers,

Maggie
deleted_user
Not applicable
First, thanks again Olivier!

Maggie, I'll take a stab at your question. I think you could modify the code that Olivier gave me to do what you are asking:

PROC SQL ;
CREATE TABLE yourLib.yourResult AS
SELECT *
FROM yourLib.A, yourLib.B
WHERE a.names = b.names
;
QUIT ;

I think that will create a new data set where only the names that are in both tables exists. I know thats not exactly what you wanted, but I don't know how to code a counter into SQL. You could than just output the number of obs in the new table to know how many matching names you have.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 760 views
  • 0 likes
  • 2 in conversation