BookmarkSubscribeRSS Feed
hovliza
Obsidian | Level 7

Hi! For my thesis I have to harmonise data from two different databases, but I don't need all data from both databases. My question is what is the easiest way to do this? I don't know how to pick one variable from the database X and the same variable from database Y and put it in new database Z? How can I do that, without putting all the variables from X and Y in database Z? The databases have different names for the same variables (e.g. database X = height and database Y = length). Here are some syntaxes I used, but these give errors.. 

 

 

*database X same as database Y*;
DATA database X;
SET database X;
LABEL person_id = respno;
RUN;

 

*database Y same as database X*;
DATA database Y;
SET database Y;
LABEL length = height;
RUN;

 

*Interleaving*;
DATA database Z;
SET database X database Y;
by height respno;
RUN;

 

*Concatenating*;
DATA database Z;
SET database X database Y;
RUN;

 

*One-to-one reading and one-to-one merging*;
DATA database Z;
SET database X;
SET database Y;
RUN;

 

DATA database Z;
MERGE database X database Y;
BY respno;
RUN;

And I need to put a lot of variables from X and Y in database Z. Is there an easy way to change the names of the variables which are the same in databases X and Y (but have a different name, as already mentioned with height) and to put these variables in database Z? I hope my question is clear and someone can help me. 

Thanks! Oh and i use SAS version 9.3 

13 REPLIES 13
RW9
Diamond | Level 26 RW9
Diamond | Level 26

What do you mean by database?  Is the data stored in a database like Oracle or Postgre or something like that, or is the data in SAS?

How much SAS knowledge do you have, as the code you give is wrong on a few levels, might be worth running through the SAS video help to get some basics.  As you haven't given me anything to work with here (i.e. example data or what you want to do with it) I will just show some general examples - assumes the data is in SAS:

libname mylib "c:\path_to_data";

The above creates a library reference pointing to a physical location on your computer/network.  This is how SAS knows where to look for data.  We can then use this too set (put one under the other) datasets together (say there is dataset a and b):

data want;
  set mylib.a (keep=vara varb)
      mylib.b (keep=vara varb);
run;

You will see I only keep the variables I want.  This requires both sets of variables are the same type and length.

To merge data you first need to ensure both are sorted, then merge by key variables (take the above example, but merge on id and keep only varc from b):

data want;
  merge mylib.a (keep=id vara varb)
        mylib.b (keep=id varc);
by id; run;

These are quite basic and fundamental things which is why I suggest looking at the video tutorials and learning Base SAS a bit more.

hovliza
Obsidian | Level 7

Thanks for your reply! I understand what you're saying. The databases I talk about are databases in SAS format. And I have (hopefully) basic knowledge of SAS. That's why I think I'm missing something here.. When I do this:

DATA m.combined;
SET m.database X  (keep=height respo)
    m.database Y  (keep=height respo);
RUN;

I get these errors: 
ERROR: The variable height in the DROP, KEEP, or RENAME list has never been referenced.
ERROR: The variable respo in the DROP, KEEP, or RENAME list has never been referenced.
ERROR: The variable height in the DROP, KEEP, or RENAME list has never been referenced.
ERROR: The variable respo in the DROP, KEEP, or RENAME list has never been referenced.

 

What does this mean? I look for it on the internet, but can't solve it for my own data. Database X and Y both have the variables height and respo in them.

error_prone
Barite | Level 11

If "m" is the name of the library assigned to the directory in which the datasets "X" and "Y" are stored, than what is "database"?

 

Right now you try to combine the datasets m.database, work.x, m.database and work.y - maybe not what you want to do.

hovliza
Obsidian | Level 7

Yes, I'm sorry, I mean dataset! And m is the location where SAS needs to look for both datasets X and Y.

Let's make it more clear, I want to have a dataset Z in SAS like this:

 

RespnoHeightLengthOthervariable1Othervariable2Othervariable3
All respno from datasetxAll height from datasetxAll length from datasetx etcetcetc
All respno from datasetyAll height from datasety
All length from datasety
 etc  etc  etc 
      

 

When I do this:

DATA m.Z;
SET m.X 	(keep=respno height)
    m.Y         (keep=respno height);
RUN;

I only get the respno from dataset X and no respno from dataset Y or height from both datasets..

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Please use the training facilities available, you are missing some fundamental knowledge.  This code:

SET m.dataset X 	(keep=respno height)

Is invalid.  To reference a dataset you first create a library reference - refer to my post.  This library reference is the first part preceeding the .

 

The second part is the physical name of the dataset.  So, if on your network/computer you have a sas dataset called X, which will look something like: x.sas7bdat, then you reference this by first creating a library reference to the network/computer location and then using the code:

set m.x (keep=respno height)

You will see there is not dataset or database word in there, simply <library reference>.<datasetname>

 

This is a fundamental concept which is learnt in episode 1 of SAS training.

hovliza
Obsidian | Level 7

Yes, I understand what you are saying. I already made a library m:

Libname m 'M:\Thesis data';
RUN;

Dataset X and dataset Y are fictitious, because I can't give too much information about my thesis. But the datasets are in the map thesis data. In real these datasets have a different name. So I hope it makes it more clear and someone can help with my questions.

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Right, so this code:

Libname m 'M:\Thesis data';

Is fine. Then we add a datastep which sets two datasets within that library together:

data want;
  set m.dataset1 (keep=vara varb)
      m.dataset2 (keep=vara varb);
run;

This will set dataset2 under dataset1 in the output dataset called want.  It will select only vara (made up as you wont provide any test data) and varb from each dataset.  It assumes dataset1 and 2 exist in that library, both have the same two variables which are the same.  This will work.  You could also do:

data want (keep=vara varb);
  set m.dataset1 m.dataset2;
run;

So what is the question regarding this? 

You second question was variables have different names, if so then you would use a rename:

data want;
  set m.dataset1 (keep=vara height)
      m.dataset2 (keep=vara height rename=(length=height));
run;

 

To add, this code:

data m.datasetZ;
  set m.datasetX (keep=respno height)
      m.datasetY (keep=respno height);
RUN;

 Will take all the data from datasetX keeping only respno and height.  Then it will take all the data from datasetY keeping only respno and height.  Then it will put the data from the datasetY under datasetX.  So you will see that data further down the table.

Jeremy_Browne
Fluorite | Level 6

HI Hovliza.  As an aside, like RW9 I would recommend some of the SAS videos, e-Learning (PG1) and popular post “Maxims of Maximally Efficient SAS Programmers” by KurtBremser – “Learning to read the documentation will enhance your problem-solving skills by orders of magnitude.”  From the ERROR: The variable respo in the DROP, KEEP, or RENAME list has never been referenced, your issue may be with variables height in X and length in Y if you want to concatenate these as one variable in the combined dataset you need to use the RENAME data set option (http://documentation.sas.com/?docsetId=allprodslang&docsetTarget=syntaxByType-datasetOption.htm&docs... ) i.e. SET m.y (RENAME=(length=height));

hovliza
Obsidian | Level 7

Yes, my formulation is not that good..

Libname m 'M:\Thesis data';
RUN;

*X same as Y*;
DATA m.X;
SET m.X;
RENAME person_id = respno;
RUN;

*Y same as X*;
DATA m.Y;
SET m.Y;
RENAME length = height;
RUN;

DATA m.Z;
SET m.Y    (keep=respno height)
    m.X	   (keep=respno height);
RUN;

I get a table with respno for both dataset X and Y and height only for Y. I get the error; 

ERROR: The variable height in the DROP, KEEP, or RENAME list has never been referenced.

But height is in dataset X..

Jeremy_Browne
Fluorite | Level 6

Read the log.  Look at the output data.  Look at proc contents data=m.X; Look at the RENAME documentation – an option in the set statement i.e. in parentheses (RENAME=(old-name-1=new-name-1));

Reeza
Super User

Don’t code in the style you’re currently using, where the data set name in the SET is the same as the DATA statement. It makes mistakes really hard to find and you run the risk of destroying your data. 

 

Data have;
Set have;

 

I suspect thats already happened. I suggest recreating your original data sets, then retrying this SET statement with the RENAME. 

And read the chapter in the documentation on Combining Data sets. 

http://documentation.sas.com/?docsetId=lrcon&docsetTarget=p15jvywi5avt3cn1bee8r6c33ux1.htm&docsetVer...

Reeza
Super User
And there’s a difference between the terms database and data set with respect to SAS. They can be one and the same but typically databases have multiple datasets, not a single one.
error_prone
Barite | Level 11

Read the log, the notes contain the number of observations read from each dataset.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 13 replies
  • 1714 views
  • 1 like
  • 5 in conversation