BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Achieng
Quartz | Level 8

Dear All, 

Thank you very much for all the help I have been getting. I am grateful.

I am new SAS user.

My problem...

 

I am trying to merge 4 sets of data.

 

PROC IMPORT OUT= DATA_2005
DATAFILE= "C:\Users\NWUUser\Documents\DOCTORAL WORK\2017 and beyond\2nd PUBLICATION\DATA FOR PAPER 2\Fa_data.sav"
DBMS=SPSS REPLACE;
RUN;

PROC IMPORT OUT= PLASMAFA_2015
DATAFILE= "C:\Users\NWUUser\Documents\DOCTORAL WORK\2017 and beyond\2nd PUBLICATION\DATA FOR PAPER 2\PLASMAFA_2015PURE.sav"
DBMS=SPSS REPLACE;
RUN;


PROC IMPORT OUT= ANTHRO_2015
DATAFILE= "C:\Users\NWUUser\Documents\DOCTORAL WORK\2017 and beyond\2nd PUBLICATION\DATA FOR PAPER 2\ANTHRO2015.sav"
DBMS=SPSS REPLACE;
RUN;

PROC IMPORT OUT= INSULIN_2015
DATAFILE= "C:\Users\NWUUser\Documents\DOCTORAL WORK\2017 and beyond\2nd PUBLICATION\DATA FOR PAPER 2\PURE2015_e411Insulin.sav"
DBMS=SPSS REPLACE;
RUN;

 

I have successfully imported the data.

 

data DATA_2005;set DATA_2005;
fa_data=1;
run;


data PLASMAFA_2015;set PLASMAFA_2015;
PLASMAFA_2015PURE=1;
format _all_;
run;

data ANTHRO_2015;set ANTHRO_2015;
ANTHRO2015=1;
format _all_;
run;

data INSULIN_2015;set INSULIN_2015;
PURE2015_e411Insulin=1;
format _all_;

And successfully created variable for merging 

 

The problem comes when sorting the data.\

When I run the sort data 

 

proc sort data=DATA_2005;
by id2005;
run;quit;

proc sort data=PLASMAFA_2015;
by id2005;
run;quit;

proc sort data=ANTHRO_2015;
by id2005;
run;quit;

proc sort data=INSULIN_2015;
by id2005;
run;quit;

 

 The ANTHRO_2015 variables gets deleted and so there are no variables. There are no duplicates.

 

I am able to merge all the data, but the last data set does not have the ANTHRO_variables.

 

Please help.

 

I am working with longitudinal data,

 

Worried PhD student 

Alice Ojwang 

South Africa 

 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
RW9
Diamond | Level 26 RW9
Diamond | Level 26
%data2datastep (Anthrotrial_2015, work, c:\temp.txt, 2);

You may want to run through some basics of SAS programming before going further. 

As for your issue, its because you have variables which are the same name across the datasets, you can only have one in the output dataset so SAS takes the first one. 

View solution in original post

17 REPLIES 17
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Firstly, avoid coding in upcase, or mixed, it makes code hard to read.  Also you can post code using the code window - {i} above post, this retains formatting so indentations show correctly.

Next, we can tell nothing from that code.  What does the data look like?  Proc sort without nodupkey or where does not remove observations.  What is likely is that there has been an error and the data has not been written.  What does the log say?  Also note you don't need to put quit; after sorts

Achieng
Quartz | Level 8

Dear RW9,

Thank you for your message,

I will make the changes while coding in the near future.

 

Identified this problem when I was trying to use the anthropometric variables to analyse data. When I checked the Merged file, it did not have the Anthro variables. I investigated the problem.  I realised that When I import the files, the anthro2015  data was imported when I

1. I import the files, the anthro2015  data was imported,

 

when I create  a variable in preparation for merging, the data wa still there, when I run the sort command, then the when

I run the sort command, then the variables were deleted. i.e the data file had no variables. I went step by step checking the dat files. I just realised it is only deleted after the sort command.

 

How does the data look like?

 

The datasets have been cleaned and checked for duplicates before merging.

 

Thank you.

\looking forward to your help.

 

 

 Data _2005to2015;
57
58   merge DATA_2005 PLASMAFA_2015 ANTHRO_2015 INSULIN_2015;
59   by id2005;
60   if fa_data=. then delete;
61   IF PLASMAFA_2015PURE=. then delete;
62
63   IF ANTHRO2015_PURE=. then delete;
64   IF PLASMAFA_2015PURE=. then delete;
65   IF PURE2015_e411Insulin=. then delete;
66   run;

NOTE: Variable ANTHRO2015_PURE is uninitialized.
NOTE: MERGE statement has more than one data set with repeats of BY values.
NOTE: There were 711 observations read from the data set WORK.DATA_2005.
NOTE: There were 853 observations read from the data set WORK.PLASMAFA_2015.
NOTE: There were 2121 observations read from the data set WORK.ANTHRO_2015.
NOTE: There were 2080 observations read from the data set WORK.INSULIN_2015.
NOTE: The data set WORK._2005TO2015 has 0 observations and 285 variables.
NOTE: DATA statement used (Total process time):
      real time           0.25 seconds
      cpu time            0.06 seconds


 

 

PROC IMPORT OUT= DATA_2005
			DATAFILE= "C:\Users\NWUUser\Documents\DOCTORAL WORK\2017 and beyond\2nd PUBLICATION\DATA FOR PAPER 2\Fa_data.sav"
            DBMS=SPSS REPLACE;
RUN;

PROC IMPORT OUT= PLASMAFA_2015
			DATAFILE= "C:\Users\NWUUser\Documents\DOCTORAL WORK\2017 and beyond\2nd PUBLICATION\DATA FOR PAPER 2\PLASMAFA_2015PURE.sav"
            DBMS=SPSS REPLACE;
RUN;


PROC IMPORT OUT= ANTHRO_2015
			DATAFILE= "C:\Users\NWUUser\Documents\DOCTORAL WORK\2017 and beyond\2nd PUBLICATION\DATA FOR PAPER 2\ANTHRO2015.sav"
            DBMS=SPSS REPLACE;
RUN;

PROC IMPORT OUT= INSULIN_2015
			DATAFILE= "C:\Users\NWUUser\Documents\DOCTORAL WORK\2017 and beyond\2nd PUBLICATION\DATA FOR PAPER 2\PURE2015_e411Insulin.sav"
            DBMS=SPSS REPLACE;
RUN;

/*/* create a variable for 2005 to 2015 dataset f_2005;*/use import .sav name;*/;


data DATA_2005;set DATA_2005;
fa_data=1;
run;


data PLASMAFA_2015;set PLASMAFA_2015;
PLASMAFA_2015PURE=1;
format _all_;
run;

data ANTHRO_2015;set ANTHRO_2015;
ANTHRO2015=1;
format _all_;
run;

data INSULIN_2015;set INSULIN_2015;
PURE2015_e411Insulin=1;
format _all_;

run;


/**************************************sort data */;

proc sort data=DATA_2005;
by id2005;
run;quit;

proc sort data=PLASMAFA_2015;
by id2005;
run;quit;

proc sort data=ANTHRO_2015;
by id2005;
run;quit;

proc sort data=INSULIN_2015;
by id2005;
run;quit;

*****************************************************************************************************************
/*Merge all data */;

Data _2005to2015;

merge DATA_2005 PLASMAFA_2015 ANTHRO_2015 INSULIN_2015;
by id2005; 
if fa_data=. then delete; 
IF PLASMAFA_2015PURE=. then delete; 

IF ANTHRO2015_PURE=. then delete; 
IF PLASMAFA_2015PURE=. then delete; 
IF PURE2015_e411Insulin=. then delete; 
run;

 

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Yes, the log clearly shows your problem:

NOTE: Variable ANTHRO2015_PURE is uninitialized.

This means there is no variable on any of the input datasets, therefore when you do if that variable is . - which it always will be as SAS creates a default empty variable if it doesn't exist, then the data is removed, which is true in every case. 

So the question back to you is where do you believe this variable exists?  Its not created in the code you provide, and due to the error, I can't imagine it exists in the data you have imported.  Look at the dataset contents and you will see this variable does not exist.

Achieng
Quartz | Level 8

 

Dear Superuser,

Thank you for your response,

however, being a new user, I am still not getting anywhere.

I have tried again, but getting the same problem exists.

 

Please note the three steps before merging data. There is no indication of a problem yet the anthro data is not being imported. Thanks for your patience, please see the codes and help.

 

In your response: .......therefore when you do if that variable is . - which it always will be as SAS creates a default empty variable if it doesn't exist, then the data is removed, which is true in every case.

 

which is true, but this data set has all variables when I import the data, create the variable, except after I run the sort command, then the variables are deleted except the headings. it is cleared and there are no variables, but it does not indicate anywhere in the logs, I only realize that now because I proc print the data variables. Please see the logs 



PROC IMPORT OUT= DATA_2005 DATAFILE= "C:\Users\Ojwang AA\Documents\DOCTORAL WORK\2017 and beyond\2nd PUBLICATION\DATA FOR PAPER 2\Fa_data.sav" DBMS=SPSS REPLACE; RUN; PROC IMPORT OUT= PLASMAFA_2015 DATAFILE= "C:\Users\Ojwang AA\Documents\DOCTORAL WORK\2017 and beyond\2nd PUBLICATION\DATA FOR PAPER 2\PLASMAFA_2015PURE.sav" DBMS=SPSS REPLACE; RUN; PROC IMPORT OUT= ANTHRO_2015 DATAFILE= "C:\Users\Ojwang AA\Documents\DOCTORAL WORK\2017 and beyond\2nd PUBLICATION\DATA FOR PAPER 2\ANTHRO2015.sav" DBMS=SPSS REPLACE; RUN; PROC IMPORT OUT= INSULIN_2015 DATAFILE= "C:\Users\Ojwang AA\Documents\DOCTORAL WORK\2017 and beyond\2nd PUBLICATION\DATA FOR PAPER 2\PURE2015_e411Insulin.sav" DBMS=SPSS REPLACE; RUN; /*/* create a variable for 2005 to 2015 dataset f_2005;*/use import .sav name;*/; data DATA_2005;set Data_2005; fa_data=1; run; data Plasmafa_2015;set Plasmafa_2015; Plasmafa_2015PURE=1; format _all_; run; data Anthro_2015;set Anthro_2015; Anthro2015=1; format _all_; run; data Insulin_2015;set Insulin_2015; PURE2015_e411Insulin=1; format _all_; run; /**************************************sort data */; proc sort data=Data_2005; by id2005; run; proc sort data=Plasmafa_2015; by id2005; run; proc sort data=Anthro_2015; by id2005; run; proc sort data=Insulin_2015; by id2005; run;quit; ***************************************************************************************************************** /*Merge all data */; Data _2005to2015; merge DATA_2005 PLASMAFA_2015 ANTHRO_2015 INSULIN_2015; by id2005; if fa_data=. then delete; IF PLASMAFA_2015PURE=. then delete; IF ANTHRO2015=. then delete; IF PLASMAFA_2015PURE=. then delete; IF PURE2015_e411Insulin=. then delete; run; PROC IMPORT OUT= DATA_2005 DATAFILE= "C:\Users\Ojwang AA\Documents\DOCTORAL WORK\2017 and beyond\2nd PUBLICATION\DATA FOR PAPER 2\Fa_data.sav" DBMS=SPSS REPLACE; RUN; PROC IMPORT OUT= PLASMAFA_2015 DATAFILE= "C:\Users\Ojwang AA\Documents\DOCTORAL WORK\2017 and beyond\2nd PUBLICATION\DATA FOR PAPER 2\PLASMAFA_2015PURE.sav" DBMS=SPSS REPLACE; RUN; PROC IMPORT OUT= ANTHRO_2015 DATAFILE= "C:\Users\Ojwang AA\Documents\DOCTORAL WORK\2017 and beyond\2nd PUBLICATION\DATA FOR PAPER 2\ANTHRO2015.sav" DBMS=SPSS REPLACE; RUN; PROC IMPORT OUT= INSULIN_2015 DATAFILE= "C:\Users\Ojwang AA\Documents\DOCTORAL WORK\2017 and beyond\2nd PUBLICATION\DATA FOR PAPER 2\PURE2015_e411Insulin.sav" DBMS=SPSS REPLACE; RUN; /*/* create a variable for 2005 to 2015 dataset f_2005;*/use import .sav name;*/; data DATA_2005;set Data_2005; fa_data=1; run; data Plasmafa_2015;set Plasmafa_2015; Plasmafa_2015PURE=1; format _all_; run; data Anthro_2015;set Anthro_2015; Anthro2015=1; format _all_; run; data Insulin_2015;set Insulin_2015; PURE2015_e411Insulin=1; format _all_; run; /**************************************sort data */; proc sort data=Data_2005; by id2005; run; proc sort data=Plasmafa_2015; by id2005; run; proc sort data=Anthro_2015; by id2005; run; proc sort data=Insulin_2015; by id2005; run;quit; ***************************************************************************************************************** /*Merge all data */; Data _2005to2015; merge DATA_2005 PLASMAFA_2015 ANTHRO_2015 INSULIN_2015; by id2005; if fa_data=. then delete; IF PLASMAFA_2015PURE=. then delete; IF ANTHRO2015=. then delete; IF PLASMAFA_2015PURE=. then delete; IF PURE2015_e411Insulin=. then delete; run;

.

 

I am sorry for the back and forth, but I know this is where I will get the help.

I do appreciate your time.

 

Kind regards

 

 

Achieng 

RW9
Diamond | Level 26 RW9
Diamond | Level 26

I do not see any logs.  Please strip down the question to the actual point.  Show the code you believe is the problem, and the log with that code run, and if possible show what the dataset looks like (structure from a proc contents).  I repeat that proc sort does not delete data without a nodupkey or where clause, the code you present does not include this so either that is not the code causing the issue or you have issues in your log.

Achieng
Quartz | Level 8

Dear Superuser,

Thank you once more.

I actually do not understand where the problem is, except that after I merge the data, the Anthro data is not merging with the rest. I have tried everything, I have looked at the data set. Attached is the contents.

Regards

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Lets start again.  Don't post things as Excel files.  Get your data into SAS, then follow this post:

https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat...

 

To create datastep versions of your data (only need a couple of rows of each one as an example).  Post that text datastep into a code window - it is the {i} above the post area.  Once I have something I can actually look at then we can see.

Achieng
Quartz | Level 8

Dears Superuser,

Thank you for your support.

 

Well, I have taken 3 hours trying to get you the file. However, I have been unable to generate the data using the codes you gave me. Please don't be upset. I am learning. Is there another way???

%data2datastep(Anthro_2015, DATAFILE= "C:\Users\NWUUser\Documents\DOCTORAL WORK\2017 and beyond\2nd PUBLICATION\DATA FOR PAPER 2\ANTHRODATA_2015.sav"
            DBMS=SPSS REPLACE, ANTHRODATA_2015, 5);

 

However, I managed to solve the problem.

1. I changed the names of the data files fro Anthrodata set, so they are different i.e 

ANTHRODATA_2015, 

2. Then I went through all the processes of merging data....i.e ( import, create a variable, sort ) for all the data sets.

However, before. I run the merge command, I repeated the same process for the Anthro data ( import, create, sort)  that was refusing to merge, 

3. Finally, it merged. I was so excited, that I don't think I will be putting off my computer, less it does not work tomorrow. 🙂 

 

However, I have not discovered the real problem and If I can get at least part of the data set to you, then you can at least help me understand.

I really do appreciate your time, energy and contribution.

May Jehovah God bless you abundantly.

Yours sincerely 

Alice

 

 

RW9
Diamond | Level 26 RW9
Diamond | Level 26

You would need to run your proc import, then use the dataset name in the macro call, something like:

PROC IMPORT OUT= ANTHRO_2015
			DATAFILE= "C:\Users\Ojwang AA\Documents\DOCTORAL WORK\2017 and beyond\2nd PUBLICATION\DATA FOR PAPER 2\ANTHRO2015.sav"
            DBMS=SPSS REPLACE;
RUN;

%data2datastep (Anthro_2015, work, c:\temp.txt, 2);

So the proc import gets the data from your external file.  The macro will take this dataset and create a text file (I have called it c:\temp.txt - i.e. a file temp.txt in c: drive, you may need to change this).  The text in the text file can be copied and pasted into a code window here using the {i} button.

 

From what you have said, yes, each of the variables in the contributing datasets should be unique if they are not used as merged id's, otherwise SAS will take the first occurence as the value.

Achieng
Quartz | Level 8

Hello, Superuser,

Thank you for your commitment, and stay here till we solve this.

 

I successfully downloaded the macro.

PROC IMPORT OUT= ANTHROTRIAL_2015
			DATAFILE= "C:\Users\NWUUser\Documents\DOCTORAL WORK\2017 and beyond\2nd PUBLICATION\DATA FOR PAPER 2\ANTHRO2015.sav"
            DBMS=SPSS REPLACE;
RUN;

I successfully imported the file using the macro window

 

However, I am still unable to create the temporary text file.

 

This is the error I am getting 

%macro %data2datastep (Anthrotrial_2015, work, c:\temp.txt, 2);
run;
 

obviously you can see the amateur at work 😉

 

Regards

RW9
Diamond | Level 26 RW9
Diamond | Level 26
%data2datastep (Anthrotrial_2015, work, c:\temp.txt, 2);

You may want to run through some basics of SAS programming before going further. 

As for your issue, its because you have variables which are the same name across the datasets, you can only have one in the output dataset so SAS takes the first one. 

Achieng
Quartz | Level 8

Thank you Superuser,

I will learn as we go along. I am actually just trying to learn on my own. I am PhD student and I am required to use SAS for my data analysis. I have basically trained myself and still training, Obviously, I am jumping steps, as you can see. Thank you very much.

 

I think the problem was truly the similar names.

 

Please feel free to recommend some links I can use for learning.

 

Otherwise, thank you very much for staying committed. This was my first post and I am very happy with the help you have provided.

 

I shall be back here shortly with questions about logistic regression modelling.

 

Very kind regards

Achieng 

 

RW9
Diamond | Level 26 RW9
Diamond | Level 26

No probs.  You can find a lot of help online, main part will be the documentation 

http://support.sas.com/documentation/onlinedoc/base/

 

Then there is SAS provided video tutorials;

http://video.sas.com/

 

For graphs, this blog is essential reading, lots of examples of pretty much any graph;

http://blogs.sas.com/content/graphicallyspeaking/

 

And finally using the search on these forums, there is lots of information here.

 

Achieng
Quartz | Level 8

Thank you Very much.

 

Very Kind regards

Achieng 

 

 

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is ANOVA?

ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 17 replies
  • 3581 views
  • 2 likes
  • 2 in conversation