02-11-2014 02:49 PM
Perhaps this is a total newbie question, but I am hitting a wall with simple stuff.
I used PROC IMPORT to import an Access data file successfully. What I need to do now is fix my data: in one of the variables, called country, some observations say BRAZIL and others say BRAZIL1, when in fact they are the same thing.
Can someone walk me through how I can create a new variable that fixes this and merges the two values? I tried IF/THEN statements, but they did not work.
02-11-2014 03:15 PM
You say there is one variable called country, with observations containing Brazil and Brazil1. Is that correct?
If so, was the data imported incorrectly, or is it already incorrect in the source?
Or did you mean two variables, one called Brazil and one called Brazil1?
Either way, what would you like the output to look like?
02-11-2014 03:40 PM
Just to clarify, this is an example of what my PROC IMPORT output looks like:
COUNTRY   SEX   MARITAL STATUS
USA       F     S
PERU      M     S
BRAZIL1   M     S
BRAZIL    F     M
USA       M     S
BRAZIL1   M     S
BRAZIL1 and BRAZIL are one and the same. Some rows say BRAZIL1 and some say BRAZIL because two different people collected the data and there was no uniformity. My question is: how do I convert every BRAZIL1 into BRAZIL so the results are consistent?
02-11-2014 03:48 PM
So really this is about cleaning data. It can be done in a DATA step, using the COMPRESS function to remove any digits from values such as BRAZIL1.
Here's a link to the documentation for the Compress function.
Country_Clean = compress(Country, , 'ka');  /* 'k' = keep listed characters, 'a' = alphabetic, so digits are dropped */
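For context, here is a minimal sketch of how that assignment could sit inside a full DATA step. The dataset names work.survey and work.survey_clean, the new variable names, and the length of 40 are assumptions for illustration; substitute whatever your PROC IMPORT step actually created. Note that 'ka' on its own also strips embedded blanks, so a multi-word country name would be squeezed together; adding the 's' modifier keeps space characters as well. The second assignment shows a targeted IF/THEN alternative that normalizes case and surrounding blanks before comparing. Whether a mismatch of that kind is why the earlier IF/THEN attempt failed is only a guess.

```sas
/* Sketch only: work.survey stands in for the PROC IMPORT output (hypothetical name). */
data work.survey_clean;
    set work.survey;

    /* Assumed length; make it at least as long as the original Country variable. */
    length Country_Clean Country_Fixed $ 40;

    /* Approach 1: keep ('k') only letters ('a') and space characters ('s'). */
    /* Any digit, such as the trailing 1 in BRAZIL1, is removed.             */
    Country_Clean = compress(Country, , 'kas');

    /* Approach 2: recode one known bad value explicitly.              */
    /* UPCASE and STRIP guard against case and stray-blank mismatches. */
    if upcase(strip(Country)) = 'BRAZIL1' then Country_Fixed = 'BRAZIL';
    else Country_Fixed = Country;
run;

/* Quick check: BRAZIL1 should no longer appear in either new variable. */
proc freq data=work.survey_clean;
    tables Country_Clean Country_Fixed;
run;
```

The COMPRESS approach cleans every value in one pass, but it would also alter any legitimate value that contains a digit. The IF/THEN approach touches only the values you name.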
02-11-2014 05:02 PM
What was the original data file type or source?
I ask because if the original data was a text file, such as a tab- or comma-delimited file, then PROC IMPORT writes DATA step code to read the data. You could then modify the generated code to incorporate your fixes. If you are going to do this with a number of files, I would recommend considering this approach: it lets you set character variable lengths, informats, and formats yourself instead of accepting the defaults, and it lets you build in routine data manipulation such as this.
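As a rough illustration of that idea, a hand-edited DATA step for a comma-delimited file might look something like the sketch below. The file name survey.csv, the variable names, the lengths, and the assumption of a single header row are all hypothetical; the code PROC IMPORT actually writes to the log for your file will differ in detail. This applies to delimited text sources only, not to a direct Access import.

```sas
/* Sketch only: the file name, layout, and lengths are assumptions. */
data work.survey;
    /* DSD handles quoted values and consecutive delimiters;  */
    /* FIRSTOBS=2 skips an assumed single header row.         */
    infile 'survey.csv' dlm=',' dsd firstobs=2 truncover;

    /* Set lengths explicitly instead of accepting the defaults. */
    length Country $ 40 Sex $ 1 Marital_Status $ 1;

    input Country $ Sex $ Marital_Status $;

    /* Built-in cleanup: keep only letters and spaces, so BRAZIL1 becomes BRAZIL. */
    Country = compress(Country, , 'kas');
run;
```

Doing the cleanup at read time means every file processed with the same step comes out consistent, without a separate fix-up pass afterwards.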