Help using Base SAS procedures

Replacing Data Entries

Reply
Occasional Contributor
Posts: 13

Replacing Data Entries

Hello,

I'm new to SAS and after doing a lot of searching, I can't seem to get some simple data operations to work.

Given a dataset, I want to replace certain data entries systematically. My document contains several unwanted entries with the string "DQ" and I want to replace them in the data step with missing values ".". The problem is, I get errors and I suspect it has to do because the "DQ" entries are given for both numerical and categorical variables. I have tried three different approaches, and all of them have yielded errors that my data is invalid still. Note: when I used the 'replace all' function in notepad and ran the data, I had no "invalid data" problems.

To show my approaches, let "categorical" and "numerical" be my variables.

Approach #1:

numerical = .;

replace all var{numerical} where(numerical = "DQ);

categorical = ".";

replace all var{categorical} where(categorical = "DQ");

Approach #2:

if numerical = "DQ" then numerical = .;

if categorical = "DQ" then categorical = ".";

Approach #3:

numerical = tranwrd(numerical, "DQ", .);

categorical=tranwrd(categorical, "DQ", ".");

Is there something fundamentally wrong with my approach? Even though I distinguish between "." and . without quotations - I have not been able to successfully run the data. I run SAS remotely, so I cannot view the data from within the program; the log simply informs me of the errors. Thanks in advance.

Respected Advisor
Posts: 4,932

Re: Replacing Data Entries

Approach 2 is almost it. Try this way:

if LEFT(numerical) = "DQ" then numerical = "";

if LEFT(categorical) = "DQ" then categorical = "";

PG

PG
Occasional Contributor
Posts: 13

Re: Replacing Data Entries

Thanks for the reply. Unfortunately, it did not work (same issues as before).

When the log lists the data values for each variable, it shows that the values have been changed to "." - as desired (and also as with the methods I listed in my original post).

However, for multiple lines the log writes "NOTE: Invalid date for numerical in line XXX XX-XX". It seems to be doing this only for the numerical variable.

EDIT: And if I'm not mistaken, it seems to only be outputting this error for observations where both the numerical and categorical variables have the entry "DQ".

Respected Advisor
Posts: 4,932

Re: Replacing Data Entries

If numerical really is a numerical variable and it prints as "DQ" then it must have an associated format that represents some numerical values as "DQ". You could try

if left(vvalue(numerical))="DQ" then call missing(numerical);

If that doesn't work, please post the result of :

proc contents data=yourDatasetName; run;

PG

PG
Occasional Contributor
Posts: 13

Re: Replacing Data Entries

Unfortunately, I am using a remote version of SAS and it really limits what I can do. This being the case, I can't view the results of proc contents. However, I have uploaded the data file to pastebin: http://http://pastebin.com/WLjLdPex

The relevant variables are "force" (the one I referred to as "numerical") and "break" (the one I referred to as "categorical"). My instructions were to replace the "DQ" values with "."

I will post the code I used to access the data remotely:

data test;

infile 'datafile.txt'

firstobs=2

delimiter='09'x

termstr=crlf;

input btch$ glue$ clamp$ overlap$ stick color$ user$ jig caliper force break$ quality;

drop btch user jig;

run;

When attempting to replace the DQ values, I inserted the code between the drop and run statements (if that matters).

Super User
Super User
Posts: 7,076

Re: Replacing Data Entries

So if you try to read the variable FORCE (10th column in your raw data file) as a number then DQ will automatically be converted by SAS as missing value.  So you just need code to convert the 'DQ' in the character variables.

array _c _character_;

do over _c ;

  if _c='DQ' then _c='.';

end

Personally I would set the character variables to blank instead of '.' .

If you want to avoid the error message about invalid data then you could create an informat.  You could create a character informat also.  But here is another trick that will do without informats or looping code.  The idea is to convert the data in the input buffer before reading the line into the variables.

Add these two lines to you current program right before the INPUT statement.

input @;

_infile_ = tranwrd(_infile_,'DQ',' .');

SAS will read the period as missing for numeric with the conversion error messages that you are getting when it sees the DQ.  The character variables all become blank (the dot will disappear).

Example:

data test;

  infile cards firstobs=2 truncover;

  input @;

  put _infile_;

  _infile_ = tranwrd(_infile_,'DQ',' .');

  put _infile_;

  input btch $ glue $ clamp $ overlap $ stick color $ user $ jig caliper force break $ quality;

  put (_all_) (=) /;

cards;

batch glue clamp overlap sticks colour user jig caliper force break quality

CCM2B C C M 2 B N.R 3 20 12.5 w 6

CCM2B C C M 2 B N.R 3 18.6 DQ DQ 0

CCM2B C C M 2 B S.J 3 17.6 7 w 5

run;

CCM2B C C M 2 B N.R 3 20 12.5 w 6

CCM2B C C M 2 B N.R 3 20 12.5 w 6

btch=CCM2B glue=C clamp=C overlap=M stick=2 color=B user=N.R jig=3 caliper=20 force=12.5 break=w quality=6

CCM2B C C M 2 B N.R 3 18.6 DQ DQ 0

CCM2B C C M 2 B N.R 3 18.6  .  . 0

btch=CCM2B glue=C clamp=C overlap=M stick=2 color=B user=N.R jig=3 caliper=18.6 force=. break=  quality=0

CCM2B C C M 2 B S.J 3 17.6 7 w 5

CCM2B C C M 2 B S.J 3 17.6 7 w 5

btch=CCM2B glue=C clamp=C overlap=M stick=2 color=B user=S.J jig=3 caliper=17.6 force=7 break=w quality=5

Ask a Question
Discussion stats
  • 5 replies
  • 365 views
  • 6 likes
  • 3 in conversation