BookmarkSubscribeRSS Feed
disguy
Calcite | Level 5

Hello,

I'm new to SAS and after doing a lot of searching, I can't seem to get some simple data operations to work.

Given a dataset, I want to replace certain data entries systematically. My document contains several unwanted entries with the string "DQ" and I want to replace them in the data step with missing values ".". The problem is, I get errors and I suspect it has to do because the "DQ" entries are given for both numerical and categorical variables. I have tried three different approaches, and all of them have yielded errors that my data is invalid still. Note: when I used the 'replace all' function in notepad and ran the data, I had no "invalid data" problems.

To show my approaches, let "categorical" and "numerical" be my variables.

Approach #1:

numerical = .;

replace all var{numerical} where(numerical = "DQ);

categorical = ".";

replace all var{categorical} where(categorical = "DQ");

Approach #2:

if numerical = "DQ" then numerical = .;

if categorical = "DQ" then categorical = ".";

Approach #3:

numerical = tranwrd(numerical, "DQ", .);

categorical=tranwrd(categorical, "DQ", ".");

Is there something fundamentally wrong with my approach? Even though I distinguish between "." and . without quotations - I have not been able to successfully run the data. I run SAS remotely, so I cannot view the data from within the program; the log simply informs me of the errors. Thanks in advance.

5 REPLIES 5
PGStats
Opal | Level 21

Approach 2 is almost it. Try this way:

if LEFT(numerical) = "DQ" then numerical = "";

if LEFT(categorical) = "DQ" then categorical = "";

PG

PG
disguy
Calcite | Level 5

Thanks for the reply. Unfortunately, it did not work (same issues as before).

When the log lists the data values for each variable, it shows that the values have been changed to "." - as desired (and also as with the methods I listed in my original post).

However, for multiple lines the log writes "NOTE: Invalid date for numerical in line XXX XX-XX". It seems to be doing this only for the numerical variable.

EDIT: And if I'm not mistaken, it seems to only be outputting this error for observations where both the numerical and categorical variables have the entry "DQ".

PGStats
Opal | Level 21

If numerical really is a numerical variable and it prints as "DQ" then it must have an associated format that represents some numerical values as "DQ". You could try

if left(vvalue(numerical))="DQ" then call missing(numerical);

If that doesn't work, please post the result of :

proc contents data=yourDatasetName; run;

PG

PG
disguy
Calcite | Level 5

Unfortunately, I am using a remote version of SAS and it really limits what I can do. This being the case, I can't view the results of proc contents. However, I have uploaded the data file to pastebin: http://http://pastebin.com/WLjLdPex

The relevant variables are "force" (the one I referred to as "numerical") and "break" (the one I referred to as "categorical"). My instructions were to replace the "DQ" values with "."

I will post the code I used to access the data remotely:

data test;

infile 'datafile.txt'

firstobs=2

delimiter='09'x

termstr=crlf;

input btch$ glue$ clamp$ overlap$ stick color$ user$ jig caliper force break$ quality;

drop btch user jig;

run;

When attempting to replace the DQ values, I inserted the code between the drop and run statements (if that matters).

Tom
Super User Tom
Super User

So if you try to read the variable FORCE (10th column in your raw data file) as a number then DQ will automatically be converted by SAS as missing value.  So you just need code to convert the 'DQ' in the character variables.

array _c _character_;

do over _c ;

  if _c='DQ' then _c='.';

end

Personally I would set the character variables to blank instead of '.' .

If you want to avoid the error message about invalid data then you could create an informat.  You could create a character informat also.  But here is another trick that will do without informats or looping code.  The idea is to convert the data in the input buffer before reading the line into the variables.

Add these two lines to you current program right before the INPUT statement.

input @;

_infile_ = tranwrd(_infile_,'DQ',' .');

SAS will read the period as missing for numeric with the conversion error messages that you are getting when it sees the DQ.  The character variables all become blank (the dot will disappear).

Example:

data test;

  infile cards firstobs=2 truncover;

  input @;

  put _infile_;

  _infile_ = tranwrd(_infile_,'DQ',' .');

  put _infile_;

  input btch $ glue $ clamp $ overlap $ stick color $ user $ jig caliper force break $ quality;

  put (_all_) (=) /;

cards;

batch glue clamp overlap sticks colour user jig caliper force break quality

CCM2B C C M 2 B N.R 3 20 12.5 w 6

CCM2B C C M 2 B N.R 3 18.6 DQ DQ 0

CCM2B C C M 2 B S.J 3 17.6 7 w 5

run;

CCM2B C C M 2 B N.R 3 20 12.5 w 6

CCM2B C C M 2 B N.R 3 20 12.5 w 6

btch=CCM2B glue=C clamp=C overlap=M stick=2 color=B user=N.R jig=3 caliper=20 force=12.5 break=w quality=6

CCM2B C C M 2 B N.R 3 18.6 DQ DQ 0

CCM2B C C M 2 B N.R 3 18.6  .  . 0

btch=CCM2B glue=C clamp=C overlap=M stick=2 color=B user=N.R jig=3 caliper=18.6 force=. break=  quality=0

CCM2B C C M 2 B S.J 3 17.6 7 w 5

CCM2B C C M 2 B S.J 3 17.6 7 w 5

btch=CCM2B glue=C clamp=C overlap=M stick=2 color=B user=S.J jig=3 caliper=17.6 force=7 break=w quality=5

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 2493 views
  • 6 likes
  • 3 in conversation