SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop and others

Replacing multiple variable values for multiple variables

Accepted Solution Solved
Reply
Frequent Contributor
Frequent Contributor
Posts: 109
Accepted Solution

Replacing multiple variable values for multiple variables

Hello,

I need help with a data management step, please see following:

I need to replace observation values (formatted as characters) in a very large dataset where the observation values can be in multiple variables, all variables have the prefix DX. There are about 300 different values that will need to be replaced, so listing them in the fewest lines of code would be ideal. There is no discernible pattern in the values that need to be replaced. Also the dataset already exists, so I will not be inputting these data, only trying to change them in a data step. I would also like the variable names to remain the same and the order of the variables to remain the same after changing values.

So here is an example, I would want the following observation values

changed:   959.1 should be 959.10

           959.2 should be 959.20

           959.9 should be 959.90

           800   should be 800.00

           809   should be 809.0;

Example set;

data test;

      input DX1$ DX2$ DX3$ DX4$;  *my set actually has DX1-DX60 with 3M observations;

      cards;

      959.01      800         959.09      959.09

      959.1       959.1       959.11      809

      959.19      959.14      959.2       959.9

      959.9       959.9       959.19      959.01

      ;

run;

After the conversion the dataset would look like the following:

      DX1         DX2         DX3         DX4

      959.01      800.00      959.09      959.09

      959.10      959.10      959.11      809.0

      959.19      959.14      959.20      959.90

      959.90      959.90      959.19      959.01

Thank you in advance for any help or suggestions. I have posted on

this site once before and received great and expedient feedback.

I greatly appreciate your skills and help.

H L;


Accepted Solutions
Solution
‎03-06-2014 08:00 PM
Super User
Posts: 11,343

Re: Replacing multiple variable values for multiple variables

If you aren't reading the data then using the first recode format with an array to reassign the values.

data want;

     set have;

     array FixDX DX: ; /* assumes you don't have any variables starting with DX that you aren't going to recode. and that all DX are same type or list like DX1 - DX60 assuming named that way.*/

     do _i_ = 1 to dim(FixDX);

          FixDX[_i_] = Put (FixDX[_i_],$recode.);

     end;

run;

You example with data test3 didn't do  anything to values because you only set the default informat for the variables, you didn't actually do anything to read them. The above code I just posted should also work by changing the PUT to Input(FixDX[_i_],$inrecode.);

Informats only actually do anything when reading or using an input statement.

View solution in original post


All Replies
Respected Advisor
Posts: 3,799

Re: Replacing multiple variable values for multiple variables

What is the rule for only adding one decimal place to values like 809.

Frequent Contributor
Frequent Contributor
Posts: 109

Re: Replacing multiple variable values for multiple variables

Posted in reply to data_null__

@ Data_null_

I am not exactly sure what you are asking.

As for the values that I need to replace, the rule is not discernible by a pattern or algorithm. These are diagnostic codes, which some have subgroups to one or two decimal places while others do not. Some have no subgroups, but decimal padding. I am held victim to the existing codes.

Super User
Posts: 11,343

Re: Replacing multiple variable values for multiple variables

Probably the easiest might be to create a custom format (or possibly informat)

Do you have a table of the current and needed values? If so we could probably build a control data set instead of writing proc format code.

proc format;

value $recode

"959.1" ="959.10"

"959.2" ="959.20"

"959.9" ="959.90"

"800"   ="800.00"

"809"   ="809.0"

/* add other lines as needed*/

;

run;

Advantage of a format is that you could leave the data as is and use the format as needed;

Proc print data=test;

     var DX:;

     format DX: $recode.;

run;

Or an INFORMAT could be created to read

proc format;

invalue $inrecode

"959.1" ="959.10"

"959.2" ="959.20"

"959.9" ="959.90"

"800"   ="800.00"

"809"   ="809.0"

/* add other lines as needed*/

;

run;

data test;

   informat DX1$ DX2$ DX3$ DX4$ $inrecode.;

      input DX1$ DX2$ DX3$ DX4$;  *my set actually has DX1-DX60 with 3M observations;

      cards;

      959.01      800         959.09      959.09

      959.1       959.1       959.11      809

      959.19      959.14      959.2       959.9

      959.9       959.9       959.19      959.01

      ;

run;

PROC Star
Posts: 1,167

Re: Replacing multiple variable values for multiple variables

Oddly enough, I just finished working on an SGF paper that covers similar ground. First of all, I believe that either a format/informat or hash table approach is the best in terms of how to store your 300 source / target couplets. I just ran a test on my PC on a much larger case than that, and performance was excellent in both.

In terms of how to manage your DX1 - DX60 variables in each record, my thinking is to define them as an array and loop over it, but there might be better ideas out there.

Good news...this will be 1) easy and 2) fast.

Tom

Frequent Contributor
Frequent Contributor
Posts: 109

Re: Replacing multiple variable values for multiple variables

@ ballardw, unfortunately I will not get a chance to explore your option until tomorrow. I will then update on progress or questions.

@ TomKari, is the SGF paper currently available?

Frequent Contributor
Frequent Contributor
Posts: 109

Re: Replacing multiple variable values for multiple variables

@ ballardw,

Your approach and code worked perfectly once I removed the "$s" in the informat line. However, as I attempted to allude to in my original post, I won't be inputting these data (very large existing dataset). When I tried to apply it to an existing dataset, I was unsuccessful. Do you have any suggestions to help me out? See below for my own failed attempt.

proc format;

invalue $inrecode

"959.1" ="959.10"

"959.2" ="959.20"

"959.9" ="959.90"

"800"   ="800.00"

"809"   ="809.0"

/* add other lines as needed*/

;

run;

data test2; *this data step will not exist in my real-life situation;

      input DX1$ DX2$ DX3$ DX4$;  *my set actually has DX1-DX60 with 3M observations;

      cards;

      959.01      800         959.09      959.09

      959.1       959.1       959.11      809

      959.19      959.14      959.2       959.9

      959.9       959.9       959.19      959.01

      ;

run;

data test3;

      informat DX1 DX2 DX3 DX4 $inrecode.;

      set test2;

run;

proc print data=test3;

run;

Here is my log message:

WARNING: Multiple lengths were specified for the variable DX1 by input

         data set(s). This may cause truncation of data.

WARNING: Multiple lengths were specified for the variable DX2 by input

         data set(s). This may cause truncation of data.

WARNING: Multiple lengths were specified for the variable DX3 by input

         data set(s). This may cause truncation of data.

WARNING: Multiple lengths were specified for the variable DX4 by input

         data set(s). This may cause truncation of data.

NOTE: There were 4 observations read from the data set WORK.TEST2.

NOTE: The data set WORK.TEST3 has 4 observations and 4 variables.

NOTE: DATA statement used (Total process time):

      real time           0.00 seconds

      cpu time            0.01 seconds

Solution
‎03-06-2014 08:00 PM
Super User
Posts: 11,343

Re: Replacing multiple variable values for multiple variables

If you aren't reading the data then using the first recode format with an array to reassign the values.

data want;

     set have;

     array FixDX DX: ; /* assumes you don't have any variables starting with DX that you aren't going to recode. and that all DX are same type or list like DX1 - DX60 assuming named that way.*/

     do _i_ = 1 to dim(FixDX);

          FixDX[_i_] = Put (FixDX[_i_],$recode.);

     end;

run;

You example with data test3 didn't do  anything to values because you only set the default informat for the variables, you didn't actually do anything to read them. The above code I just posted should also work by changing the PUT to Input(FixDX[_i_],$inrecode.);

Informats only actually do anything when reading or using an input statement.

Frequent Contributor
Frequent Contributor
Posts: 109

Re: Replacing multiple variable values for multiple variables

Code seems to work perfectly using the input (on the practice set).

ballardw, yes I kind of knew I needed to input these data first, but my data management skills were lacking. I usually know there is a way to do most anything in SAS, but I am limited by how infrequently I need to performed these tasks.

Thank you both for your input and help! I will update you all later in the day on how the actual application of the code went.

PROC Star
Posts: 1,167

Re: Replacing multiple variable values for multiple variables

Let us know how long it takes, and on what kind of machine!

Tom

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 9 replies
  • 2206 views
  • 1 like
  • 4 in conversation