Re: Recode variable keeping the same name and position

sasecn · Posted 03-05-2022 08:35 AM

Wondered if there is an easy way to recode a variable, and keep the same name and position. For example, using the attached code, I can recode variable Type in sashelp.cars, after that I want to call it Type and keep the same position. Imagine there are hundreds of variables. I don't want to list all of them to keep the order. Is there any easy way to do it with too many coding?

PROC SQL;
   CREATE TABLE want AS 
   SELECT Make, 
          Model, 
          case when Type in ("Hybrid", "Wagon")
               then "Wagon"
			   else type
          end as type, 
          Origin, 
          DriveTrain, 
          MSRP, 
          Invoice, 
          EngineSize, 
          Cylinders, 
          Horsepower, 
          MPG_City, 
          MPG_Highway, 
          Weight, 
          Wheelbase, 
          Length
      FROM SASHELP.CARS;
QUIT;

PaigeMiller · Posted 03-05-2022 08:42 AM

Why does it have to have the same position? Please explain.

SAS doesn't care. All of your work will be done properly regardless of position.

--
Paige Miller

sasecn · Posted 03-05-2022 08:46 AM

No specific reason to keep the position. Our dataset is always arranged by different group of variables. Just wanted to keep the recoded var in the same group. like said wondered if there is an easy wat to do it without many additional codes.

PaigeMiller · Posted 03-05-2022 11:22 AM

@sasecn wrote:

No specific reason to keep the position. Our dataset is always arranged by different group of variables. Just wanted to keep the recoded var in the same group. like said wondered if there is an easy wat to do it without many additional codes.

Then why spend time on it at all?

--
Paige Miller

Tom · Posted 03-05-2022 12:20 PM

@PaigeMiller wrote:

@sasecn wrote:

No specific reason to keep the position. Our dataset is always arranged by different group of variables. Just wanted to keep the recoded var in the same group. like said wondered if there is an easy wat to do it without many additional codes.

Then why spend time on it at all?

I don't know about you but I cannot keep the details of the contents of the 100's of thousands of individual datasets I might have to examine. If they are organized rationally then they are a h*** of lot easier to work with. You can open them and look at the first few records. Poke around. See what is going on.

ballardw · Posted 03-05-2022 10:06 AM

For a vast majority of cases there is no reason to create a new data set for such a simple activity.

When displaying the values of one or more variables with the same value and desired display then often creating a custom format and applying that will do what you want. No new data set. Flexible as a change in the definition of that format is automatically applied at us.

Example:

proc format;
value $alttype
"Hybrid", "Wagon" = "Wagon"
;
run;

Proc freq data=sashelp.cars;
   table type;
   format type $alttype.;
run;

The above format only affects the given values, none of the the other values of a variable would be affected.

Note that the group counts with that format applied combine the two types as one. Most of the graphing, reporting and analysis procedures will honor this behavior.

No need to create a new data set. See the example.

You could use Proc Datasets to change the default format of variables which avoids the time of rebuilding a data set, which can be significant with large sets.

Tom · Posted 03-05-2022 10:52 AM

Doing that is trivial if you use normal SAS code.

data want;
  set sashelp.cars;
  if Type in ("Hybrid", "Wagon") then type="Wagon";
run;

And if you actually had to change the type (or the length for character variables) it is not that much harder. You will need to know the name of the FIRST variable in the existing dataset.

data want;
  set sahelp.cars(keep=make--type rename=(type=oldtype));
  length type $100;
  set sashelp.cars(drop=type);
  if oldtype in ("Hybrid", "Wagon") then type="Wagon";
  else type=oldtype;
  drop oldtype;
run;

sasecn · Posted 03-06-2022 11:04 AM

Thank you!

Patrick · Posted 03-05-2022 07:07 PM

The order of variables in a tables shouldn't matter. This is more of an "Excel" thinking. But understand the convenience a defined order can provide.

If you want to create a totally new variable and have it added at a defined location in the target table then below some options.

For a data step (the two dashes in the variable list are important!):

data test_ds;
  /* column mapping */
  if 0 then 
    do;
      set sashelp.class(keep=name--age);
      length new_col $3;
      set sashelp.class(keep=height--weight);
    end;
  call missing(of _all_);

  /* read the source data */
  set sashelp.class;
  new_col='XXX';
run;

It's a bit more involved for a SQL as there the order of variables in the SELECT clause defines the order in the created table so you need to list all the variables.

To reduce typing if there are too many variables in the source table below some "lazy" way to get there.

1. Create list of all columns in source table

options ps=max ls=max;
proc sql feedback noexec;
  select *
  from sashelp.class as t
  ;
quit;

2. Copy/paste expanded SQL found in SAS Log into Notepad++ and run plugin "Poor Man's SQL Formatter"

In SAS log:

Result in Notepad++ after running the plugin

SELECT T.Name
	,T.Sex
	,T.Age
	,T.Height
	,T.Weight
FROM SASHELP.CLASS T;

3. Copy/past the formatted code from Notepad++ back into your SAS Editor and add the new column at the desired location

proc sql;
  create table test_sql as
    SELECT T.Name
      ,T.Sex
      ,T.Age
      ,'XXX' as new_col length=3
      ,T.Height
      ,T.Weight
    FROM SASHELP.CLASS T;
quit;

mkeintz · Posted 03-05-2022 08:44 PM

First, if you are using DATA step to recode variables, then variable order will be preserved.

data wnt;
  set sashelp.cars;
  if type='Hybrid' then type='Hyb'; else
  if type='Sedan'  then type='Sed'; else
  if type='Sports' then type='Spo'; else
  if type='Truck'  then type='Trk';
run;

But if you MUST use proc sql, then

Create a VIEW instead of a table, with all your records, ignoring variable order. Just issue the recodes to your convenience.
In a DATA step concatenate an empty original dataset with the above view, which will honor the variable order in the original:

proc sql  noprint;
  create view vcars as select 
    case when type='Hybrid' then 'Hyb' 
         when type='Sedan'  then 'Sed' 
         when type='Sports' then 'Spo' 
         when type='Truck'  then 'Trk' 
         else type
    end as type
   ,*
  from sashelp.cars;
quit;

data want;
  set sashelp.cars (obs=0)
  vcars;
run;

The first idea here is that creating a view is just defining a set of rules. The data is not processed until the view is used. So you are not wasting resources writing an intermediate data set file to disk. The recoding doesn't take place until the subsequent DATA step.

The second idea is using the "obs=0" option for the first dataset named in the SET statement. This will read no observations directly from sashelp.cars but the second argument instructs SAS to read all the data from VCARS. Because the SAS compiler will construct the program data vector ordering variables as they are encountered, they will be ordered as in sashelp.cars, since it is the first argument of the SET statement.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

sasecn · Posted 03-06-2022 11:03 AM

Thanks for the details!

Rabelais · Posted 08-19-2024 09:49 AM

I tried your code (second code block) in sas enterprise guide 8.3 Update 3 (8.3.3.181) (64-bit) but it doesn't work, in the sense that the variable "type" of the table "want" is equal to the variable "type" of the table "sashelp.cars".

In fact if I run

proc compare
base=sashelp.cars
comp=want;
run;

the output is "No unequal values were found. All values compared are exactly equal."

The are no warnings in the log, but I noticed a strange thing which is depicted in the screenshot below: the output of the sql code creating the view contains both the "type" variables (the new one is at the beginning). Is this supposed to happen?

Tom · Posted 08-19-2024 11:23 AM

That really should not happen with a SAS dataset since there can only be one variable named TYPE (variable names are NOT case sensitive).

But apparently PROC SQL is allowing you to create a view with both a variable named type and one named Type. Kind of like how the running a SELECT statement (without the CREATE prefix) allows you to output values from multiple variables that have the same name.

And worse if you then try to use that view in a data step SAS will use the first one to set the TYPE and LENGTH but then retrieve data from both of them so that it tries to fit the values read from the second into the variable defined by the first one.

Try this example:

proc sql;
create table ds1 as
  select 'X' as type,*
  from sashelp.cars(keep=make type obs=5)
;
create view v1 as
  select 'X' as type,*
  from sashelp.cars(keep=make type obs=5)
;
quit;
data ds2; set v1; run;
proc print data=ds1; run;
proc print data=v1; run;
proc print data=ds2; run;

You can see warnings in the log:

1    proc sql;
2    create table ds1 as
3      select 'X' as type,*
4      from sashelp.cars(keep=make type obs=5)
5    ;
WARNING: Variable Type already exists on file WORK.DS1.
NOTE: Table WORK.DS1 created, with 5 rows and 2 columns.

6    create view v1 as
7      select 'X' as type,*
8      from sashelp.cars(keep=make type obs=5)
9    ;
NOTE: SQL view WORK.V1 has been defined.
10   quit;
NOTE: PROCEDURE SQL used (Total process time):
      real time           0.02 seconds
      cpu time            0.01 seconds


11   data ds2; set v1; run;

WARNING: Multiple lengths were specified for the variable type by input data set(s). This can cause truncation of data.
NOTE: There were 5 observations read from the data set SASHELP.CARS.
NOTE: There were 5 observations read from the data set WORK.V1.
NOTE: The data set WORK.DS2 has 5 observations and 2 variables.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds

Quentin · Posted 08-19-2024 12:04 PM

That's interesting. I didn't realize a SQL view could have multiple columns with the same name.

proc sql;
create view v1 as
  select 1 as Foo 
        ,2 as Foo
        ,3 as FoO
        ,'4' as Foo 
  from sashelp.cars
;
quit;
proc contents data=v1 varnum;run ;

Returns:

Data Set Name        WORK.V1                Observations          .
Member Type          VIEW                   Variables             4
Engine               SQLVIEW                Indexes               0
Created              08/19/2024 12:01:53    Observation Length    25
Last Modified        08/19/2024 12:01:53    Deleted Observations  0
Protection                                  Compressed            NO
Data Set Type                               Sorted                NO
Label
Data Representation  Default
Encoding             Default


                    Variables in Creation Order

               #    Variable    Type    Len    Flags

               1    Foo         Num       8    P--
               2    Foo         Num       8    P--
               3    FoO         Num       8    P--
               4    Foo         Char      1    P--

Tom · Posted 08-19-2024 12:13 PM

You can also use something like that to see how PROC SQL and DATA step handle variable attributes like LENGTH/FORMAT/INFORMAT and LABEL differently when confronted with multiple versions of the same variable. (Which can happen with SET or MERGE in SAS and UNION in SQL.)

proc sql;
create view v1 as
  select '111' as Foo 
        ,'2' as Foo label='Two'
        ,'3' as Foo format=$1.
        ,'4' as Foo informat=$1.
  from sashelp.cars
;
create table ds1 as select * from v1;
quit;
data ds2; set v1; run;
proc contents data=ds1; run;
proc contents data=ds2; run;

SQL uses the first variables attributes.

Data step uses the first non-empty attribute.

Registration is open

SAS Training: Just a Click Away