SAS Programming

rlolo · Posted 02-14-2017 05:12 AM

Dear SAS community,

I am a new user of SAS and I guess this is a fairly easy question but I have been searching the web and I can't solve it.

I receive a file on a weekly basis containing trainings that employees have to take and I need to pivot some of its columns. The first 4 variables are always called the same (they are basic employees info), but the rest varies (the variables names change and I have a different number of columns each time, as each week different trainings have to be taken). Here is my code:

PROC IMPORT OUT= IMPORT.Mandatory_Trainings

DATAFILE= "Excel file path"

DBMS=EXCEL REPLACE;

Range="A1:AAA100000";

GETNAMES=Yes;

MIXED=NO;

SCANTEXT=YES;

USEDATE=YES;

SCANTIME=YES;

RUN;

proc transpose Data=IMPORT.Mandatory_Trainings OUT=Base.Mandatory_Trainings_Pivot;

BY ID Employee_Name Overall__Training_Status General_Exclusion__Status;

VAR XXXXX;

run;

The line "VAR XXXXX;" is where I need to change my code.

I put the name of the first and last training and separated them with "--" ("First_Training_Name -- Last_Training_Name_") and it worked perfectly fine but I want a more automated way where SAS select and pivot all variables that are between column 5 and the last one. Any advice on how to do this?

Thanks in advance for the help.

mkeintz · Posted 02-14-2017 08:35 AM

You could put this DATA step between the import and the transpose - in which you insert 2 sentinel variables (_PRE and _POST) in location 5 and location "last".

data need;
  set IMPORT.Mandatory_Trainings (keep=ID Employee_Name Overall__Training_Status General_Exclusion__Status);
  retain _PRE .;
  set IMPORT.Mandatory_Trainings;
  retain _POST .;
run;

The "trick" here is to understand how SAS compiler builds the program data vector (PDV) in a data step.

The first SET keeps only a limited set of 4 variables, which become the 4 left-most in the PDV.
Then the retain _PRE statement appends it to the right, i.e. column 5.
Then a second SET reads all the training vars. Four of them are already on the left of _PRE, but the rest are appended to the right of _PRE.
Then the retain _POST generates the right-most var.

Then you can modify the proc transpose to always have "var _PRE--_POST" - no changing varname awareness required. And just insert a "(where=(not(name in ("_PRE","_POST"))))" dataset-name-parameter in the "out=" clause:

proc transpose Data=need
  OUT=Base.Mandatory_Trainings_Pivot (where=(not(_name_ in ('_PRE','_POST'))));
  BY ID Employee_Name Overall__Training_Status General_Exclusion__Status; 
  var _pre -- _post;
run;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

RW9 · Posted 02-14-2017 06:01 AM

Well, your best options would be to drop Excel as a data transfer medium - this would fix all your problems, make a far more robust data transfer system, and make your life easier. However you will say that you can't and so your process will always be sub-par. You can get a list of varaible names via:

proc sql;
  select  NAME
  into     :VLIST separated by " "
  from    DICTIONARY.COLUMNS
  where  LIBNAME="<thelibname>"
     and  MEMNAME="<dataste name>";
quit;

...
  var &varlist.;
...

I can think of at least a few issues with the process - transposing, what happens when there are mixed number/character and if you let SAS auto convert it will it be right. How will you validate such a process? What happens when the spreadsheet changes - i.e. your four always there variables - you don't have a signed import agreement so they could send anything they like. Etc.

rogerjdeangelis · Posted 02-14-2017 08:34 AM

Select excel variables based on their column number to piviot

I assume variables to transform are after 'ID' variables.

HAVE (Two workbooks)

d:/xls/class.xlsx

 +-------------------------+------+-----------+------------
 |      |    A      |   B  |   B  |    C      |    D      |
 +------+-----------+------+------+-----------+-----------+
 |      |           |      |      |           |           |
 |    1 |   NAME    |   AGE|  SEX |  HEIGHT   |  WEIGHT   |
 |    2 |   Alfred  |   14 |   M  |   55      |   96      |
 |    3 |   Alice   |   13 |   F  |   44      |   87      |
 |    4 |   Barbara |   13 |   F  |   48      |   88      |
 |    5 |   Carol   |   14 |   F  |   56      |   99      |
 |    6 |   Henry   |   14 |   M  |   52      |   84      |
 | ...  |   ...     |  ... | ...  |   ..      |   ...     |
 +------------------+------+------+-----------+-----------+

 [CLASS}


d:/xls/class.xlsx

 +-------------------------+------+-----------+-----------+-----------+
 |      |    A      |   B  |   B  |    C      |    D      |   E       |
 +------+-----------+------+------+-----------+-----------+-----------+
 |      |           |      |      |           |           |           |
 |    1 |   NAME    |   AGE|  SEX |  HEIGHT   |  WEIGHT   |  PREDICT  |
 |    2 |   Alfred  |   14 |   M  |   55      |   96      |   91      |
 |    3 |   Alice   |   13 |   F  |   44      |   87      |   82      |
 |    4 |   Barbara |   13 |   F  |   48      |   88      |   83      |
 |    5 |   Carol   |   14 |   F  |   56      |   99      |   94      |
 |    6 |   Henry   |   14 |   M  |   52      |   84      |   85      |
 | ...  |   ...     |  ... | ...  |   ..      |   ...     |   ...     |
 +------------------+------+------+-----------+-----------+-----------+

 [CLASSFIT}

WANT

Up to 40 obs from WORK.CLASSXPO total obs=95

Obs    NAME       _NAME_      COL1

  1    Alfred     NAME        Alfred
  2    Alfred     SEX         M
  3    Alfred     AGE         14
  4    Alfred     HEIGHT      69
  5    Alfred     WEIGHT      112.5
  6    Alice      NAME        Alice
  7    Alice      SEX         F
  8    Alice      AGE         13
  9    Alice      HEIGHT      56.5
 10    Alice      WEIGHT      84
...

Obs    NAME       _NAME_       COL1

  1    Alfred     NAME         Alfred
  2    Alfred     SEX          M
  3    Alfred     AGE          14
  4    Alfred     HEIGHT       69
  5    Alfred     WEIGHT       112.5
  6    Alfred     PREDICT      126
 11    Alice      NAME         Alice
 12    Alice      SEX          F
 13    Alice      AGE          13
 14    Alice      HEIGHT       56.5
 15    Alice      WEIGHT       84
 16    Alice      PREDICT      77
....

WORKING CODE
============

      proc transpose data=xel.&bok out=&bok.xpo(drop=_label_);
         by name;
         var _all_;

FULL SOLUTION
=============

* create some code;
data _null_;
  do wrkbok="classfit","class";
    call symputx('bok',wrkbok);
    rc=dosubl('
      %*utlfkil(d:/xls/&bok..xlsx);
      libname xel "d:/xls/&bok..xlsx";
      data xel.&bok;
         set sashelp.&bok;
      run;quit;
      libname xel clear;
    ');
  end;
run;quit;

* transpose;
data _null_;
  do wrkbok="classfit","class";
    call symputx('bok',wrkbok);
    rc=dosubl('
      libname xel "d:/xls/&bok..xlsx";
      proc transpose data=xel.&bok out=&bok.xpo(drop=_label_);
         by name;
         var _all_;
      run;quit;
      libname xel clear;
    ');
  end;
run;quit;

mkeintz · Posted 02-14-2017 08:35 AM

You could put this DATA step between the import and the transpose - in which you insert 2 sentinel variables (_PRE and _POST) in location 5 and location "last".

data need;
  set IMPORT.Mandatory_Trainings (keep=ID Employee_Name Overall__Training_Status General_Exclusion__Status);
  retain _PRE .;
  set IMPORT.Mandatory_Trainings;
  retain _POST .;
run;

The "trick" here is to understand how SAS compiler builds the program data vector (PDV) in a data step.

The first SET keeps only a limited set of 4 variables, which become the 4 left-most in the PDV.
Then the retain _PRE statement appends it to the right, i.e. column 5.
Then a second SET reads all the training vars. Four of them are already on the left of _PRE, but the rest are appended to the right of _PRE.
Then the retain _POST generates the right-most var.

Then you can modify the proc transpose to always have "var _PRE--_POST" - no changing varname awareness required. And just insert a "(where=(not(name in ("_PRE","_POST"))))" dataset-name-parameter in the "out=" clause:

proc transpose Data=need
  OUT=Base.Mandatory_Trainings_Pivot (where=(not(_name_ in ('_PRE','_POST'))));
  BY ID Employee_Name Overall__Training_Status General_Exclusion__Status; 
  var _pre -- _post;
run;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

SAS Programming

Select variables based on their column number

Re: Select variables based on their column number

Re: Select variables based on their column number

Re: Select variables based on their column number

Re: Select variables based on their column number

Numbers as variable/column names in SAS Studio

Forward Selection Process with Multinominal Response Variable

How to rename all variables base on column position?

column wise sum positive numbers

SELECT numbers dividable by 1000

Follow Us

What is...

SAS Programming

Special offer for SAS Communities members

SAS Training: Just a Click Away

Follow Us

What is...