Dear SAS community,
I am a new user of SAS and I guess this is a fairly easy question but I have been searching the web and I can't solve it.
I receive a file on a weekly basis containing trainings that employees have to take and I need to pivot some of its columns. The first 4 variables are always called the same (they are basic employees info), but the rest varies (the variables names change and I have a different number of columns each time, as each week different trainings have to be taken). Here is my code:
PROC IMPORT OUT= IMPORT.Mandatory_Trainings
DATAFILE= "Excel file path"
DBMS=EXCEL REPLACE;
Range="A1:AAA100000";
GETNAMES=Yes;
MIXED=NO;
SCANTEXT=YES;
USEDATE=YES;
SCANTIME=YES;
RUN;
proc transpose Data=IMPORT.Mandatory_Trainings OUT=Base.Mandatory_Trainings_Pivot;
BY ID Employee_Name Overall__Training_Status General_Exclusion__Status;
VAR XXXXX;
run;
The line "VAR XXXXX;" is where I need to change my code.
I put the name of the first and last training and separated them with "--" ("First_Training_Name -- Last_Training_Name_") and it worked perfectly fine but I want a more automated way where SAS select and pivot all variables that are between column 5 and the last one. Any advice on how to do this?
Thanks in advance for the help.
You could put this DATA step between the import and the transpose - in which you insert 2 sentinel variables (_PRE and _POST) in location 5 and location "last".
data need;
  set IMPORT.Mandatory_Trainings (keep=ID Employee_Name Overall__Training_Status General_Exclusion__Status);
  retain _PRE .;
  set IMPORT.Mandatory_Trainings;
  retain _POST .;
run;
The "trick" here is to understand how SAS compiler builds the program data vector (PDV) in a data step.
Then you can modify the proc transpose to always have "var _PRE--_POST" - no changing varname awareness required. And just insert a "(where=(not(name in ("_PRE","_POST"))))" dataset-name-parameter in the "out=" clause:
proc transpose Data=need
  OUT=Base.Mandatory_Trainings_Pivot (where=(not(_name_ in ('_PRE','_POST'))));
  BY ID Employee_Name Overall__Training_Status General_Exclusion__Status; 
  var _pre -- _post;
run;Well, your best options would be to drop Excel as a data transfer medium - this would fix all your problems, make a far more robust data transfer system, and make your life easier. However you will say that you can't and so your process will always be sub-par. You can get a list of varaible names via:
proc sql;
  select  NAME
  into     :VLIST separated by " "
  from    DICTIONARY.COLUMNS
  where  LIBNAME="<thelibname>"
     and  MEMNAME="<dataste name>";
quit;
...
  var &varlist.;
...
I can think of at least a few issues with the process - transposing, what happens when there are mixed number/character and if you let SAS auto convert it will it be right. How will you validate such a process? What happens when the spreadsheet changes - i.e. your four always there variables - you don't have a signed import agreement so they could send anything they like. Etc.
Select excel variables based on their column number to piviot
I assume variables to transform are after 'ID' variables.
HAVE (Two workbooks)
d:/xls/class.xlsx
 +-------------------------+------+-----------+------------
 |      |    A      |   B  |   B  |    C      |    D      |
 +------+-----------+------+------+-----------+-----------+
 |      |           |      |      |           |           |
 |    1 |   NAME    |   AGE|  SEX |  HEIGHT   |  WEIGHT   |
 |    2 |   Alfred  |   14 |   M  |   55      |   96      |
 |    3 |   Alice   |   13 |   F  |   44      |   87      |
 |    4 |   Barbara |   13 |   F  |   48      |   88      |
 |    5 |   Carol   |   14 |   F  |   56      |   99      |
 |    6 |   Henry   |   14 |   M  |   52      |   84      |
 | ...  |   ...     |  ... | ...  |   ..      |   ...     |
 +------------------+------+------+-----------+-----------+
 [CLASS}
d:/xls/class.xlsx
 +-------------------------+------+-----------+-----------+-----------+
 |      |    A      |   B  |   B  |    C      |    D      |   E       |
 +------+-----------+------+------+-----------+-----------+-----------+
 |      |           |      |      |           |           |           |
 |    1 |   NAME    |   AGE|  SEX |  HEIGHT   |  WEIGHT   |  PREDICT  |
 |    2 |   Alfred  |   14 |   M  |   55      |   96      |   91      |
 |    3 |   Alice   |   13 |   F  |   44      |   87      |   82      |
 |    4 |   Barbara |   13 |   F  |   48      |   88      |   83      |
 |    5 |   Carol   |   14 |   F  |   56      |   99      |   94      |
 |    6 |   Henry   |   14 |   M  |   52      |   84      |   85      |
 | ...  |   ...     |  ... | ...  |   ..      |   ...     |   ...     |
 +------------------+------+------+-----------+-----------+-----------+
 [CLASSFIT}
WANT
Up to 40 obs from WORK.CLASSXPO total obs=95
Obs    NAME       _NAME_      COL1
  1    Alfred     NAME        Alfred
  2    Alfred     SEX         M
  3    Alfred     AGE         14
  4    Alfred     HEIGHT      69
  5    Alfred     WEIGHT      112.5
  6    Alice      NAME        Alice
  7    Alice      SEX         F
  8    Alice      AGE         13
  9    Alice      HEIGHT      56.5
 10    Alice      WEIGHT      84
...
Obs    NAME       _NAME_       COL1
  1    Alfred     NAME         Alfred
  2    Alfred     SEX          M
  3    Alfred     AGE          14
  4    Alfred     HEIGHT       69
  5    Alfred     WEIGHT       112.5
  6    Alfred     PREDICT      126
 11    Alice      NAME         Alice
 12    Alice      SEX          F
 13    Alice      AGE          13
 14    Alice      HEIGHT       56.5
 15    Alice      WEIGHT       84
 16    Alice      PREDICT      77
....
WORKING CODE
============
      proc transpose data=xel.&bok out=&bok.xpo(drop=_label_);
         by name;
         var _all_;
FULL SOLUTION
=============
* create some code;
data _null_;
  do wrkbok="classfit","class";
    call symputx('bok',wrkbok);
    rc=dosubl('
      %*utlfkil(d:/xls/&bok..xlsx);
      libname xel "d:/xls/&bok..xlsx";
      data xel.&bok;
         set sashelp.&bok;
      run;quit;
      libname xel clear;
    ');
  end;
run;quit;
* transpose;
data _null_;
  do wrkbok="classfit","class";
    call symputx('bok',wrkbok);
    rc=dosubl('
      libname xel "d:/xls/&bok..xlsx";
      proc transpose data=xel.&bok out=&bok.xpo(drop=_label_);
         by name;
         var _all_;
      run;quit;
      libname xel clear;
    ');
  end;
run;quit;
You could put this DATA step between the import and the transpose - in which you insert 2 sentinel variables (_PRE and _POST) in location 5 and location "last".
data need;
  set IMPORT.Mandatory_Trainings (keep=ID Employee_Name Overall__Training_Status General_Exclusion__Status);
  retain _PRE .;
  set IMPORT.Mandatory_Trainings;
  retain _POST .;
run;
The "trick" here is to understand how SAS compiler builds the program data vector (PDV) in a data step.
Then you can modify the proc transpose to always have "var _PRE--_POST" - no changing varname awareness required. And just insert a "(where=(not(name in ("_PRE","_POST"))))" dataset-name-parameter in the "out=" clause:
proc transpose Data=need
  OUT=Base.Mandatory_Trainings_Pivot (where=(not(_name_ in ('_PRE','_POST'))));
  BY ID Employee_Name Overall__Training_Status General_Exclusion__Status; 
  var _pre -- _post;
run;It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.
