BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
rlolo
Calcite | Level 5

Dear SAS community,

 

I am a new user of SAS and I guess this is a fairly easy question but I have been searching the web and I can't solve it.

 

I receive a file on a weekly basis containing trainings that employees have to take and I need to pivot some of its columns. The first 4 variables are always called the same (they are basic employees info), but the rest varies (the variables names change and I have a different number of columns each time, as each week different trainings have to be taken). Here is my code:

 

PROC IMPORT OUT= IMPORT.Mandatory_Trainings

DATAFILE= "Excel file path"

DBMS=EXCEL REPLACE;

Range="A1:AAA100000";

GETNAMES=Yes;

MIXED=NO;

SCANTEXT=YES;

USEDATE=YES;

SCANTIME=YES;

RUN;

 

proc transpose Data=IMPORT.Mandatory_Trainings OUT=Base.Mandatory_Trainings_Pivot;

BY ID Employee_Name Overall__Training_Status General_Exclusion__Status;

VAR XXXXX;

run;

 

The line "VAR XXXXX;" is where I need to change my code.

I put the name of the first and last training and separated them with "--" ("First_Training_Name -- Last_Training_Name_") and it worked perfectly fine but I want a more automated way where SAS select and pivot all variables that are between column 5 and the last one. Any advice on how to do this?

 

Thanks in advance for the help.

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

You  could put this DATA step between the import and the transpose - in which you insert 2 sentinel variables (_PRE and _POST) in location 5 and location "last".

 

data need;
  set IMPORT.Mandatory_Trainings (keep=ID Employee_Name Overall__Training_Status General_Exclusion__Status);
  retain _PRE .;
  set IMPORT.Mandatory_Trainings;
  retain _POST .;
run;

 

The "trick" here is to understand how SAS compiler builds the program data vector (PDV) in a data step.

  1. The first SET keeps only a limited set of 4 variables, which become the 4 left-most in the PDV.
  2. Then the retain _PRE statement appends it to the right, i.e. column 5.
  3. Then a second SET reads all the training vars.  Four of them are already on the left of _PRE, but the rest are appended to the right of _PRE.
  4. Then the retain _POST generates the right-most var.

Then you can modify the proc transpose to always have "var _PRE--_POST" - no changing varname awareness required.   And just insert a "(where=(not(name in ("_PRE","_POST"))))" dataset-name-parameter in the "out=" clause:

 

proc transpose Data=need
  OUT=Base.Mandatory_Trainings_Pivot (where=(not(_name_ in ('_PRE','_POST'))));
  BY ID Employee_Name Overall__Training_Status General_Exclusion__Status; 
  var _pre -- _post;
run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

3 REPLIES 3
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Well, your best options would be to drop Excel as a data transfer medium - this would fix all your problems, make a far more robust data transfer system, and make your life easier.  However you will say that you can't and so your process will always be sub-par.  You can get a list of varaible names via:

proc sql;
  select  NAME
  into     :VLIST separated by " "
  from    DICTIONARY.COLUMNS
  where  LIBNAME="<thelibname>"
     and  MEMNAME="<dataste name>";
quit;

...
  var &varlist.;
...

I can think of at least a few issues with the process - transposing, what happens when there are mixed number/character and if you let SAS auto convert it will it be right.  How will you validate such a process?  What happens when the spreadsheet changes - i.e. your four always there variables - you don't have a signed import agreement so they could send anything they like.  Etc.

rogerjdeangelis
Barite | Level 11
Select excel variables based on their column number to piviot

I assume variables to transform are after 'ID' variables.

HAVE (Two workbooks)

d:/xls/class.xlsx

 +-------------------------+------+-----------+------------
 |      |    A      |   B  |   B  |    C      |    D      |
 +------+-----------+------+------+-----------+-----------+
 |      |           |      |      |           |           |
 |    1 |   NAME    |   AGE|  SEX |  HEIGHT   |  WEIGHT   |
 |    2 |   Alfred  |   14 |   M  |   55      |   96      |
 |    3 |   Alice   |   13 |   F  |   44      |   87      |
 |    4 |   Barbara |   13 |   F  |   48      |   88      |
 |    5 |   Carol   |   14 |   F  |   56      |   99      |
 |    6 |   Henry   |   14 |   M  |   52      |   84      |
 | ...  |   ...     |  ... | ...  |   ..      |   ...     |
 +------------------+------+------+-----------+-----------+

 [CLASS}


d:/xls/class.xlsx

 +-------------------------+------+-----------+-----------+-----------+
 |      |    A      |   B  |   B  |    C      |    D      |   E       |
 +------+-----------+------+------+-----------+-----------+-----------+
 |      |           |      |      |           |           |           |
 |    1 |   NAME    |   AGE|  SEX |  HEIGHT   |  WEIGHT   |  PREDICT  |
 |    2 |   Alfred  |   14 |   M  |   55      |   96      |   91      |
 |    3 |   Alice   |   13 |   F  |   44      |   87      |   82      |
 |    4 |   Barbara |   13 |   F  |   48      |   88      |   83      |
 |    5 |   Carol   |   14 |   F  |   56      |   99      |   94      |
 |    6 |   Henry   |   14 |   M  |   52      |   84      |   85      |
 | ...  |   ...     |  ... | ...  |   ..      |   ...     |   ...     |
 +------------------+------+------+-----------+-----------+-----------+

 [CLASSFIT}

WANT

Up to 40 obs from WORK.CLASSXPO total obs=95

Obs    NAME       _NAME_      COL1

  1    Alfred     NAME        Alfred
  2    Alfred     SEX         M
  3    Alfred     AGE         14
  4    Alfred     HEIGHT      69
  5    Alfred     WEIGHT      112.5
  6    Alice      NAME        Alice
  7    Alice      SEX         F
  8    Alice      AGE         13
  9    Alice      HEIGHT      56.5
 10    Alice      WEIGHT      84
...

Obs    NAME       _NAME_       COL1

  1    Alfred     NAME         Alfred
  2    Alfred     SEX          M
  3    Alfred     AGE          14
  4    Alfred     HEIGHT       69
  5    Alfred     WEIGHT       112.5
  6    Alfred     PREDICT      126
 11    Alice      NAME         Alice
 12    Alice      SEX          F
 13    Alice      AGE          13
 14    Alice      HEIGHT       56.5
 15    Alice      WEIGHT       84
 16    Alice      PREDICT      77
....

WORKING CODE
============

      proc transpose data=xel.&bok out=&bok.xpo(drop=_label_);
         by name;
         var _all_;

FULL SOLUTION
=============

* create some code;
data _null_;
  do wrkbok="classfit","class";
    call symputx('bok',wrkbok);
    rc=dosubl('
      %*utlfkil(d:/xls/&bok..xlsx);
      libname xel "d:/xls/&bok..xlsx";
      data xel.&bok;
         set sashelp.&bok;
      run;quit;
      libname xel clear;
    ');
  end;
run;quit;

* transpose;
data _null_;
  do wrkbok="classfit","class";
    call symputx('bok',wrkbok);
    rc=dosubl('
      libname xel "d:/xls/&bok..xlsx";
      proc transpose data=xel.&bok out=&bok.xpo(drop=_label_);
         by name;
         var _all_;
      run;quit;
      libname xel clear;
    ');
  end;
run;quit;

mkeintz
PROC Star

You  could put this DATA step between the import and the transpose - in which you insert 2 sentinel variables (_PRE and _POST) in location 5 and location "last".

 

data need;
  set IMPORT.Mandatory_Trainings (keep=ID Employee_Name Overall__Training_Status General_Exclusion__Status);
  retain _PRE .;
  set IMPORT.Mandatory_Trainings;
  retain _POST .;
run;

 

The "trick" here is to understand how SAS compiler builds the program data vector (PDV) in a data step.

  1. The first SET keeps only a limited set of 4 variables, which become the 4 left-most in the PDV.
  2. Then the retain _PRE statement appends it to the right, i.e. column 5.
  3. Then a second SET reads all the training vars.  Four of them are already on the left of _PRE, but the rest are appended to the right of _PRE.
  4. Then the retain _POST generates the right-most var.

Then you can modify the proc transpose to always have "var _PRE--_POST" - no changing varname awareness required.   And just insert a "(where=(not(name in ("_PRE","_POST"))))" dataset-name-parameter in the "out=" clause:

 

proc transpose Data=need
  OUT=Base.Mandatory_Trainings_Pivot (where=(not(_name_ in ('_PRE','_POST'))));
  BY ID Employee_Name Overall__Training_Status General_Exclusion__Status; 
  var _pre -- _post;
run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 10273 views
  • 2 likes
  • 4 in conversation