DATA Step, Macro, Functions and more

Select variables based on their column number

Accepted Solution Solved
Reply
Senior User
Posts: 1
Accepted Solution

Select variables based on their column number

Dear SAS community,

 

I am a new user of SAS and I guess this is a fairly easy question but I have been searching the web and I can't solve it.

 

I receive a file on a weekly basis containing trainings that employees have to take and I need to pivot some of its columns. The first 4 variables are always called the same (they are basic employees info), but the rest varies (the variables names change and I have a different number of columns each time, as each week different trainings have to be taken). Here is my code:

 

PROC IMPORT OUT= IMPORT.Mandatory_Trainings

DATAFILE= "Excel file path"

DBMS=EXCEL REPLACE;

Range="A1:AAA100000";

GETNAMES=Yes;

MIXED=NO;

SCANTEXT=YES;

USEDATE=YES;

SCANTIME=YES;

RUN;

 

proc transpose Data=IMPORT.Mandatory_Trainings OUT=Base.Mandatory_Trainings_Pivot;

BY ID Employee_Name Overall__Training_Status General_Exclusion__Status;

VAR XXXXX;

run;

 

The line "VAR XXXXX;" is where I need to change my code.

I put the name of the first and last training and separated them with "--" ("First_Training_Name -- Last_Training_Name_") and it worked perfectly fine but I want a more automated way where SAS select and pivot all variables that are between column 5 and the last one. Any advice on how to do this?

 

Thanks in advance for the help.


Accepted Solutions
Solution
‎02-20-2017 04:42 AM
Valued Guide
Posts: 797

Re: Select variables based on their column number

You  could put this DATA step between the import and the transpose - in which you insert 2 sentinel variables (_PRE and _POST) in location 5 and location "last".

 

data need;
  set IMPORT.Mandatory_Trainings (keep=ID Employee_Name Overall__Training_Status General_Exclusion__Status);
  retain _PRE .;
  set IMPORT.Mandatory_Trainings;
  retain _POST .;
run;

 

The "trick" here is to understand how SAS compiler builds the program data vector (PDV) in a data step.

  1. The first SET keeps only a limited set of 4 variables, which become the 4 left-most in the PDV.
  2. Then the retain _PRE statement appends it to the right, i.e. column 5.
  3. Then a second SET reads all the training vars.  Four of them are already on the left of _PRE, but the rest are appended to the right of _PRE.
  4. Then the retain _POST generates the right-most var.

Then you can modify the proc transpose to always have "var _PRE--_POST" - no changing varname awareness required.   And just insert a "(where=(not(name in ("_PRE","_POST"))))" dataset-name-parameter in the "out=" clause:

 

proc transpose Data=need
  OUT=Base.Mandatory_Trainings_Pivot (where=(not(_name_ in ('_PRE','_POST'))));
  BY ID Employee_Name Overall__Training_Status General_Exclusion__Status; 
  var _pre -- _post;
run;

View solution in original post


All Replies
Super User
Super User
Posts: 7,392

Re: Select variables based on their column number

Well, your best options would be to drop Excel as a data transfer medium - this would fix all your problems, make a far more robust data transfer system, and make your life easier.  However you will say that you can't and so your process will always be sub-par.  You can get a list of varaible names via:

proc sql;
  select  NAME
  into     :VLIST separated by " "
  from    DICTIONARY.COLUMNS
  where  LIBNAME="<thelibname>"
     and  MEMNAME="<dataste name>";
quit;

...
  var &varlist.;
...

I can think of at least a few issues with the process - transposing, what happens when there are mixed number/character and if you let SAS auto convert it will it be right.  How will you validate such a process?  What happens when the spreadsheet changes - i.e. your four always there variables - you don't have a signed import agreement so they could send anything they like.  Etc.

Valued Guide
Posts: 505

Re: Select variables based on their column number

Select excel variables based on their column number to piviot

I assume variables to transform are after 'ID' variables.

HAVE (Two workbooks)

d:/xls/class.xlsx

 +-------------------------+------+-----------+------------
 |      |    A      |   B  |   B  |    C      |    D      |
 +------+-----------+------+------+-----------+-----------+
 |      |           |      |      |           |           |
 |    1 |   NAME    |   AGE|  SEX |  HEIGHT   |  WEIGHT   |
 |    2 |   Alfred  |   14 |   M  |   55      |   96      |
 |    3 |   Alice   |   13 |   F  |   44      |   87      |
 |    4 |   Barbara |   13 |   F  |   48      |   88      |
 |    5 |   Carol   |   14 |   F  |   56      |   99      |
 |    6 |   Henry   |   14 |   M  |   52      |   84      |
 | ...  |   ...     |  ... | ...  |   ..      |   ...     |
 +------------------+------+------+-----------+-----------+

 [CLASS}


d:/xls/class.xlsx

 +-------------------------+------+-----------+-----------+-----------+
 |      |    A      |   B  |   B  |    C      |    D      |   E       |
 +------+-----------+------+------+-----------+-----------+-----------+
 |      |           |      |      |           |           |           |
 |    1 |   NAME    |   AGE|  SEX |  HEIGHT   |  WEIGHT   |  PREDICT  |
 |    2 |   Alfred  |   14 |   M  |   55      |   96      |   91      |
 |    3 |   Alice   |   13 |   F  |   44      |   87      |   82      |
 |    4 |   Barbara |   13 |   F  |   48      |   88      |   83      |
 |    5 |   Carol   |   14 |   F  |   56      |   99      |   94      |
 |    6 |   Henry   |   14 |   M  |   52      |   84      |   85      |
 | ...  |   ...     |  ... | ...  |   ..      |   ...     |   ...     |
 +------------------+------+------+-----------+-----------+-----------+

 [CLASSFIT}

WANT

Up to 40 obs from WORK.CLASSXPO total obs=95

Obs    NAME       _NAME_      COL1

  1    Alfred     NAME        Alfred
  2    Alfred     SEX         M
  3    Alfred     AGE         14
  4    Alfred     HEIGHT      69
  5    Alfred     WEIGHT      112.5
  6    Alice      NAME        Alice
  7    Alice      SEX         F
  8    Alice      AGE         13
  9    Alice      HEIGHT      56.5
 10    Alice      WEIGHT      84
...

Obs    NAME       _NAME_       COL1

  1    Alfred     NAME         Alfred
  2    Alfred     SEX          M
  3    Alfred     AGE          14
  4    Alfred     HEIGHT       69
  5    Alfred     WEIGHT       112.5
  6    Alfred     PREDICT      126
 11    Alice      NAME         Alice
 12    Alice      SEX          F
 13    Alice      AGE          13
 14    Alice      HEIGHT       56.5
 15    Alice      WEIGHT       84
 16    Alice      PREDICT      77
....

WORKING CODE
============

      proc transpose data=xel.&bok out=&bok.xpo(drop=_label_);
         by name;
         var _all_;

FULL SOLUTION
=============

* create some code;
data _null_;
  do wrkbok="classfit","class";
    call symputx('bok',wrkbok);
    rc=dosubl('
      %*utlfkil(d:/xls/&bok..xlsx);
      libname xel "d:/xls/&bok..xlsx";
      data xel.&bok;
         set sashelp.&bok;
      run;quit;
      libname xel clear;
    ');
  end;
run;quit;

* transpose;
data _null_;
  do wrkbok="classfit","class";
    call symputx('bok',wrkbok);
    rc=dosubl('
      libname xel "d:/xls/&bok..xlsx";
      proc transpose data=xel.&bok out=&bok.xpo(drop=_label_);
         by name;
         var _all_;
      run;quit;
      libname xel clear;
    ');
  end;
run;quit;

Solution
‎02-20-2017 04:42 AM
Valued Guide
Posts: 797

Re: Select variables based on their column number

You  could put this DATA step between the import and the transpose - in which you insert 2 sentinel variables (_PRE and _POST) in location 5 and location "last".

 

data need;
  set IMPORT.Mandatory_Trainings (keep=ID Employee_Name Overall__Training_Status General_Exclusion__Status);
  retain _PRE .;
  set IMPORT.Mandatory_Trainings;
  retain _POST .;
run;

 

The "trick" here is to understand how SAS compiler builds the program data vector (PDV) in a data step.

  1. The first SET keeps only a limited set of 4 variables, which become the 4 left-most in the PDV.
  2. Then the retain _PRE statement appends it to the right, i.e. column 5.
  3. Then a second SET reads all the training vars.  Four of them are already on the left of _PRE, but the rest are appended to the right of _PRE.
  4. Then the retain _POST generates the right-most var.

Then you can modify the proc transpose to always have "var _PRE--_POST" - no changing varname awareness required.   And just insert a "(where=(not(name in ("_PRE","_POST"))))" dataset-name-parameter in the "out=" clause:

 

proc transpose Data=need
  OUT=Base.Mandatory_Trainings_Pivot (where=(not(_name_ in ('_PRE','_POST'))));
  BY ID Employee_Name Overall__Training_Status General_Exclusion__Status; 
  var _pre -- _post;
run;
☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 3 replies
  • 177 views
  • 2 likes
  • 4 in conversation