- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Dear SAS community,
I am a new user of SAS and I guess this is a fairly easy question but I have been searching the web and I can't solve it.
I receive a file on a weekly basis containing trainings that employees have to take and I need to pivot some of its columns. The first 4 variables are always called the same (they are basic employees info), but the rest varies (the variables names change and I have a different number of columns each time, as each week different trainings have to be taken). Here is my code:
PROC IMPORT OUT= IMPORT.Mandatory_Trainings
DATAFILE= "Excel file path"
DBMS=EXCEL REPLACE;
Range="A1:AAA100000";
GETNAMES=Yes;
MIXED=NO;
SCANTEXT=YES;
USEDATE=YES;
SCANTIME=YES;
RUN;
proc transpose Data=IMPORT.Mandatory_Trainings OUT=Base.Mandatory_Trainings_Pivot;
BY ID Employee_Name Overall__Training_Status General_Exclusion__Status;
VAR XXXXX;
run;
The line "VAR XXXXX;" is where I need to change my code.
I put the name of the first and last training and separated them with "--" ("First_Training_Name -- Last_Training_Name_") and it worked perfectly fine but I want a more automated way where SAS select and pivot all variables that are between column 5 and the last one. Any advice on how to do this?
Thanks in advance for the help.
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
You could put this DATA step between the import and the transpose - in which you insert 2 sentinel variables (_PRE and _POST) in location 5 and location "last".
data need;
set IMPORT.Mandatory_Trainings (keep=ID Employee_Name Overall__Training_Status General_Exclusion__Status);
retain _PRE .;
set IMPORT.Mandatory_Trainings;
retain _POST .;
run;
The "trick" here is to understand how SAS compiler builds the program data vector (PDV) in a data step.
- The first SET keeps only a limited set of 4 variables, which become the 4 left-most in the PDV.
- Then the retain _PRE statement appends it to the right, i.e. column 5.
- Then a second SET reads all the training vars. Four of them are already on the left of _PRE, but the rest are appended to the right of _PRE.
- Then the retain _POST generates the right-most var.
Then you can modify the proc transpose to always have "var _PRE--_POST" - no changing varname awareness required. And just insert a "(where=(not(name in ("_PRE","_POST"))))" dataset-name-parameter in the "out=" clause:
proc transpose Data=need
OUT=Base.Mandatory_Trainings_Pivot (where=(not(_name_ in ('_PRE','_POST'))));
BY ID Employee_Name Overall__Training_Status General_Exclusion__Status;
var _pre -- _post;
run;
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set
Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets
--------------------------
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Well, your best options would be to drop Excel as a data transfer medium - this would fix all your problems, make a far more robust data transfer system, and make your life easier. However you will say that you can't and so your process will always be sub-par. You can get a list of varaible names via:
proc sql; select NAME into :VLIST separated by " " from DICTIONARY.COLUMNS where LIBNAME="<thelibname>" and MEMNAME="<dataste name>"; quit; ... var &varlist.; ...
I can think of at least a few issues with the process - transposing, what happens when there are mixed number/character and if you let SAS auto convert it will it be right. How will you validate such a process? What happens when the spreadsheet changes - i.e. your four always there variables - you don't have a signed import agreement so they could send anything they like. Etc.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Select excel variables based on their column number to piviot
I assume variables to transform are after 'ID' variables.
HAVE (Two workbooks)
d:/xls/class.xlsx
+-------------------------+------+-----------+------------
| | A | B | B | C | D |
+------+-----------+------+------+-----------+-----------+
| | | | | | |
| 1 | NAME | AGE| SEX | HEIGHT | WEIGHT |
| 2 | Alfred | 14 | M | 55 | 96 |
| 3 | Alice | 13 | F | 44 | 87 |
| 4 | Barbara | 13 | F | 48 | 88 |
| 5 | Carol | 14 | F | 56 | 99 |
| 6 | Henry | 14 | M | 52 | 84 |
| ... | ... | ... | ... | .. | ... |
+------------------+------+------+-----------+-----------+
[CLASS}
d:/xls/class.xlsx
+-------------------------+------+-----------+-----------+-----------+
| | A | B | B | C | D | E |
+------+-----------+------+------+-----------+-----------+-----------+
| | | | | | | |
| 1 | NAME | AGE| SEX | HEIGHT | WEIGHT | PREDICT |
| 2 | Alfred | 14 | M | 55 | 96 | 91 |
| 3 | Alice | 13 | F | 44 | 87 | 82 |
| 4 | Barbara | 13 | F | 48 | 88 | 83 |
| 5 | Carol | 14 | F | 56 | 99 | 94 |
| 6 | Henry | 14 | M | 52 | 84 | 85 |
| ... | ... | ... | ... | .. | ... | ... |
+------------------+------+------+-----------+-----------+-----------+
[CLASSFIT}
WANT
Up to 40 obs from WORK.CLASSXPO total obs=95
Obs NAME _NAME_ COL1
1 Alfred NAME Alfred
2 Alfred SEX M
3 Alfred AGE 14
4 Alfred HEIGHT 69
5 Alfred WEIGHT 112.5
6 Alice NAME Alice
7 Alice SEX F
8 Alice AGE 13
9 Alice HEIGHT 56.5
10 Alice WEIGHT 84
...
Obs NAME _NAME_ COL1
1 Alfred NAME Alfred
2 Alfred SEX M
3 Alfred AGE 14
4 Alfred HEIGHT 69
5 Alfred WEIGHT 112.5
6 Alfred PREDICT 126
11 Alice NAME Alice
12 Alice SEX F
13 Alice AGE 13
14 Alice HEIGHT 56.5
15 Alice WEIGHT 84
16 Alice PREDICT 77
....
WORKING CODE
============
proc transpose data=xel.&bok out=&bok.xpo(drop=_label_);
by name;
var _all_;
FULL SOLUTION
=============
* create some code;
data _null_;
do wrkbok="classfit","class";
call symputx('bok',wrkbok);
rc=dosubl('
%*utlfkil(d:/xls/&bok..xlsx);
libname xel "d:/xls/&bok..xlsx";
data xel.&bok;
set sashelp.&bok;
run;quit;
libname xel clear;
');
end;
run;quit;
* transpose;
data _null_;
do wrkbok="classfit","class";
call symputx('bok',wrkbok);
rc=dosubl('
libname xel "d:/xls/&bok..xlsx";
proc transpose data=xel.&bok out=&bok.xpo(drop=_label_);
by name;
var _all_;
run;quit;
libname xel clear;
');
end;
run;quit;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
You could put this DATA step between the import and the transpose - in which you insert 2 sentinel variables (_PRE and _POST) in location 5 and location "last".
data need;
set IMPORT.Mandatory_Trainings (keep=ID Employee_Name Overall__Training_Status General_Exclusion__Status);
retain _PRE .;
set IMPORT.Mandatory_Trainings;
retain _POST .;
run;
The "trick" here is to understand how SAS compiler builds the program data vector (PDV) in a data step.
- The first SET keeps only a limited set of 4 variables, which become the 4 left-most in the PDV.
- Then the retain _PRE statement appends it to the right, i.e. column 5.
- Then a second SET reads all the training vars. Four of them are already on the left of _PRE, but the rest are appended to the right of _PRE.
- Then the retain _POST generates the right-most var.
Then you can modify the proc transpose to always have "var _PRE--_POST" - no changing varname awareness required. And just insert a "(where=(not(name in ("_PRE","_POST"))))" dataset-name-parameter in the "out=" clause:
proc transpose Data=need
OUT=Base.Mandatory_Trainings_Pivot (where=(not(_name_ in ('_PRE','_POST'))));
BY ID Employee_Name Overall__Training_Status General_Exclusion__Status;
var _pre -- _post;
run;
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set
Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets
--------------------------