Hi. I have a dataset step1 shown as below. There are seven different time periods for unit1, unit2 and unit3, respectively.
I used Proc Transpose to show dates in columns and units in row. However, in the final table (step2), the dates were not arranged in order - 3/31/16 was showing in the last column. I personally suspect it might have something to do with the indicator.
Does anyone know why and how to fix it?
Thanks for any advices.
DATA STEP1;
INFILE CARDS MISSOVER;
INPUT DATE$ UNIT$ AMT IND$;
CARDS;
3/31/15 UNIT1 150 Y
3/31/16 UNIT1 200
4/30/16 UNIT1 300 Y
9/30/16 UNIT1 350 Y
12/31/16 UNIT1 400 Y
3/31/17 UNIT1 450 Y
4/30/17 UNIT1 500 Y
3/31/15 UNIT2 250
3/31/16 UNIT2 200 Y
4/30/16 UNIT2 400 Y
9/30/16 UNIT2 150 Y
12/31/16 UNIT2 200
3/31/17 UNIT2 450 Y
4/30/17 UNIT2 500 Y
3/31/15 UNIT3 550
3/31/16 UNIT3 200
4/30/16 UNIT3 300
9/30/16 UNIT3 350 Y
12/31/16 UNIT3 400 Y
3/31/17 UNIT3 450 Y
4/30/17 UNIT3 500 Y
;
RUN;
PROC SORT DATA=STEP1;
BY UNIT;
RUN;
PROC TRANSPOSE DATA=STEP1 OUT=STEP2;
WHERE IND = 'Y';
BY UNIT;
ID DATE;
VAR AMT;
RUN;
PROC PRINT DATA=STEP2;
RUN;
Obs | UNIT | _NAME_ | 3/31/2015 | 4/30/2016 | 9/30/2016 | 12/31/2016 | 3/31/2017 | 4/30/2017 | 3/31/2016 |
1 | UNIT1 | AMT | 150 | 300 | 350 | 400 | 450 | 500 | . |
2 | UNIT2 | AMT | . | 400 | 150 | . | 450 | 500 | 200 |
3 | UNIT3 | AMT | . | . | 350 | 400 | 450 | 500 | . |
Create an ID with observations with all possible dates, for example.
DATA STEP1;
INFILE CARDS MISSOVER;
INPUT DATE:mmddyy. UNIT$ AMT IND$;
format date mmddyy10.;
CARDS;
3/31/15 UNIT1 150 Y
3/31/16 UNIT1 200
4/30/16 UNIT1 300 Y
9/30/16 UNIT1 350 Y
12/31/16 UNIT1 400 Y
3/31/17 UNIT1 450 Y
4/30/17 UNIT1 500 Y
3/31/15 UNIT2 250
3/31/16 UNIT2 200 Y
4/30/16 UNIT2 400 Y
9/30/16 UNIT2 150 Y
12/31/16 UNIT2 200
3/31/17 UNIT2 450 Y
4/30/17 UNIT2 500 Y
3/31/15 UNIT3 550
3/31/16 UNIT3 200
4/30/16 UNIT3 300
9/30/16 UNIT3 350 Y
12/31/16 UNIT3 400 Y
3/31/17 UNIT3 450 Y
4/30/17 UNIT3 500 Y
;
RUN;
proc sort data=step1;
by unit;
run;
proc print;
run;
proc sort data=step1(keep=date) out=dates nodupkey;
by date;
run;
data step2;
set dates(in=in1) step1;
if in1 then ind = 'Y';
run;
proc print;
run;
proc transpose data=step2 out=step3(where=(not missing(unit))) prefix=D;
by unit;
where ind eq 'Y';
var amt;
id date;
run;
proc print;
run;
I would think that you're easiest solution, at this point, would be to add a data step like;
data step2 (drop=_name_); retain unit _3_31_15 _3_31_16 _4_30_16 _9_30_16 _12_31_16 _3_31_17 _4_30_17; set step2; run;
Art, CEO, AnalystFinder.com
DATES as CHARACTER values will only sort correctly when you use yyyymmdd values.
Consider:
DATA STEP1; INFILE CARDS MISSOVER; informat date mmddyy8.; format date mmddyy8.; INPUT DATE UNIT$ AMT IND$; CARDS; 3/31/15 UNIT1 150 Y 3/31/16 UNIT1 200 4/30/16 UNIT1 300 Y 9/30/16 UNIT1 350 Y 12/31/16 UNIT1 400 Y 3/31/17 UNIT1 450 Y 4/30/17 UNIT1 500 Y 3/31/15 UNIT2 250 3/31/16 UNIT2 200 Y 4/30/16 UNIT2 400 Y 9/30/16 UNIT2 150 Y 12/31/16 UNIT2 200 3/31/17 UNIT2 450 Y 4/30/17 UNIT2 500 Y 3/31/15 UNIT3 550 3/31/16 UNIT3 200 4/30/16 UNIT3 300 9/30/16 UNIT3 350 Y 12/31/16 UNIT3 400 Y 3/31/17 UNIT3 450 Y 4/30/17 UNIT3 500 Y ; RUN; proc tabulate data=step1; where ind='Y'; class unit date; format date mmddyy8.; var amt; tables unit='', date=''*amt=''*max=''*f=best6. /misstext= ' '; run;
Create an ID with observations with all possible dates, for example.
DATA STEP1;
INFILE CARDS MISSOVER;
INPUT DATE:mmddyy. UNIT$ AMT IND$;
format date mmddyy10.;
CARDS;
3/31/15 UNIT1 150 Y
3/31/16 UNIT1 200
4/30/16 UNIT1 300 Y
9/30/16 UNIT1 350 Y
12/31/16 UNIT1 400 Y
3/31/17 UNIT1 450 Y
4/30/17 UNIT1 500 Y
3/31/15 UNIT2 250
3/31/16 UNIT2 200 Y
4/30/16 UNIT2 400 Y
9/30/16 UNIT2 150 Y
12/31/16 UNIT2 200
3/31/17 UNIT2 450 Y
4/30/17 UNIT2 500 Y
3/31/15 UNIT3 550
3/31/16 UNIT3 200
4/30/16 UNIT3 300
9/30/16 UNIT3 350 Y
12/31/16 UNIT3 400 Y
3/31/17 UNIT3 450 Y
4/30/17 UNIT3 500 Y
;
RUN;
proc sort data=step1;
by unit;
run;
proc print;
run;
proc sort data=step1(keep=date) out=dates nodupkey;
by date;
run;
data step2;
set dates(in=in1) step1;
if in1 then ind = 'Y';
run;
proc print;
run;
proc transpose data=step2 out=step3(where=(not missing(unit))) prefix=D;
by unit;
where ind eq 'Y';
var amt;
id date;
run;
proc print;
run;
Yes! If you run it after ALL of your other steps, and they run without error, by definition it will re-order your variables. If it doesn't, you would have to post your log in order for any of us to identify why it didn't work.
Art, CEO, AnalystFinder.com
Hi Art297. Thanks for your method. I run the data step after all the procedures, but seems I got warning in the lob. see below - it says the date variables are uninitialized. Perhaps I should have added a prefix in the Proc Transpose?
1 ;*';*";*/;quit;run;
2 OPTIONS PAGENO=MIN;
3 %LET _CLIENTTASKLABEL='Program';
4 %LET _CLIENTPROJECTPATH='';
5 %LET _CLIENTPROJECTNAME='';
6 %LET _SASPROGRAMFILE=;
7
8 ODS _ALL_ CLOSE;
9 OPTIONS DEV=ACTIVEX;
10 GOPTIONS XPIXELS=0 YPIXELS=0;
11 FILENAME EGSR TEMP;
12 ODS tagsets.sasreport13(ID=EGSR) FILE=EGSR
13 STYLE=HtmlBlue
14 STYLESHEET=(URL="file:///C:/Program%20Files%20(x86)/SAS94/x86/SASEnterpriseGuide/7.1/Styles/HtmlBlue.css")
15 NOGTITLE
16 NOGFOOTNOTE
17 GPATH=&sasworklocation
18 ENCODING=UTF8
19 options(rolap="on")
20 ;
NOTE: Writing TAGSETS.SASREPORT13(EGSR) Body file: EGSR
21
22 GOPTIONS ACCESSIBLE;
23
24 data step2 (drop=_:);
25 retain unit _3_31_15 _3_31_16 _4_30_16 _9_30_16 _12_31_16
26 _3_31_17 _4_30_17;
27 set STEP2;
28 run;
NOTE: Variable _3_31_15 is uninitialized.
NOTE: Variable _3_31_16 is uninitialized.
NOTE: Variable _4_30_16 is uninitialized.
NOTE: Variable _9_30_16 is uninitialized.
NOTE: Variable _12_31_16 is uninitialized.
NOTE: Variable _3_31_17 is uninitialized.
NOTE: Variable _4_30_17 is uninitialized.
WARNING: The variable '_:'n in the DROP, KEEP, or RENAME list has never been referenced.
NOTE: There were 3 observations read from the data set WORK.STEP2.
NOTE: The data set WORK.STEP2 has 3 observations and 8 variables.
NOTE: Compressing data set WORK.STEP2 increased size by 100.00 percent.
Compressed is 2 pages; un-compressed would require 1 pages.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
user cpu time 0.01 seconds
system cpu time 0.00 seconds
memory 934.37k
OS Memory 20640.00k
Timestamp 06/02/2017 02:42:03 PM
Step Count 9 Switch Count 60
Page Faults 0
Page Reclaims 164
Page Swaps 0
Voluntary Context Switches 188
Involuntary Context Switches 0
Block Input Operations 0
Block Output Operations 272
2 The SAS System 14:38 Friday, June 2, 2017
29
30
31 GOPTIONS NOACCESSIBLE;
32 %LET _CLIENTTASKLABEL=;
33 %LET _CLIENTPROJECTPATH=;
34 %LET _CLIENTPROJECTNAME=;
35 %LET _SASPROGRAMFILE=;
36
37 ;*';*";*/;quit;run;
38 ODS _ALL_ CLOSE;
39
40
41 QUIT; RUN;
42
There was an error in the code I posted that I've since corrected. Try the following:
DATA STEP1; INFILE CARDS MISSOVER; INPUT DATE$ UNIT$ AMT IND$; CARDS; 3/31/15 UNIT1 150 Y 3/31/16 UNIT1 200 4/30/16 UNIT1 300 Y 9/30/16 UNIT1 350 Y 12/31/16 UNIT1 400 Y 3/31/17 UNIT1 450 Y 4/30/17 UNIT1 500 Y 3/31/15 UNIT2 250 3/31/16 UNIT2 200 Y 4/30/16 UNIT2 400 Y 9/30/16 UNIT2 150 Y 12/31/16 UNIT2 200 3/31/17 UNIT2 450 Y 4/30/17 UNIT2 500 Y 3/31/15 UNIT3 550 3/31/16 UNIT3 200 4/30/16 UNIT3 300 9/30/16 UNIT3 350 Y 12/31/16 UNIT3 400 Y 3/31/17 UNIT3 450 Y 4/30/17 UNIT3 500 Y ; RUN; PROC SORT DATA=STEP1; BY UNIT; RUN; PROC TRANSPOSE DATA=STEP1 OUT=STEP2; WHERE IND = 'Y'; BY UNIT; ID DATE; VAR AMT; RUN; data step2 (drop=_name_); retain unit _3_31_15 _3_31_16 _4_30_16 _9_30_16 _12_31_16 _3_31_17 _4_30_17; set step2; run;
Art, CEO, AnalystFinder.com
Thanks!
@LL5 wrote:
Thanks data_null_ for this method. I'm trying to understand why Proc transpose has this weird result. Do you have any advices?
What weird result are you referring to? PROC TRANSPOSE creates new variables from ID values in the order they are encountered. What else did you expect? My advice is create a dummy ID to establish the order you seek. That is what I posted above.
When your variable is character then the character sort order is used. Which compares the first character in one value to the first in the second. So 12/16/2017 comes before 3/10/2015 because 1 comes before 3. If your data only represents one year then if you force character date-appearing values to have leading zeroes then 03/ compared to 12/ will sort correctly but with multiple years then you have all the january values together (which may be desireable).
Generally if a value is supposed to be a date, a time or datetime it is worth creating the correct SAS date/ time/ datetime value. Not only do the dates then sort correctly but then they are more useable in graphs (character values don't work for axis very well such as scatter or series) and you can do summaries by time intervals just by changing the format. Consider:
DATA STEP1; INFILE CARDS MISSOVER; informat date mmddyy8.; format date mmddyy8.; INPUT DATE UNIT$ AMT IND$; CARDS; 3/31/15 UNIT1 150 Y 3/31/16 UNIT1 200 4/30/16 UNIT1 300 Y 9/30/16 UNIT1 350 Y 12/31/16 UNIT1 400 Y 3/31/17 UNIT1 450 Y 4/30/17 UNIT1 500 Y 3/31/15 UNIT2 250 3/31/16 UNIT2 200 Y 4/30/16 UNIT2 400 Y 9/30/16 UNIT2 150 Y 12/31/16 UNIT2 200 3/31/17 UNIT2 450 Y 4/30/17 UNIT2 500 Y 3/31/15 UNIT3 550 3/31/16 UNIT3 200 4/30/16 UNIT3 300 9/30/16 UNIT3 350 Y 12/31/16 UNIT3 400 Y 3/31/17 UNIT3 450 Y 4/30/17 UNIT3 500 Y ; RUN; proc tabulate data=step1; title 'Annual Sum'; class date unit; format date year4.; var amt; table unit, date*amt*sum; run;title; proc tabulate data=step1; title 'Monthly Sum'; class date unit; format date yymon7.; var amt; table unit, date*amt*sum; run;title; proc tabulate data=step1; title 'Cross year Monthly Sum'; class date unit; format date monname10.; var amt; table unit, date*amt*sum; run;title;
With character values you would have to go through a lot of extra work to pull the character apart and create the correct period of interest and add variables to do it.
Coupled with the ability to create custom date formats with Proc Format and the functions to determine intervals between dates or to increment dates (or times or datetimes) the advantages really go to the proper values.
Ballardw,
Thanks a lot for your explanations and showing me the examples of using date format in Proc Tabulate to get different result! This apporach is eye opening and these techiques are extremely helpful to me!
And sorry for my stubbornness as I need Proc Transpose instead of Proc Tabulate to achieve my end result, so I applied the date format in my original data step (step 1) and run the Proc Sort and Proc Transpose again, I am still seeing the same result – 03/31/2016 came at last. I wonder why…
Anyway I could use the Proc Tabulate result or move around the columns from the Proc Transpose result in an excel spreadsheet, but If you can give me additional advices, that’d be great.
DATA STEP1;
INFILE CARDS MISSOVER;
INFORMAT DATE mmddyy10.;
FORMAT DATE mmddyy10.;
INPUT DATE UNIT$ AMT IND$;
CARDS;
3/31/15 UNIT1 150 Y
3/31/16 UNIT1 200
4/30/16 UNIT1 300 Y
9/30/16 UNIT1 350 Y
12/31/16 UNIT1 400 Y
3/31/17 UNIT1 450 Y
4/30/17 UNIT1 500 Y
3/31/15 UNIT2 250
3/31/16 UNIT2 200 Y
4/30/16 UNIT2 400 Y
9/30/16 UNIT2 150 Y
12/31/16 UNIT2 200
3/31/17 UNIT2 450 Y
4/30/17 UNIT2 500 Y
3/31/15 UNIT3 550
3/31/16 UNIT3 200
4/30/16 UNIT3 300
9/30/16 UNIT3 350 Y
12/31/16 UNIT3 400 Y
3/31/17 UNIT3 450 Y
4/30/17 UNIT3 500 Y
;
RUN;
PROC SORT DATA=STEP1;
BY UNIT DATE;
RUN;
PROC TRANSPOSE DATA=STEP1 OUT=STEP2;
WHERE IND = 'Y';
BY UNIT;
ID DATE;
VAR AMT;
RUN;
PROC PRINT DATA=STEP2;
RUN;
Obs | UNIT | _NAME_ | 03/31/2015 | 04/30/2016 | 09/30/2016 | 12/31/2016 | 03/31/2017 | 04/30/2017 | 03/31/2016 |
1 | UNIT1 | AMT | 150 | 300 | 350 | 400 | 450 | 500 | . |
2 | UNIT2 | AMT | . | 400 | 150 | . | 450 | 500 | 200 |
3 | UNIT3 | AMT | . | . | 350 | 400 | 450 | 500 | . |
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.