note: Im using SAS EG version 7.15
I have a dataset which looks like below(please note that in reality its much bigger!).
(char) (num) (char)
type quarter phase
K-11 202001 1
K-11 202101 2
K-11 202003
K-12 202101 3
K-12 202102
K-12 202002 1
K-13 202002 2
K-13 202103 3
K-13 202104 2
I want to achieve the following:
1. I want to place the values of quarters as horizontal columns and fill their values with the values of phase. So my desired output would be this:
type 202001 202002 202003 202004 202101 202102 202103 202104 K-11 1 2 K-12 1 3 K-13 2 3 2
Sorry, but your dataset is simply dumb.
Storing a number which you need for calculation as character is stupidity #1. Numbers are numbers are numbers, and if you want to make a calculation from them, you STORE THEM THAT WAY, period.
Second, a date-related values HAS TO BE STORED AS SUCH in SAS, otherwise you deprive yourself completely of using all the nice tools SAS provides for dealing with such values. A SAS date value is numeric, contains a count of days starting at 1960-01-01, and has a date format attached.
For the intended result:
A wide layout is needed for two reasons: running a regression (or similar) analysis that needs lots of categorical values in one observation, or as a report for human consumption. For anything you want to use in further data processing, you use a long layout.
The phase differences are calculated like that:
data have;
infile datalines truncover;
input type $ quarter :yyq6. phase;
format quarter yyqn6.;
datalines;
K-11 2020Q01 1
K-11 2021Q01 2
K-11 2020Q03
K-12 2021Q01 3
K-12 2021Q02 1
K-12 2020Q02 1
K-13 2020Q02 2
K-13 2021Q03 3
K-13 2021Q04 2
;
proc sort data=have;
by type quarter;
run;
data want;
set have;
where phase ne .;
by type;
retain diff_phase;
if first.type then diff_phase = phase;
if last.type
then do;
diff_phase = phase - diff_phase;
output;
end;
diff_phase = phase;
keep type diff_phase;
run;
To create the wide dataset, you need this:
proc sql;
select min(quarter), max(quarter) into :minq, :maxq from have;
quit;
data template;
quarter = &minq.;
do until (quarter gt &maxq.);
output;
quarter = intnx('quarter',quarter,1);
end;
run;
data long;
set
template
have
;
run;
proc transpose
data=long
out=wide (
drop=_name_
where=(type ne "")
)
prefix=q_
;
by type;
var phase;
id quarter;
run;
data want_wide;
merge
wide
want
;
by type;
run;
The template is needed to create the order of columns for the tranpose.
All this is done easier in PROC REPORT:
data pre_report / view=pre_report;
merge
have
want
;
by type;
run;
proc report data=pre_report;
column type phase,quarter diff_phase;
define type / group;
define phase / "" analysis;
define quarter / "" across order=formatted;
define diff_phase / group;
run;
PROC REPORT does this:
data have;
input type $ quarter :yymmn6. phase;
format quarter yymmn6.;
datalines;
K-11 202001 1
K-11 202101 2
K-11 202003
K-12 202101 3
K-12 202102
K-12 202002 1
K-13 202002 2
K-13 202103 3
K-13 202104 2
;
proc report data=have;
column type phase,quarter;
define type / group;
define phase / "" analysis;
define quarter / "" across order=formatted;
run;
I tried your snippet on my data set and I got this:
ERROR: phase is an ANALYSIS variable but not numeric. NOTE: The SAS System stopped processing this step because of errors. NOTE: PROCEDURE REPORT used (Total process time): real time 0.00 seconds cpu time 0.00 seconds
Then we need to add a little twist to the REPORT procedure:
data have;
infile datalines truncover;
input type $ quarter :yymmn6. phase $;
format quarter yymmn6.;
datalines;
K-11 202001 1
K-11 202101 2
K-11 202003
K-12 202101 3
K-12 202102
K-12 202002 1
K-13 202002 2
K-13 202103 3
K-13 202104 2
;
proc report data=have;
column type phase,quarter n;
define type / group;
define phase / "" display;
define quarter / "" across order=formatted;
define n / noprint;
run;
By posting your data as a data step yourself, you prevent us from making mistakes with regards to the recreation of your data.
A structure like you showed is typical for reports, for datasets it is very unfavorable, as it makes further coding harder.
Maxim 19: Long Beats Wide.
So what do you want to do with your data further on?
Look at my two DATALINES examples, they contain character, numeric and date values. Most of what you need to know to prepare simple datasets yourself.
This the transpose code:
proc transpose
data=have
out=want (drop=_name_)
prefix=q_
;
by type;
var phase;
id quarter;
run;
@Kurt_Bremser I ran your transpose snippet and received:
ERROR: The ID value "K-11" occurs twice in the same BY group. ERROR: Too many bad BY groups.
What I want further with my data: I want to calculate the difference between the last quarter and the second last quarter and place that in a new column. After that I wan that dataset to be a LASR table to create some graphs with it.
With regards to your two DATALINES. I will use that as a reference for future questions. Thank you.
@Andalusia wrote:
@Kurt_Bremser I ran your transpose snippet and received:
ERROR: The ID value "K-11" occurs twice in the same BY group. ERROR: Too many bad BY groups.
What I want further with my data: I want to calculate the difference between the last quarter and the second last quarter and place that in a new column. After that I wan that dataset to be a LASR table to create some graphs with it.
With regards to your two DATALINES. I will use that as a reference for future questions. Thank you.
Cannot happen with the data you posted:
data have;
infile datalines truncover;
input type $ quarter :yymmn6. phase $;
format quarter yymmn6.;
datalines;
K-11 202001 1
K-11 202101 2
K-11 202003
K-12 202101 3
K-12 202102
K-12 202002 1
K-13 202002 2
K-13 202103 3
K-13 202104 2
;
proc transpose
data=have
out=want (drop=_name_)
prefix=q_
;
by type;
var phase;
id quarter;
run;
Log:
73 data have; 74 infile datalines truncover; 75 input type $ quarter :yymmn6. phase $; 76 format quarter yymmn6.; 77 datalines; NOTE: The data set WORK.HAVE has 9 observations and 3 variables. NOTE: Verwendet wurde: DATA statement - (Gesamtverarbeitungszeit): real time 0.00 seconds cpu time 0.00 seconds 87 ; 88 89 proc transpose 90 data=have 91 out=want (drop=_name_) 92 prefix=q_ 93 ; 94 by type; 95 var phase; 96 id quarter; 97 run; NOTE: There were 9 observations read from the data set WORK.HAVE. NOTE: The data set WORK.WANT has 3 observations and 8 variables. NOTE: Verwendet wurde: PROZEDUR TRANSPOSE - (Gesamtverarbeitungszeit): real time 0.00 seconds cpu time 0.02 seconds
And you POSITIVELY do NOT need to transpose. You can easily compare values across observations by using a RETAINed variable, or by joining the dataset with itself. This is all much easier done with a vertical dataset.
What "difference" do you want to calculate? phase is character in your dataset, so it's not suited for calculations.
Please post a comprehensive example of your dataset (containing "edge cases") in usable form, and the expected outcome.
@Kurt_Bremser Oke I tried to give you a comprehensive example of my dataset (containing "edge cases").
data have; infile datalines truncover; input type $ quarter phase $; datalines; K-11 202001 1 K-11 202101 2 K-11 202003 K-12 202101 3 K-12 202102 1 K-12 202002 1 K-13 202002 2 K-13 202103 3 K-13 202104 2 ;
My quarter column is just a numeric, that needs to stay like that. Oké below is what I want as a dataset (not a report!):
type 202001 202002 202003 202004 202101 202102 202103 202104 diff_phase K-11 1 2 1 K-12 1 3 1 -2 K-13 2 3 2 -1
@Kurt_Bremser I want all the columns above so I guess a transpose is necessary?. In diff_phase I calculate the difference between the last quarter and the second last quarter for every type.(So 2-1 for K-11, 1-3 for K-12 and 2-3 for K-13)
Really glad if you could help me solve this problem 🙂
Sorry, but your dataset is simply dumb.
Storing a number which you need for calculation as character is stupidity #1. Numbers are numbers are numbers, and if you want to make a calculation from them, you STORE THEM THAT WAY, period.
Second, a date-related values HAS TO BE STORED AS SUCH in SAS, otherwise you deprive yourself completely of using all the nice tools SAS provides for dealing with such values. A SAS date value is numeric, contains a count of days starting at 1960-01-01, and has a date format attached.
For the intended result:
A wide layout is needed for two reasons: running a regression (or similar) analysis that needs lots of categorical values in one observation, or as a report for human consumption. For anything you want to use in further data processing, you use a long layout.
The phase differences are calculated like that:
data have;
infile datalines truncover;
input type $ quarter :yyq6. phase;
format quarter yyqn6.;
datalines;
K-11 2020Q01 1
K-11 2021Q01 2
K-11 2020Q03
K-12 2021Q01 3
K-12 2021Q02 1
K-12 2020Q02 1
K-13 2020Q02 2
K-13 2021Q03 3
K-13 2021Q04 2
;
proc sort data=have;
by type quarter;
run;
data want;
set have;
where phase ne .;
by type;
retain diff_phase;
if first.type then diff_phase = phase;
if last.type
then do;
diff_phase = phase - diff_phase;
output;
end;
diff_phase = phase;
keep type diff_phase;
run;
To create the wide dataset, you need this:
proc sql;
select min(quarter), max(quarter) into :minq, :maxq from have;
quit;
data template;
quarter = &minq.;
do until (quarter gt &maxq.);
output;
quarter = intnx('quarter',quarter,1);
end;
run;
data long;
set
template
have
;
run;
proc transpose
data=long
out=wide (
drop=_name_
where=(type ne "")
)
prefix=q_
;
by type;
var phase;
id quarter;
run;
data want_wide;
merge
wide
want
;
by type;
run;
The template is needed to create the order of columns for the tranpose.
All this is done easier in PROC REPORT:
data pre_report / view=pre_report;
merge
have
want
;
by type;
run;
proc report data=pre_report;
column type phase,quarter diff_phase;
define type / group;
define phase / "" analysis;
define quarter / "" across order=formatted;
define diff_phase / group;
run;
@Kurt_Bremser This is a quiet complex situation, I dont know what is going wrong in my code. Can you see it? Also I can see that you changed the datelines and added the Q in it. I cannot do that and I also don't want to change my source data.
This is my code:
proc sort data=work.filtered_main_dataset;
by type quarter;
run;
data want;
set work.filtered_main_dataset;
where c_phase ne .;
by type;
retain diff_phase;
if first.type then diff_phase = c_phase;
if last.type
then do;
diff_phase = c_phase - diff_phase;
output;
end;
diff_phase = c_phase;
keep type diff_phase;
run;
data long;
set
work.filtered_main_dataset
;
run;
proc transpose
data=long
out=wide (
drop=_name_
where=(type ne "")
)
prefix=q_
;
by type;
var c_phase;
id quarter;
run;
data want_wide;
merge
wide
want
;
by type;
run;
This is my log:
2 The SAS System 23:15 Thursday, April 15, 2021 44 45 data long; 46 set 47 work.filtered_main_dataset 48 ; 49 run; NOTE: There were 7238 observations read from the data set WORK.FILTERED_MAIN_DATASET. NOTE: The data set WORK.LONG has 7238 observations and 3 variables. NOTE: DATA statement used (Total process time): real time 0.00 seconds cpu time 0.00 seconds 50 51 proc transpose 52 data=long 53 out=wide ( 54 drop=_name_ 55 where=(type ne "") 56 ) 57 prefix=q_ 58 ; 59 by type; 60 var c_phase; 61 id quarter; 62 run; ERROR: The ID value "q_201904" occurs twice in the same BY group. NOTE: The above message was for the following BY group: type=TWK-B013 ERROR: The ID value "q_201902" occurs twice in the same BY group. ERROR: The ID value "q_201904" occurs twice in the same BY group. NOTE: The above message was for the following BY group: type=TWK-B014 ERROR: The ID value "q_201904" occurs twice in the same BY group. NOTE: The above message was for the following BY group: type=TWK-B017 ERROR: The ID value "q_201904" occurs twice in the same BY group. NOTE: The above message was for the following BY group: type=TWK-B020 ERROR: The ID value "q_201904" occurs twice in the same BY group. NOTE: The above message was for the following BY group: type=TWK-B022 ERROR: The ID value "q_202004" occurs twice in the same BY group. NOTE: The above message was for the following BY group: type=TWK-B025 ERROR: The ID value "q_202004" occurs twice in the same BY group. ERROR: The ID value "q_202004" occurs twice in the same BY group. NOTE: The above message was for the following BY group: type=TWK-B028 ERROR: The ID value "q_201904" occurs twice in the same BY group. ERROR: The ID value "q_202004" occurs twice in the same BY group. NOTE: The above message was for the following BY group: type=TWK-B030 ERROR: The ID value "q_201904" occurs twice in the same BY group. 3 The SAS System 23:15 Thursday, April 15, 2021 NOTE: The above message was for the following BY group: type=TWK-B035 ERROR: The ID value "q_202001" occurs twice in the same BY group. NOTE: The above message was for the following BY group: type=TWK-B037 ERROR: The ID value "q_201903" occurs twice in the same BY group. ERROR: Too many bad BY groups. NOTE: The SAS System stopped processing this step because of errors. NOTE: There were 269 observations read from the data set WORK.LONG. WARNING: The data set WORK.WIDE may be incomplete. When this step was stopped there were 0 observations and 0 variables. WARNING: Data set WORK.WIDE was not replaced because this step was stopped. NOTE: PROCEDURE TRANSPOSE used (Total process time): real time 0.00 seconds cpu time 0.00 seconds 63 64 data want_wide; 65 merge 66 wide 67 want 68 ; 69 by type; 70 run; WARNING: Multiple lengths were specified for the BY variable type by input data sets. This might cause unexpected results. NOTE: There were 3 observations read from the data set WORK.WIDE. NOTE: There were 988 observations read from the data set WORK.WANT. NOTE: The data set WORK.WANT_WIDE has 991 observations and 12 variables. NOTE: DATA statement used (Total process time): real time 0.00 seconds cpu time 0.00 seconds 71 72 73 74 75 GOPTIONS NOACCESSIBLE; 76 %LET _CLIENTTASKLABEL=; 77 %LET _CLIENTPROCESSFLOWNAME=; 78 %LET _CLIENTPROJECTPATH=; 79 %LET _CLIENTPROJECTPATHHOST=; 80 %LET _CLIENTPROJECTNAME=; 81 %LET _SASPROGRAMFILE=; 82 %LET _SASPROGRAMFILEHOST=; 83 84 ;*';*";*/;quit;run; 85 ODS _ALL_ CLOSE; 86 87 88 QUIT; RUN;
This is how the first 28 rows of my filtered_main_dataset looks like:
Let's begin at the start: how do you receive your data? It may be that the unfavorable data types are the consequence of an ill-designed import process.
Next, there is no reason to continue working with dumb data if you receive it as such. While you process the data in SAS, you always strive for the most easy to handle form, which means treating dates/times as such and so on. If you need to re-export processed data to the source, you can always convert to the original data types then, but you do not make your own life harder just because someone else is a masochist. Or an idiot.
Next, implement Maxim 3 (Know Your Data). You have values for type that have more than one entry for a given quarter. Before being able to transpose/report on such data, you need to decide how to handle these multiple entries (select min, max, sum or average).
Im getting this data from an external data source. Oke to solve my problem I think I need to change my quarter data to your quarter data(with that Q). How can that be done easily then?
With regards to your Maxim 3. I want to take jus the last quarter if that quarter occurs more then once in a type. How can I achieve that?
In which firm does the data arrive? Direct unload from a DBMS, text file, Excel?
This will determine where you make the conversion to proper variable types and values.
Show the code you use to get the data into SAS.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.
Find more tutorials on the SAS Users YouTube channel.