data work.r_six_month (keep=ID Stage r_date); set r_prod_t; where r_date between "&Month_back6."d and "&last_month"d and latest_run_in_mth = 'Y'; run; PROC SQL; CREATE TABLE WORK.r_month_fmt AS SELECT ID, Stage, r_date, r_date FORMAT=MONNAME9. AS Month FROM WORK.R_SIX_MONTH; QUIT; proc sort data=r_month_fmt; by ID; run; proc transpose data=r_month_fmt out=r_trans_month(drop= _name_); by ID; var Stage; id Month; run; PROC SQL; CREATE TABLE WORK.r_t_format_month AS SELECT t1.ID, /* May_ */ (put(t1.' May'n,z1.)) AS May, /* June_ */ (put(t1.' June'n,z1.)) AS June, /* July_ */ (put(t1.' July'n,z1.)) AS July, /* August_ */ (put(t1.' August'n,z1.)) AS August, /* September_ */ (put(t1.September,z1.)) AS September, /* October_ */ (put(t1.' October'n,z1.)) AS October FROM WORK.r_trans_month t1; QUIT;
Hi,
below piece of code i want to make dynamic. After transpose from above step Month is converted to numeric
i want to make a dynamic format conversion ... because every month i pull the data for last six month. I don't want
to change the program and formats manually every month.
PROC SQL;
CREATE TABLE WORK.r_t_format_month AS
SELECT t1.ID,
/* May_ */
(put(t1.' May'n,z1.)) AS May,
/* June_ */
(put(t1.' June'n,z1.)) AS June,
/* July_ */
(put(t1.' July'n,z1.)) AS July,
/* August_ */
(put(t1.' August'n,z1.)) AS August,
/* September_ */
(put(t1.September,z1.)) AS September,
/* October_ */
(put(t1.' October'n,z1.)) AS October
FROM WORK.r_trans_month t1;
QUIT;
I'm not 100% sure what you are doing, but there are plenty of built-in formats in SAS that don't require you to re-code the list of months.
For example, the MONNAME. format may be what you want (although as I said I'm not 100% sure I know what you want).
Is your sorting/grouping variable named ID or PRIMARY_ACCOUNT_NO?
Do you need all of those intermediate datasets?
Do you even need the final output as a dataset? Having a dataset with data (month name) in metadata (variable name) is normally a bad idea. You might be able to produce the report you need directly from the data without transposing.
Can you show some example values of the input data for one or two key values and two or three months? Preferable in the form of a data step that reads from in-line data.
Hi,
Sorry my mistake it's ID
in my initial data as
ID Stage r_date
123456 1 31May2019
123456 1 30Jun2019
123456 2 31Jul2019
123456 2 31Aug2019
123456 1 30Sep2019
123456 1 31Oct2019
Post transpose MONTH column names has been converted to numeric format
ID May June July August September October
123456 1 1 2 2 1 1
I did this
PROC SQL;
CREATE TABLE WORK._INPUT1 AS
SELECT t1.ID,
/* May_ */
(put(t1.' May'n,z1.)) AS May,
/* June_ */
(put(t1.' June'n,z1.)) AS June,
/* July_ */
(put(t1.' July'n,z1.)) AS July,
/* August_ */
(put(t1.' August'n,z1.)) AS August,
/* September_ */
(put(t1.September,z1.)) AS September,
/* October_ */
(put(t1.' October'n,z1.)) AS October
FROM WORK.INPUT1 t1;
QUIT;
to execute below step:
data R_MONTH;
SET input1;
array mth(6) May June July August September October; /*how can i eliminate mentioning Month names going forward - any way automatically to pick last six month */
if mth(6) = 3 then
Month = " ";
else if mth(6) = 2 then
do;
do i = 1 to 5;
if mth(i) = 3 then
do; Month = i; leave; end;
end;
end;
else;
if mth(6) = 1 then
do;
do i = 1 to 5;
if mth(i) = 3 then
do;
Month = i; leave; end;
end;
if Month = " " then
do;
do i = 1 to 5;
if mth(i) = 2 then
do;
Month = i; leave; end;
end;
end;
end;
if Month NE " " then
NEW_MONTH = vname(mth(Month));
run;
As pointed out by @Tom, your logic and workflow seems to be in need of improvement. Creating columns with month names will cause all sorts of programming problems. Better to design a workflow and logic where you are not creating columns with month names, and then these programming issues go away. But it's not clear what you are doing and where you want to go. Can you describe in words, without code, and also by showing us examples, the original data and then the desired final output?
Hi @Tom
I am trying to achieve something like this: I didn't included ID in below image
May | June | July | August | September | October | Month | May | June | July | August | September | October | Revised Provision | Overlay |
1 | 1 | 1 | 1 | 1 | 1 | 0.35 | 0.35 | 0.34 | 0.34 | 0.33 | 0.24 | - | ||
1 | 1 | 1 | 1 | 1 | 1 | 0.06 | 0.06 | 0.06 | 0.06 | 0.05 | 0.05 | - | ||
1 | 1 | 1 | 1 | 1 | 1 | 0.05 | 0.05 | 0.05 | 0.05 | 0.05 | 0.04 | - | ||
1 | 1 | 1 | 1 | 1 | 1 | 0.04 | 0.04 | 0.07 | 0.07 | 0.07 | 0.07 | - | ||
1 | 1 | 1 | 1 | 1 | 1 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | - | ||
1 | 1 | 1 | 1 | 1 | 1 | 0.26 | 0.26 | 0.25 | 0.25 | 0.24 | 0.24 | - | ||
1 | 1 | 1 | 1 | 1 | 1 | 0.11 | 0.11 | 0.10 | 0.10 | 0.10 | 0.14 | - | ||
1 | 1 | 1 | 1 | 1 | 1 | 0.33 | 0.33 | 0.32 | 0.43 | 0.43 | 0.42 | - | ||
1 | 1 | 1 | 1 | 1 | 1 | 0.24 | 0.24 | 0.35 | 0.47 | 0.46 | 0.46 | - | ||
1 | 1 | 1 | 1 | 1 | 1 | 0.03 | 0.03 | 0.03 | 0.03 | 0.03 | 0.03 | - | ||
1 | 2 | 1 | 1 | 1 | 1 | June | 0.76 | 5.11 | 0.75 | 0.74 | 0.74 | 0.59 | 5.11 | 4.53 |
This appears to be a continuation of another request to tag when state changes where the rules were not clearly explained. Was that your question also? From this example it looks like you want to set MONTH to the name of the month where the stage variable fist become 2.
So something like this:
data have ;
input ID $ Stage r_date :date9.;
format r_date date9.;
cards;
123456 1 31May2019
123456 1 30Jun2019
123456 2 31Jul2019
123456 2 31Aug2019
123456 1 30Sep2019
123456 1 31Oct2019
;
data step1;
set have;
by id;
length month $10 ;
retain found;
if first.id then found=0;
if stage=2 and not found then do;
month=put(intnx('month',r_date,-1),monname.-l);
found=1;
end;
run;
Obs ID Stage r_date month found 1 123456 1 31MAY2019 0 2 123456 1 30JUN2019 0 3 123456 2 31JUL2019 June 1 4 123456 2 31AUG2019 1 5 123456 1 30SEP2019 1 6 123456 1 31OCT2019 1
It also looks like in that report you want the STAGE variable transposed into columns.
But what is that other variable with values between 0 and 1 that also appears to have been transposed?
Hi @Tom
I have input data something like below: ID Stage r_date Rate
Now my logic worked fine for the stage when ever stage move from 1 to 2 or 2 to 3 then that Month should be populated
on Month column and Rate of that particular month should also be populated in a New column 'REVISED' finally if Overlay should be calculated as REVISED - last month rate(October rate)
ID Stage r_date Rate
123456 1 31May2019 0.98
123456 1 30Jun2019 0.23
123456 2 31Jul2019 0.76
123456 2 31Aug2019 0.54
123456 1 30Sep2019 0.38
123456 1 31Oct2019 0.36
data R_MONTH; /*SET _input1;*/ array mth(6) May June July August September October; /*Now i have validated each and every value it is
perfectly matching with the existing excel results, my concern is how do i change the Month names in this
array going forward */ if mth(6) = 3 then Month = " "; else if mth(6) = 2 then do; do i = 1 to 5; if mth(i) = 3 then do; Month = i; leave; end; end; end; else; if mth(6) = 1 then do; do i = 1 to 5; if mth(i) = 3 then do; Month = i; leave; end; end; if Month = " " then do; do i = 1 to 5; if mth(i) = 2 then do; Month = i; leave; end; end; end; end; if Month NE " " then NEW_MONTH = vname(mth(Month)); run;
So you are getting a little closer to explaining your problem and desired result.
It sounds like the "month" you are trying to pick is the one where the stage first increases ? So from 1 to 2 or from 2 to 3. So perhaps you are trying to find the time of progression of a disease? You might want to set that variable to the same value for every observation in the ID group instead of only storing it one of the six monthly observations.
Or do you want the month where it last increases? Or is it possible that one ID could have more than one of the six months picked?
It is not at all clear what the other two new variables represent. Can you please explain how they are derived? Or if they are on the input then show their values. Is it similar to the MONTH variable in that one value is calculated for the whole six month interval? Or do they have different values on each month?
You provided input data, but you left out the example result. Please post the expected values for the new variable(s).
Something like this with the ? replaced with the values you expect for this given input.
ID Stage r_date Rate Month Revised Overlay 123456 1 31May2019 0.98 ? ? ? 123456 1 30Jun2019 0.23 ? ? ? 123456 2 31Jul2019 0.76 ? ? ? 123456 2 31Aug2019 0.54 ? ? ? 123456 1 30Sep2019 0.38 ? ? ? 123456 1 31Oct2019 0.36 ? ? ?
Note that making a report of the values does not require first converting the data into a wide format.
proc report nofs data=have ;
column id stage,r_date rate,r_date=r_date2;
define id / group ;
define r_date / across format=monname. ' ';
define r_date2 / across format=monname. ' ';
define stage / sum 'Stage';
define rate / sum 'Rate';
run;
Input
ID Stage r_date Rate 123456 1 31May2019 0.76 123456 1 30Jun2019 5.11 123456 2 31Jul2019 0.75 123456 2 31Aug2019 0.74 123456 1 30Sep2019 0.74 123456 1 31Oct2019 0.59 123456 1 31May2019 0.76 123456 1 30Jun2019 0.11 123456 1 31Jul2019 0.75 123456 1 31Aug2019 0.74 123456 1 30Sep2019 0.74 123456 1 31Oct2019 0.59
Expected Output
ID May June July August September October Month May June July August September October Revised Overlay 123456 1 1 2 2 1 1 June 0.76 5.11 0.75 0.74 0.74 0.59 5.11 4.53 123457 1 1 1 1 1 1 0.76 0.11 0.75 0.74 0.74 0.59
Revised is populated when Month name is populated ID 123456 is an example, where as if Month is blank then Revised should be blank
Overlay should be calculated only when Revised is not blank and Overlay = Revised - October(last month)
I cannot tell from your output what values are in which columns. In that type of free flowing listing please include periods to mark the missing values. Also please make use of the Insert Code icon on the menu bar (looks like {i}) so that the text you paste is not reflowed by the forum into paragraphs. I have edited your post to do that, but it looks like it is still not possible to see what values belong in what columns.
Revised is populated when Month name is populated
What are the rules for populating MONTH name? Be explicit. Do not repost that if/then spaghetti code. Of if you must then include comments that explain what the condition being tested means.
Show examples. Especially for boundary conditions.
What is REVISED populated WITH? What are the rules? Explain what you want.
Overlay = Revised - October(last month)
So it looks like OVERLAY should be calculated on just the last observation? What value does it have when MONTH name is not populated? Does it match RATE for that month? Is it missing?
Is this what you are trying to do?
data have;
input ID $ Stage r_date :date. Rate;
format r_date date9.;
cards;
123456 1 31May2019 0.76
123456 1 30Jun2019 5.11
123456 2 31Jul2019 0.75
123456 2 31Aug2019 0.74
123456 1 30Sep2019 0.74
123456 1 31Oct2019 0.59
789012 1 31May2019 0.76
789012 1 30Jun2019 0.11
789012 1 31Jul2019 0.75
789012 1 31Aug2019 0.74
789012 1 30Sep2019 0.74
789012 1 31Oct2019 0.59
;
data want;
do until (last.id);
set have ;
by id ;
lag_stage=lag(stage);
lag_rate=lag(rate);
lag_date=lag(r_date);
if not first.id and missing(revised) and lag_stage < stage then do;
revised=lag_rate;
month = put(lag_date,monname10.-l);
end;
end;
do until (last.id);
set have ;
by id ;
if last.id and not missing(revised) then overlay=revised-rate;
output;
end;
drop lag_: ;
run;
proc print data=want;
run;
Obs ID Stage r_date Rate revised month overlay 1 123456 1 31MAY2019 0.76 5.11 June . 2 123456 1 30JUN2019 5.11 5.11 June . 3 123456 2 31JUL2019 0.75 5.11 June . 4 123456 2 31AUG2019 0.74 5.11 June . 5 123456 1 30SEP2019 0.74 5.11 June . 6 123456 1 31OCT2019 0.59 5.11 June 4.52 7 789012 1 31MAY2019 0.76 . . 8 789012 1 30JUN2019 0.11 . . 9 789012 1 31JUL2019 0.75 . . 10 789012 1 31AUG2019 0.74 . . 11 789012 1 30SEP2019 0.74 . . 12 789012 1 31OCT2019 0.59 . .
@kishoresamineni wrote:
Hi @Tom
I am trying to achieve something like this: I didn't included ID in below image
May June July August September October Month May June July August September October Revised Provision Overlay 1 1 1 1 1 1 0.35 0.35 0.34 0.34 0.33 0.24 - 1 1 1 1 1 1 0.06 0.06 0.06 0.06 0.05 0.05 - 1 1 1 1 1 1 0.05 0.05 0.05 0.05 0.05 0.04 - 1 1 1 1 1 1 0.04 0.04 0.07 0.07 0.07 0.07 - 1 1 1 1 1 1 0.00 0.00 0.00 0.00 0.00 0.00 - 1 1 1 1 1 1 0.26 0.26 0.25 0.25 0.24 0.24 - 1 1 1 1 1 1 0.11 0.11 0.10 0.10 0.10 0.14 - 1 1 1 1 1 1 0.33 0.33 0.32 0.43 0.43 0.42 - 1 1 1 1 1 1 0.24 0.24 0.35 0.47 0.46 0.46 - 1 1 1 1 1 1 0.03 0.03 0.03 0.03 0.03 0.03 - 1 2 1 1 1 1 June 0.76 5.11 0.75 0.74 0.74 0.59 5.11 4.53
We still don't have a clear explanation, and you haven't explained why one column named MAY has integers in it and another column named MAY has numbers with decimals in it.
If the goal is to get a table, where the columns of the table are named with actual MONTH names, this can be done EASILY via PROC REPORT.
If the goal is to create a SAS data set with month names as variable names, then again I recommend re-designing the logic and workflow, as this will cause all sorts of programming problems.
So again, we need a very clear explanation of the data you have and the end result you are trying to achieve.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.