BookmarkSubscribeRSS Feed
kishoresamineni
Quartz | Level 8
data work.r_six_month (keep=ID Stage r_date);
set r_prod_t;
where r_date between "&Month_back6."d and "&last_month"d and latest_run_in_mth = 'Y';
run;


PROC SQL;
CREATE TABLE WORK.r_month_fmt AS 
SELECT ID, 
       Stage, 
       r_date, 
	   r_date FORMAT=MONNAME9. AS Month
      FROM WORK.R_SIX_MONTH;
QUIT;

proc sort data=r_month_fmt;
by ID;
run;


proc transpose data=r_month_fmt out=r_trans_month(drop= _name_);
by ID;
var Stage;
id Month;
run;

PROC SQL;
   CREATE TABLE WORK.r_t_format_month AS 
   SELECT t1.ID, 
          /* May_ */
            (put(t1.'      May'n,z1.)) AS May, 
          /* June_ */
            (put(t1.'     June'n,z1.)) AS June, 
          /* July_ */
            (put(t1.'     July'n,z1.)) AS July, 
          /* August_ */
            (put(t1.'   August'n,z1.)) AS August, 
          /* September_ */
            (put(t1.September,z1.)) AS September, 
          /* October_ */
            (put(t1.'  October'n,z1.)) AS October
      FROM WORK.r_trans_month t1;
QUIT;


Hi,

 

below piece of code i want to make dynamic. After transpose from above step Month is converted to numeric 

i want to make a dynamic format conversion ... because every month i pull the data for last six month. I don't want 

to change the program and formats manually every month.

 

PROC SQL;

CREATE TABLE WORK.r_t_format_month AS

SELECT t1.ID,

/* May_ */

(put(t1.' May'n,z1.)) AS May,

/* June_ */

(put(t1.' June'n,z1.)) AS June,

/* July_ */

(put(t1.' July'n,z1.)) AS July,

/* August_ */

(put(t1.' August'n,z1.)) AS August,

/* September_ */

(put(t1.September,z1.)) AS September,

/* October_ */

(put(t1.' October'n,z1.)) AS October

FROM WORK.r_trans_month t1;

QUIT;

 

12 REPLIES 12
PaigeMiller
Diamond | Level 26

I'm not 100% sure what you are doing, but there are plenty of built-in formats in SAS that don't require you to re-code the list of months.

 

For example, the MONNAME. format may be what you want (although as I said I'm not 100% sure I know what you want).

https://documentation.sas.com/?cdcId=pgmmvacdc&cdcVersion=9.4&docsetId=leforinforref&docsetTarget=n0...

--
Paige Miller
Tom
Super User Tom
Super User

Is your sorting/grouping variable named ID or PRIMARY_ACCOUNT_NO?

Do you need all of those intermediate datasets?

Do you even need the final output as a dataset?  Having a dataset with data (month name) in metadata (variable name) is normally a bad idea.  You might be able to produce the report you need directly from the data without transposing.

 

Can you show some example values of the input data for one or two key values and two or three months? Preferable in the form of a data step that reads from in-line data.

kishoresamineni
Quartz | Level 8

Hi,

 

Sorry my mistake it's ID

 

in my initial data as

ID Stage r_date

123456 1 31May2019

123456 1 30Jun2019

123456 2 31Jul2019

123456 2 31Aug2019

123456 1 30Sep2019

123456 1 31Oct2019

 

Post transpose MONTH column names has been converted to numeric format

ID May June July August September October

123456 1 1 2 2 1 1

 

I did this

PROC SQL;
CREATE TABLE WORK._INPUT1 AS
SELECT t1.ID,
/* May_ */
(put(t1.' May'n,z1.)) AS May,
/* June_ */
(put(t1.' June'n,z1.)) AS June,
/* July_ */
(put(t1.' July'n,z1.)) AS July,
/* August_ */
(put(t1.' August'n,z1.)) AS August,
/* September_ */
(put(t1.September,z1.)) AS September,
/* October_ */
(put(t1.' October'n,z1.)) AS October
FROM WORK.INPUT1 t1;
QUIT;

 

to execute below step:

 

data R_MONTH;
SET input1;
array mth(6) May June July August September October; /*how can i eliminate mentioning Month names going forward - any way automatically to pick last six month */

if mth(6) = 3 then
Month = " ";
else if mth(6) = 2 then
do;
do i = 1 to 5;
if mth(i) = 3 then
do; Month = i; leave; end;
end;
end;
else;
if mth(6) = 1 then
do;
do i = 1 to 5;
if mth(i) = 3 then
do;
Month = i; leave; end;
end;
if Month = " " then
do;
do i = 1 to 5;
if mth(i) = 2 then
do;
Month = i; leave; end;
end;
end;
end;
if Month NE " " then
NEW_MONTH = vname(mth(Month));
run;

PaigeMiller
Diamond | Level 26

As pointed out by @Tom, your logic and workflow seems to be in need of improvement. Creating columns with month names will cause all sorts of programming problems.  Better to design a workflow and logic where you are not creating columns with month names, and then these programming issues go away. But it's not clear what you are doing and where you want to go. Can you describe in words, without code, and also by showing us examples, the original data and then the desired final output?

--
Paige Miller
kishoresamineni
Quartz | Level 8

Hi @Tom 

 

I am trying to achieve something like this: I didn't included ID in below image

 

 

May June July August  September   October  Month  May   June   July   August   September   October   Revised Provision   Overlay 
1 1 1 1 1 1                   0.35               0.35               0.34               0.34               0.33               0.24                        -  
1 1 1 1 1 1                   0.06               0.06               0.06               0.06               0.05               0.05                        -  
1 1 1 1 1 1                   0.05               0.05               0.05               0.05               0.05               0.04                        -  
1 1 1 1 1 1                   0.04               0.04               0.07               0.07               0.07               0.07                        -  
1 1 1 1 1 1                   0.00               0.00               0.00               0.00               0.00               0.00                        -  
1 1 1 1 1 1                   0.26               0.26               0.25               0.25               0.24               0.24                        -  
1 1 1 1 1 1                   0.11               0.11               0.10               0.10               0.10               0.14                        -  
1 1 1 1 1 1                   0.33               0.33               0.32               0.43               0.43               0.42                        -  
1 1 1 1 1 1                   0.24               0.24               0.35               0.47               0.46               0.46                        -  
1 1 1 1 1 1                   0.03               0.03               0.03               0.03               0.03               0.03                        -  
1 2 1 1 1 1 June                 0.76               5.11               0.75               0.74               0.74               0.59                5.11                 4.53
Tom
Super User Tom
Super User

This appears to be a continuation of another request to tag when state changes where the rules were not clearly explained. Was that your question also?  From this example it looks like you want to set MONTH to the name of the month where the stage variable fist become 2.

So something like this:

data have ;
  input ID $ Stage r_date :date9.;
  format r_date date9.;
cards;
123456 1 31May2019
123456 1 30Jun2019
123456 2 31Jul2019
123456 2 31Aug2019
123456 1 30Sep2019
123456 1 31Oct2019
;

data step1;
  set have;
  by id;
  length month $10 ;
  retain found;
  if first.id then found=0;
  if stage=2 and not found then do;
    month=put(intnx('month',r_date,-1),monname.-l);
    found=1;
  end;
run;
Obs      ID      Stage       r_date    month    found

 1     123456      1      31MAY2019               0
 2     123456      1      30JUN2019               0
 3     123456      2      31JUL2019    June       1
 4     123456      2      31AUG2019               1
 5     123456      1      30SEP2019               1
 6     123456      1      31OCT2019               1

It also looks like in that report you want the STAGE variable transposed into columns.

But what is that other variable with values between 0 and 1 that also appears to have been transposed?

kishoresamineni
Quartz | Level 8

 

Hi @Tom 

 

I have input data something like below: ID Stage r_date Rate

Now my logic worked fine for the stage when ever stage move from 1 to 2 or 2 to 3 then that Month should be populated 

on Month column and Rate of that particular month should also be populated in a New column 'REVISED' finally if Overlay should be calculated as REVISED - last month rate(October rate)

 

ID Stage r_date Rate

123456 1 31May2019 0.98
123456 1 30Jun2019 0.23
123456 2 31Jul2019 0.76
123456 2 31Aug2019 0.54
123456 1 30Sep2019 0.38
123456 1 31Oct2019 0.36

 

data R_MONTH;
/*SET _input1;*/
array mth(6) May June July August September October; /*Now i have validated each and every value it is
perfectly matching with the existing excel results, my concern is how do i change the Month names in this
array going forward */ if mth(6) = 3 then Month = " "; else if mth(6) = 2 then do; do i = 1 to 5; if mth(i) = 3 then do; Month = i; leave; end; end; end; else; if mth(6) = 1 then do; do i = 1 to 5; if mth(i) = 3 then do; Month = i; leave; end; end; if Month = " " then do; do i = 1 to 5; if mth(i) = 2 then do; Month = i; leave; end; end; end; end; if Month NE " " then NEW_MONTH = vname(mth(Month)); run;
Tom
Super User Tom
Super User

So you are getting a little closer to explaining your problem and desired result.

It sounds like the "month" you are trying to pick is the one where the stage first increases ? So from 1 to 2 or from 2 to 3.  So perhaps you are trying to find the time of progression of a disease?    You might want to set that variable to the same value for every observation in the ID group instead of only storing it one of the six monthly observations. 

 

Or do you want the month where it last increases? Or is it possible that one ID could have more than one of the six months picked? 

 

It is not at all clear what the other two new variables represent.  Can you please explain how they are derived? Or if they are on the input then show their values.  Is it similar to the MONTH variable in that one value is calculated for the whole six month interval?  Or do they have different values on each month?

 

You provided input data, but you left out the example result.  Please post the expected values for the new variable(s).

Something like this with the ? replaced with the values you expect for this given input.

ID Stage r_date Rate Month Revised Overlay
123456 1 31May2019 0.98 ? ? ?
123456 1 30Jun2019 0.23 ? ? ?
123456 2 31Jul2019 0.76 ? ? ?
123456 2 31Aug2019 0.54 ? ? ?
123456 1 30Sep2019 0.38 ? ? ?
123456 1 31Oct2019 0.36 ? ? ?

 

 

Tom
Super User Tom
Super User

Note that making a report of the values does not require first converting the data into a wide format.

proc report nofs data=have ;
column id stage,r_date rate,r_date=r_date2;
define id / group ;
define r_date / across  format=monname. ' ';
define r_date2 / across format=monname. ' ';
define stage / sum 'Stage';
define rate / sum 'Rate';
run;

image.png

kishoresamineni
Quartz | Level 8

Input

ID Stage r_date Rate
123456 1 31May2019 0.76
123456 1 30Jun2019 5.11
123456 2 31Jul2019 0.75
123456 2 31Aug2019 0.74
123456 1 30Sep2019 0.74
123456 1 31Oct2019 0.59
123456 1 31May2019 0.76
123456 1 30Jun2019 0.11
123456 1 31Jul2019 0.75
123456 1 31Aug2019 0.74
123456 1 30Sep2019 0.74
123456 1 31Oct2019 0.59

Expected Output

ID May June July August September October Month May June July August September October Revised Overlay
123456 1 1 2 2 1 1 June 0.76 5.11 0.75 0.74 0.74 0.59 5.11 4.53
123457 1 1 1 1 1 1 0.76 0.11 0.75 0.74 0.74 0.59

Revised is populated when Month name is populated ID 123456 is an example, where as if Month is blank then Revised should be blank
Overlay should be calculated only when Revised is not blank and Overlay = Revised - October(last month)

Tom
Super User Tom
Super User

I cannot tell from your output what values are in which columns.  In that type of free flowing listing please include periods to mark the missing values. Also please make use of the Insert Code icon on the menu bar (looks like {i}) so that the text you paste is not reflowed by the forum into paragraphs.  I have edited your post to do that, but it looks like it is still not possible to see what values belong in what columns.

Revised is populated when Month name is populated

What are the rules for populating MONTH name? Be explicit. Do not repost that if/then spaghetti code.  Of if you must then include comments that explain what the condition being tested means.

Show examples. Especially for boundary conditions. 

  • What if STAGE is one for every observation? What if it 2 or 3 for every month?
  • What if the first month has STAGE=2?  How about if the first month has STAGE=3?
  • What if stage goes to 2 on 2nd month and then to 3 on the 4th month?

What is REVISED populated WITH? What are the rules? Explain what you want.

Overlay = Revised - October(last month)

So it looks like OVERLAY should be calculated on just the last observation? What value does it have when MONTH name is not populated?  Does it match RATE for that month?  Is it missing?

 

Is this what you are trying to do?

data have;
  input ID $ Stage r_date :date. Rate;
  format r_date date9.;
cards;
123456 1 31May2019 0.76
123456 1 30Jun2019 5.11
123456 2 31Jul2019 0.75
123456 2 31Aug2019 0.74
123456 1 30Sep2019 0.74
123456 1 31Oct2019 0.59
789012 1 31May2019 0.76
789012 1 30Jun2019 0.11
789012 1 31Jul2019 0.75
789012 1 31Aug2019 0.74
789012 1 30Sep2019 0.74
789012 1 31Oct2019 0.59
;

data want;
  do until (last.id);
    set have ;
    by id ;
    lag_stage=lag(stage);
    lag_rate=lag(rate);
    lag_date=lag(r_date);
    if not first.id and missing(revised) and lag_stage < stage then do;
       revised=lag_rate;
       month = put(lag_date,monname10.-l);
    end;
  end;
  do until (last.id);
    set have ;
    by id ;
    if last.id and not missing(revised) then overlay=revised-rate;
    output;
  end;
  drop lag_: ;
run;

proc print data=want;
run;
Obs      ID      Stage       r_date    Rate    revised    month    overlay

  1    123456      1      31MAY2019    0.76      5.11     June        .
  2    123456      1      30JUN2019    5.11      5.11     June        .
  3    123456      2      31JUL2019    0.75      5.11     June        .
  4    123456      2      31AUG2019    0.74      5.11     June        .
  5    123456      1      30SEP2019    0.74      5.11     June        .
  6    123456      1      31OCT2019    0.59      5.11     June       4.52
  7    789012      1      31MAY2019    0.76       .                   .
  8    789012      1      30JUN2019    0.11       .                   .
  9    789012      1      31JUL2019    0.75       .                   .
 10    789012      1      31AUG2019    0.74       .                   .
 11    789012      1      30SEP2019    0.74       .                   .
 12    789012      1      31OCT2019    0.59       .                   .

PaigeMiller
Diamond | Level 26

@kishoresamineni wrote:

Hi @Tom 

 

I am trying to achieve something like this: I didn't included ID in below image

 

 

May June July August  September   October  Month  May   June   July   August   September   October   Revised Provision   Overlay 
1 1 1 1 1 1                   0.35               0.35               0.34               0.34               0.33               0.24                        -  
1 1 1 1 1 1                   0.06               0.06               0.06               0.06               0.05               0.05                        -  
1 1 1 1 1 1                   0.05               0.05               0.05               0.05               0.05               0.04                        -  
1 1 1 1 1 1                   0.04               0.04               0.07               0.07               0.07               0.07                        -  
1 1 1 1 1 1                   0.00               0.00               0.00               0.00               0.00               0.00                        -  
1 1 1 1 1 1                   0.26               0.26               0.25               0.25               0.24               0.24                        -  
1 1 1 1 1 1                   0.11               0.11               0.10               0.10               0.10               0.14                        -  
1 1 1 1 1 1                   0.33               0.33               0.32               0.43               0.43               0.42                        -  
1 1 1 1 1 1                   0.24               0.24               0.35               0.47               0.46               0.46                        -  
1 1 1 1 1 1                   0.03               0.03               0.03               0.03               0.03               0.03                        -  
1 2 1 1 1 1 June                 0.76               5.11               0.75               0.74               0.74               0.59                5.11                 4.53

We still don't have a clear explanation, and you haven't explained why one column named MAY has integers in it and another column named MAY has numbers with decimals in it.

 

If the goal is to get a table, where the columns of the table are named with actual MONTH names, this can be done EASILY via PROC REPORT.

 

If the goal is to create a SAS data set with month names as variable names, then again I recommend re-designing the logic and workflow, as this will cause all sorts of programming problems.


So again, we need a very clear explanation of the data you have and the end result you are trying to achieve.

--
Paige Miller

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 12 replies
  • 847 views
  • 1 like
  • 3 in conversation