Hi,
ID May June July August September October Month 31-May-19 30-Jun-19 31-Jul-19 31-Aug-19 30-Sep-19 31-Oct-19
12345 3 2 2 2 2 2 May 0.57 0.63 0.89 2.32 1.63 0.4
12346 2 2 1 1 1 1 May 0.99 0.42 0.9 0.02 0.93 0.25
12347 1 1 1 1 1 1 0.64 0.93 0.95 0.21 0.75 0.52
12348 3 3 3 1 2 2 May 0.97 0.41 0.28 0.23 0.74 0.1
12349 1 2 2 2 3 2 September 0.83 0.23 0.76 0.56 0.97 0.43
12350 1 2 2 2 2 2 0.03 0.81 0.65 0.52 0.86 0.25
12351 2 2 2 1 1 1 May 0.73 0.29 0.18 0.60 0.35 0.58
12352 3 3 3 2 2 2 May 0.99 0.43 0.72 0.13 0.37 0.84
12353 3 2 2 2 2 2 May 0.67 0.20 0.36 0.11 0.86 0.51
12354 2 3 3 3 3 3 0.70 0.84 0.76 0.19 0.46 0.89
I need to add two new Columns COST and Actual
ID - Numeric Format
May, June, July, August, September, October, Month - Character Format
31-MAY-2019, 30-JUN-2019, 31-JUL-2019, 31-AUG-2019, 30-SEP-2019, 31-OCT-2019 - Numeric Format
1) Cost : Cost should only be populated when Month is available in MONTH Column
for e.g., if May Month in Month column then 31-MAY-2019 cost should come up in
COST column.
2) Actaul : ACTUAL should be calculated from COST and last month cost(here 31-OCT-2019)
will be last month
Actual Formula : ACTUAL = Cost - 31-OCT-2019
Need output like this:
You suffer from bad data design.
Look at this:
data have1;
input id $ mon_5 mon_6 mon_7 mon_8 mon_9 mon_10;
datalines;
12345 0.57 0.63 0.89 2.32 1.63 0.4
12346 0.99 0.42 0.9 0.02 0.93 0.25
12347 0.64 0.93 0.95 0.21 0.75 0.52
12348 0.97 0.41 0.28 0.23 0.74 0.1
12349 0.83 0.23 0.76 0.56 0.97 0.43
12350 0.03 0.81 0.65 0.52 0.86 0.25
12351 0.73 0.29 0.18 0.60 0.35 0.58
12352 0.99 0.43 0.72 0.13 0.37 0.84
12353 0.67 0.20 0.36 0.11 0.86 0.51
12354 0.70 0.84 0.76 0.19 0.46 0.89
;
data have2;
input id $ month;
datalines;
12345 5
12346 5
12348 5
12349 9
12351 5
12352 5
12353 5
;
proc transpose
data=have1
out=howitshouldbe (rename=(col1=cost))
;
by id;
var mon:;
run;
data have1_real;
set howitshouldbe;
month = input(scan(_name_,2,'_'),best.);
drop _name_;
run;
data want;
merge
have1_real (in=h1 rename=(cost=_cost month=_month))
have2 (in=h2)
;
by id;
if h1 and h2;
retain cost;
if first.id then cost = .;
if month = _month then cost = _cost;
if last.id;
actual = _cost - cost;
drop _:;
run;
Note that the first few steps are only there to beat your unusable data into shape (long format). The whole real work is done in the last step, which becomes very simple through intelligent design of the data structure.
See Maxim 33.
The data you posted reminds me of someone else posting data in the same strange format. Weird.
First thing to do: fix the broken variable names like "31-May-19" - this name could be used, but would make reading and maintaining the code harder, so such names must be avoided. And as soon as you have the data as dataset, you should re-post the data as data-step using datalines statement, so that the community has something to actually work with. See https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... for details about posting data.
Hi @vnreddy
I can't agree more with @andreas_lds
However, you can try the following code:
data have;
infile datalines dlm=" " missover dsd;
length month $10.;
input ID May$ June$ July$ August$ September$ October$ Month$ "31-May-19"n "30-Jun-19"n "31-Jul-19"n "31-Aug-19"n "30-Sep-19"n "31-Oct-19"n;
datalines;
12345 3 2 2 2 2 2 May 0.57 0.63 0.89 2.32 1.63 0.4
12346 2 2 1 1 1 1 May 0.99 0.42 0.9 0.02 0.93 0.25
12347 1 1 1 1 1 1 0.64 0.93 0.95 0.21 0.75 0.52
12348 3 3 3 1 2 2 May 0.97 0.41 0.28 0.23 0.74 0.1
12349 1 2 2 2 3 2 September 0.83 0.23 0.76 0.56 0.97 0.43
12350 1 2 2 2 2 2 0.03 0.81 0.65 0.52 0.86 0.25
12351 2 2 2 1 1 1 Jul 0.73 0.29 0.18 0.60 0.35 0.58
12352 3 3 3 2 2 2 May 0.99 0.43 0.72 0.13 0.37 0.84
12353 3 2 2 2 2 2 May 0.67 0.20 0.36 0.11 0.86 0.51
12354 2 3 3 3 3 3 0.70 0.84 0.76 0.19 0.46 0.89
;
run;
data want;
set have;
array mon(*) $ "31-May-19"n "30-Jun-19"n "31-Jul-19"n "31-Aug-19"n "30-Sep-19"n "31-Oct-19"n;
do i=1 to dim(mon);
if month ne "" and substr(vname(mon(i)),4,3) = substr(month,1,3) then do;
cost = mon(i);
leave;
end;
end;
actual = Cost - "31-Oct-19"n;
run;
proc print data=want;
run;
Please try the below code, a dynamic approach
data have;
input ID May June July August September October Month$ _31_May_19 _30_Jun_19 _31_Jul_19 _31_Aug_19 _30_Sep_19 _31_Oct_19;
cards;
12345 3 2 2 2 2 2 May 0.57 0.63 0.89 2.32 1.63 0.4
12346 2 2 1 1 1 1 May 0.99 0.42 0.9 0.02 0.93 0.25
12347 1 1 1 1 1 1 xxx 0.64 0.93 0.95 0.21 0.75 0.52
12348 3 3 3 1 2 2 May 0.97 0.41 0.28 0.23 0.74 0.1
12349 1 2 2 2 3 2 September 0.83 0.23 0.76 0.56 0.97 0.43
12350 1 2 2 2 2 2 xxx 0.03 0.81 0.65 0.52 0.86 0.25
12351 2 2 2 1 1 1 May 0.73 0.29 0.18 0.60 0.35 0.58
12352 3 3 3 2 2 2 May 0.99 0.43 0.72 0.13 0.37 0.84
12353 3 2 2 2 2 2 May 0.67 0.20 0.36 0.11 0.86 0.51
12354 2 3 3 3 3 3 xxx 0.70 0.84 0.76 0.19 0.46 0.89
;
data want;
set have;
array dates(*) _31_May_19 _30_Jun_19 _31_Jul_19 _31_Aug_19 _30_Sep_19 _31_Oct_19;
do i = 1 to dim(dates);
if lowcase(substr(month,1,3))=lowcase(compress(vname(dates(i)),,'ka')) then cost=dates(i);
actual=cost-dates(i);
end;
run;
As mentioned by others, having variable names that look like calendar days or months is not usually a good programming practice.
But also, problems like this are often more easily handled in Excel, rather than SAS. In Excel, you don't have to change the variable names, and the programming relatively quick and straightforward, unlike in SAS. So this is a job for Excel, not SAS.
You suffer from bad data design.
Look at this:
data have1;
input id $ mon_5 mon_6 mon_7 mon_8 mon_9 mon_10;
datalines;
12345 0.57 0.63 0.89 2.32 1.63 0.4
12346 0.99 0.42 0.9 0.02 0.93 0.25
12347 0.64 0.93 0.95 0.21 0.75 0.52
12348 0.97 0.41 0.28 0.23 0.74 0.1
12349 0.83 0.23 0.76 0.56 0.97 0.43
12350 0.03 0.81 0.65 0.52 0.86 0.25
12351 0.73 0.29 0.18 0.60 0.35 0.58
12352 0.99 0.43 0.72 0.13 0.37 0.84
12353 0.67 0.20 0.36 0.11 0.86 0.51
12354 0.70 0.84 0.76 0.19 0.46 0.89
;
data have2;
input id $ month;
datalines;
12345 5
12346 5
12348 5
12349 9
12351 5
12352 5
12353 5
;
proc transpose
data=have1
out=howitshouldbe (rename=(col1=cost))
;
by id;
var mon:;
run;
data have1_real;
set howitshouldbe;
month = input(scan(_name_,2,'_'),best.);
drop _name_;
run;
data want;
merge
have1_real (in=h1 rename=(cost=_cost month=_month))
have2 (in=h2)
;
by id;
if h1 and h2;
retain cost;
if first.id then cost = .;
if month = _month then cost = _cost;
if last.id;
actual = _cost - cost;
drop _:;
run;
Note that the first few steps are only there to beat your unusable data into shape (long format). The whole real work is done in the last step, which becomes very simple through intelligent design of the data structure.
See Maxim 33.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.