Solved: Cost calculation

vnreddy · Posted 11-27-2019 06:43 AM

Hi,

ID May June July August September October Month 31-May-19 30-Jun-19 31-Jul-19 31-Aug-19 30-Sep-19 31-Oct-19
12345 3 2 2 2 2 2 May 0.57 0.63 0.89 2.32 1.63 0.4
12346 2 2 1 1 1 1 May 0.99 0.42 0.9 0.02 0.93 0.25
12347 1 1 1 1 1 1 0.64 0.93 0.95 0.21 0.75 0.52
12348 3 3 3 1 2 2 May 0.97 0.41 0.28 0.23 0.74 0.1
12349 1 2 2 2 3 2 September 0.83 0.23 0.76 0.56 0.97 0.43
12350 1 2 2 2 2 2 0.03 0.81 0.65 0.52 0.86 0.25
12351 2 2 2 1 1 1 May 0.73 0.29 0.18 0.60 0.35 0.58
12352 3 3 3 2 2 2 May 0.99 0.43 0.72 0.13 0.37 0.84
12353 3 2 2 2 2 2 May 0.67 0.20 0.36 0.11 0.86 0.51
12354 2 3 3 3 3 3 0.70 0.84 0.76 0.19 0.46 0.89

I need to add two new Columns COST and Actual
ID - Numeric Format
May, June, July, August, September, October, Month - Character Format
31-MAY-2019, 30-JUN-2019, 31-JUL-2019, 31-AUG-2019, 30-SEP-2019, 31-OCT-2019 - Numeric Format

1) Cost : Cost should only be populated when Month is available in MONTH Column
for e.g., if May Month in Month column then 31-MAY-2019 cost should come up in
COST column.

2) Actaul : ACTUAL should be calculated from COST and last month cost(here 31-OCT-2019)
will be last month
Actual Formula : ACTUAL = Cost - 31-OCT-2019

Need output like this:

Kurt_Bremser · Posted 11-27-2019 08:43 AM

You suffer from bad data design.

Look at this:

data have1;
input id $ mon_5 mon_6 mon_7 mon_8 mon_9 mon_10;
datalines;
12345 0.57 0.63 0.89 2.32 1.63 0.4
12346 0.99 0.42 0.9 0.02 0.93 0.25
12347 0.64 0.93 0.95 0.21 0.75 0.52
12348 0.97 0.41 0.28 0.23 0.74 0.1
12349 0.83 0.23 0.76 0.56 0.97 0.43
12350 0.03 0.81 0.65 0.52 0.86 0.25
12351 0.73 0.29 0.18 0.60 0.35 0.58
12352 0.99 0.43 0.72 0.13 0.37 0.84
12353 0.67 0.20 0.36 0.11 0.86 0.51
12354 0.70 0.84 0.76 0.19 0.46 0.89
;

data have2;
input id $ month;
datalines;
12345 5
12346 5
12348 5
12349 9
12351 5
12352 5
12353 5
;

proc transpose
  data=have1
  out=howitshouldbe (rename=(col1=cost))
;
by id;
var mon:;
run;

data have1_real;
set howitshouldbe;
month = input(scan(_name_,2,'_'),best.);
drop _name_;
run;

data want;
merge
  have1_real (in=h1 rename=(cost=_cost month=_month))
  have2 (in=h2)
;
by id;
if h1 and h2;
retain cost;
if first.id then cost = .;
if month = _month then cost = _cost;
if last.id;
actual = _cost - cost;
drop _:;
run;

Note that the first few steps are only there to beat your unusable data into shape (long format). The whole real work is done in the last step, which becomes very simple through intelligent design of the data structure.

See Maxim 33.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

View solution in original post

andreas_lds · Posted 11-27-2019 07:04 AM

The data you posted reminds me of someone else posting data in the same strange format. Weird.

First thing to do: fix the broken variable names like "31-May-19" - this name could be used, but would make reading and maintaining the code harder, so such names must be avoided. And as soon as you have the data as dataset, you should re-post the data as data-step using datalines statement, so that the community has something to actually work with. See https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... for details about posting data.

ed_sas_member · Posted 11-27-2019 07:08 AM

Hi @vnreddy

I can't agree more with @andreas_lds

However, you can try the following code:

data have;
	infile datalines dlm=" " missover dsd;
	length month $10.;
	input ID May$ June$ July$ August$ September$ October$ Month$ "31-May-19"n "30-Jun-19"n "31-Jul-19"n "31-Aug-19"n "30-Sep-19"n "31-Oct-19"n;
	datalines;
12345 3 2 2 2 2 2 May 0.57 0.63 0.89 2.32 1.63 0.4
12346 2 2 1 1 1 1 May 0.99 0.42 0.9 0.02 0.93 0.25
12347 1 1 1 1 1 1  0.64 0.93 0.95 0.21 0.75 0.52
12348 3 3 3 1 2 2 May 0.97 0.41 0.28 0.23 0.74 0.1
12349 1 2 2 2 3 2 September 0.83 0.23 0.76 0.56 0.97 0.43
12350 1 2 2 2 2 2  0.03 0.81 0.65 0.52 0.86 0.25
12351 2 2 2 1 1 1 Jul 0.73 0.29 0.18 0.60 0.35 0.58
12352 3 3 3 2 2 2 May 0.99 0.43 0.72 0.13 0.37 0.84
12353 3 2 2 2 2 2 May 0.67 0.20 0.36 0.11 0.86 0.51
12354 2 3 3 3 3 3  0.70 0.84 0.76 0.19 0.46 0.89
;
run;

data want;
	set have;
	array mon(*) $ "31-May-19"n "30-Jun-19"n "31-Jul-19"n "31-Aug-19"n "30-Sep-19"n "31-Oct-19"n;
	do i=1 to dim(mon);
		if month ne "" and substr(vname(mon(i)),4,3) = substr(month,1,3) then do;
			cost = mon(i);
		leave;
		end;
	end;
	actual = Cost - "31-Oct-19"n;
run;

proc print data=want;
run;

Jagadishkatam · Posted 11-27-2019 07:09 AM

Please try the below code, a dynamic approach

data have;
input ID May June July August September October Month$ _31_May_19 _30_Jun_19 _31_Jul_19 _31_Aug_19 _30_Sep_19 _31_Oct_19;
cards;
12345 3 2 2 2 2 2 May 0.57 0.63 0.89 2.32 1.63 0.4
12346 2 2 1 1 1 1 May 0.99 0.42 0.9 0.02 0.93 0.25
12347 1 1 1 1 1 1 xxx 0.64 0.93 0.95 0.21 0.75 0.52
12348 3 3 3 1 2 2 May 0.97 0.41 0.28 0.23 0.74 0.1
12349 1 2 2 2 3 2 September 0.83 0.23 0.76 0.56 0.97 0.43
12350 1 2 2 2 2 2 xxx 0.03 0.81 0.65 0.52 0.86 0.25
12351 2 2 2 1 1 1 May 0.73 0.29 0.18 0.60 0.35 0.58
12352 3 3 3 2 2 2 May 0.99 0.43 0.72 0.13 0.37 0.84
12353 3 2 2 2 2 2 May 0.67 0.20 0.36 0.11 0.86 0.51
12354 2 3 3 3 3 3 xxx 0.70 0.84 0.76 0.19 0.46 0.89
;

data want;
set have;
array dates(*) _31_May_19 _30_Jun_19 _31_Jul_19 _31_Aug_19 _30_Sep_19 _31_Oct_19;
do i = 1 to dim(dates);
if lowcase(substr(month,1,3))=lowcase(compress(vname(dates(i)),,'ka')) then cost=dates(i);
actual=cost-dates(i);
end;
run;

Thanks,
Jag

PaigeMiller · Posted 11-27-2019 08:08 AM

As mentioned by others, having variable names that look like calendar days or months is not usually a good programming practice.

But also, problems like this are often more easily handled in Excel, rather than SAS. In Excel, you don't have to change the variable names, and the programming relatively quick and straightforward, unlike in SAS. So this is a job for Excel, not SAS.

--
Paige Miller

Kurt_Bremser · Posted 11-27-2019 08:43 AM

You suffer from bad data design.

Look at this:

data have1;
input id $ mon_5 mon_6 mon_7 mon_8 mon_9 mon_10;
datalines;
12345 0.57 0.63 0.89 2.32 1.63 0.4
12346 0.99 0.42 0.9 0.02 0.93 0.25
12347 0.64 0.93 0.95 0.21 0.75 0.52
12348 0.97 0.41 0.28 0.23 0.74 0.1
12349 0.83 0.23 0.76 0.56 0.97 0.43
12350 0.03 0.81 0.65 0.52 0.86 0.25
12351 0.73 0.29 0.18 0.60 0.35 0.58
12352 0.99 0.43 0.72 0.13 0.37 0.84
12353 0.67 0.20 0.36 0.11 0.86 0.51
12354 0.70 0.84 0.76 0.19 0.46 0.89
;

data have2;
input id $ month;
datalines;
12345 5
12346 5
12348 5
12349 9
12351 5
12352 5
12353 5
;

proc transpose
  data=have1
  out=howitshouldbe (rename=(col1=cost))
;
by id;
var mon:;
run;

data have1_real;
set howitshouldbe;
month = input(scan(_name_,2,'_'),best.);
drop _name_;
run;

data want;
merge
  have1_real (in=h1 rename=(cost=_cost month=_month))
  have2 (in=h2)
;
by id;
if h1 and h2;
retain cost;
if first.id then cost = .;
if month = _month then cost = _cost;
if last.id;
actual = _cost - cost;
drop _:;
run;

Note that the first few steps are only there to beat your unusable data into shape (long format). The whole real work is done in the last step, which becomes very simple through intelligent design of the data structure.

See Maxim 33.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Cost calculation

Re: Cost calculation

Re: Cost calculation

Re: Cost calculation

Re: Cost calculation

Re: Cost calculation

Re: Cost calculation

Cost calculation

Re: Cost calculation

Re: Cost calculation

Re: Cost calculation

Re: Cost calculation

Re: Cost calculation

Re: Cost calculation

SAS Innovate 2025: Register Now