BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
vnreddy
Quartz | Level 8

Hi,

 

samp.JPG

ID May June July August September October Month 31-May-19 30-Jun-19 31-Jul-19 31-Aug-19 30-Sep-19 31-Oct-19
12345 3 2 2 2 2 2 May 0.57 0.63 0.89 2.32 1.63 0.4
12346 2 2 1 1 1 1 May 0.99 0.42 0.9 0.02 0.93 0.25
12347 1 1 1 1 1 1 0.64 0.93 0.95 0.21 0.75 0.52
12348 3 3 3 1 2 2 May 0.97 0.41 0.28 0.23 0.74 0.1
12349 1 2 2 2 3 2 September 0.83 0.23 0.76 0.56 0.97 0.43
12350 1 2 2 2 2 2 0.03 0.81 0.65 0.52 0.86 0.25
12351 2 2 2 1 1 1 May 0.73 0.29 0.18 0.60 0.35 0.58
12352 3 3 3 2 2 2 May 0.99 0.43 0.72 0.13 0.37 0.84
12353 3 2 2 2 2 2 May 0.67 0.20 0.36 0.11 0.86 0.51
12354 2 3 3 3 3 3 0.70 0.84 0.76 0.19 0.46 0.89

 

I need to add two new Columns COST and Actual
ID - Numeric Format
May, June, July, August, September, October, Month - Character Format
31-MAY-2019, 30-JUN-2019, 31-JUL-2019, 31-AUG-2019, 30-SEP-2019, 31-OCT-2019 - Numeric Format

1) Cost : Cost should only be populated when Month is available in MONTH Column
for e.g., if May Month in Month column then 31-MAY-2019 cost should come up in
COST column.

2) Actaul : ACTUAL should be calculated from COST and last month cost(here 31-OCT-2019)
will be last month
Actual Formula : ACTUAL = Cost - 31-OCT-2019

 

Need output like this:

 

samp2.JPG

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

You suffer from bad data design.

Look at this:

data have1;
input id $ mon_5 mon_6 mon_7 mon_8 mon_9 mon_10;
datalines;
12345 0.57 0.63 0.89 2.32 1.63 0.4
12346 0.99 0.42 0.9 0.02 0.93 0.25
12347 0.64 0.93 0.95 0.21 0.75 0.52
12348 0.97 0.41 0.28 0.23 0.74 0.1
12349 0.83 0.23 0.76 0.56 0.97 0.43
12350 0.03 0.81 0.65 0.52 0.86 0.25
12351 0.73 0.29 0.18 0.60 0.35 0.58
12352 0.99 0.43 0.72 0.13 0.37 0.84
12353 0.67 0.20 0.36 0.11 0.86 0.51
12354 0.70 0.84 0.76 0.19 0.46 0.89
;

data have2;
input id $ month;
datalines;
12345 5
12346 5
12348 5
12349 9
12351 5
12352 5
12353 5
;

proc transpose
  data=have1
  out=howitshouldbe (rename=(col1=cost))
;
by id;
var mon:;
run;

data have1_real;
set howitshouldbe;
month = input(scan(_name_,2,'_'),best.);
drop _name_;
run;

data want;
merge
  have1_real (in=h1 rename=(cost=_cost month=_month))
  have2 (in=h2)
;
by id;
if h1 and h2;
retain cost;
if first.id then cost = .;
if month = _month then cost = _cost;
if last.id;
actual = _cost - cost;
drop _:;
run;

Note that the first few steps are only there to beat your unusable data into shape (long format). The whole real work is done in the last step, which becomes very simple through intelligent design of the data structure.

See Maxim 33.

View solution in original post

5 REPLIES 5
andreas_lds
Jade | Level 19

The data you posted reminds me of someone else posting data in the same strange format. Weird.

 

First thing to do: fix the broken variable names like "31-May-19" - this name could be used, but would make reading and maintaining the code harder, so such names must be avoided. And as soon as you have the data as dataset, you should re-post the data as data-step using datalines statement, so that the community has something to actually work with. See https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... for details about posting data.

ed_sas_member
Meteorite | Level 14

Hi @vnreddy 

 

I can't agree more with @andreas_lds 

 

However, you can try the following code:

data have;
	infile datalines dlm=" " missover dsd;
	length month $10.;
	input ID May$ June$ July$ August$ September$ October$ Month$ "31-May-19"n "30-Jun-19"n "31-Jul-19"n "31-Aug-19"n "30-Sep-19"n "31-Oct-19"n;
	datalines;
12345 3 2 2 2 2 2 May 0.57 0.63 0.89 2.32 1.63 0.4
12346 2 2 1 1 1 1 May 0.99 0.42 0.9 0.02 0.93 0.25
12347 1 1 1 1 1 1  0.64 0.93 0.95 0.21 0.75 0.52
12348 3 3 3 1 2 2 May 0.97 0.41 0.28 0.23 0.74 0.1
12349 1 2 2 2 3 2 September 0.83 0.23 0.76 0.56 0.97 0.43
12350 1 2 2 2 2 2  0.03 0.81 0.65 0.52 0.86 0.25
12351 2 2 2 1 1 1 Jul 0.73 0.29 0.18 0.60 0.35 0.58
12352 3 3 3 2 2 2 May 0.99 0.43 0.72 0.13 0.37 0.84
12353 3 2 2 2 2 2 May 0.67 0.20 0.36 0.11 0.86 0.51
12354 2 3 3 3 3 3  0.70 0.84 0.76 0.19 0.46 0.89
;
run;

data want;
	set have;
	array mon(*) $ "31-May-19"n "30-Jun-19"n "31-Jul-19"n "31-Aug-19"n "30-Sep-19"n "31-Oct-19"n;
	do i=1 to dim(mon);
		if month ne "" and substr(vname(mon(i)),4,3) = substr(month,1,3) then do;
			cost = mon(i);
		leave;
		end;
	end;
	actual = Cost - "31-Oct-19"n;
run;

proc print data=want;
run;
Jagadishkatam
Amethyst | Level 16

Please try the below code, a dynamic approach

 

data have;
input ID May June July August September October Month$ _31_May_19 _30_Jun_19 _31_Jul_19 _31_Aug_19 _30_Sep_19 _31_Oct_19;
cards;
12345 3 2 2 2 2 2 May 0.57 0.63 0.89 2.32 1.63 0.4
12346 2 2 1 1 1 1 May 0.99 0.42 0.9 0.02 0.93 0.25
12347 1 1 1 1 1 1 xxx 0.64 0.93 0.95 0.21 0.75 0.52
12348 3 3 3 1 2 2 May 0.97 0.41 0.28 0.23 0.74 0.1
12349 1 2 2 2 3 2 September 0.83 0.23 0.76 0.56 0.97 0.43
12350 1 2 2 2 2 2 xxx 0.03 0.81 0.65 0.52 0.86 0.25
12351 2 2 2 1 1 1 May 0.73 0.29 0.18 0.60 0.35 0.58
12352 3 3 3 2 2 2 May 0.99 0.43 0.72 0.13 0.37 0.84
12353 3 2 2 2 2 2 May 0.67 0.20 0.36 0.11 0.86 0.51
12354 2 3 3 3 3 3 xxx 0.70 0.84 0.76 0.19 0.46 0.89
;

data want;
set have;
array dates(*) _31_May_19 _30_Jun_19 _31_Jul_19 _31_Aug_19 _30_Sep_19 _31_Oct_19;
do i = 1 to dim(dates);
if lowcase(substr(month,1,3))=lowcase(compress(vname(dates(i)),,'ka')) then cost=dates(i);
actual=cost-dates(i);
end;
run;

 

Thanks,
Jag
PaigeMiller
Diamond | Level 26

As mentioned by others, having variable names that look like calendar days or months is not usually a good programming practice. 

 

But also, problems like this are often more easily handled in Excel, rather than SAS. In Excel, you don't have to change the variable names, and the programming relatively quick and straightforward, unlike in SAS. So this is a job for Excel, not SAS.

--
Paige Miller
Kurt_Bremser
Super User

You suffer from bad data design.

Look at this:

data have1;
input id $ mon_5 mon_6 mon_7 mon_8 mon_9 mon_10;
datalines;
12345 0.57 0.63 0.89 2.32 1.63 0.4
12346 0.99 0.42 0.9 0.02 0.93 0.25
12347 0.64 0.93 0.95 0.21 0.75 0.52
12348 0.97 0.41 0.28 0.23 0.74 0.1
12349 0.83 0.23 0.76 0.56 0.97 0.43
12350 0.03 0.81 0.65 0.52 0.86 0.25
12351 0.73 0.29 0.18 0.60 0.35 0.58
12352 0.99 0.43 0.72 0.13 0.37 0.84
12353 0.67 0.20 0.36 0.11 0.86 0.51
12354 0.70 0.84 0.76 0.19 0.46 0.89
;

data have2;
input id $ month;
datalines;
12345 5
12346 5
12348 5
12349 9
12351 5
12352 5
12353 5
;

proc transpose
  data=have1
  out=howitshouldbe (rename=(col1=cost))
;
by id;
var mon:;
run;

data have1_real;
set howitshouldbe;
month = input(scan(_name_,2,'_'),best.);
drop _name_;
run;

data want;
merge
  have1_real (in=h1 rename=(cost=_cost month=_month))
  have2 (in=h2)
;
by id;
if h1 and h2;
retain cost;
if first.id then cost = .;
if month = _month then cost = _cost;
if last.id;
actual = _cost - cost;
drop _:;
run;

Note that the first few steps are only there to beat your unusable data into shape (long format). The whole real work is done in the last step, which becomes very simple through intelligent design of the data structure.

See Maxim 33.

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 5 replies
  • 816 views
  • 2 likes
  • 6 in conversation