BookmarkSubscribeRSS Feed
coba
Calcite | Level 5

I hope you could help me

I have this type of data

date type value type 2 name

01-Jan-11 a1 10 1 ae

01-Feb-11 a1 10 1 ae

01-Mar-11 a1 10 2 ae

01-Apr-11 a1 10 2 ae

01-Jan-11 A2 5 1 at

01-Feb-11 A2 50 1 at

01-Mar-11 A2 5 2 at

01-Apr-11 A2 50 2 at

01-Jan-11 a3 20 1 au

01-Feb-11 a3 60 1 au

01-Mar-11 a3 20 2 au

01-Apr-11 a3 60 2 au

and I would like to have the cumulative values by type type2 by name.

What is the best method to do this?

Thank you for helping me out.

16 REPLIES 16
art297
Opal | Level 21

What do you want your output to look like?  If you just want the cumulative sum of value, as they accumulate, you can do that in a datastep, using retain or a sum in a form that automatically retains to keep the sum, and a by statement.

But, what you will need will depend upon what you want to end up with.

Does the following approximate what you want to do?:

data have;

  informat date anydtdte9.;

  format date date9.;

  input date type $ value type2 name $;

  cards;

01-Jan-11 a1 10 1 ae

01-Feb-11 a1 10 1 ae

01-Mar-11 a1 10 2 ae

01-Apr-11 a1 10 2 ae

01-Jan-11 A2 5 1 at

01-Feb-11 A2 50 1 at

01-Mar-11 A2 5 2 at

01-Apr-11 A2 50 2 at

01-Jan-11 a3 20 1 au

01-Feb-11 a3 60 1 au

01-Mar-11 a3 20 2 au

01-Apr-11 a3 60 2 au

;

proc sort data=have;

  by type type2 name date;

run;

data want;

  set have;

  by type type2 name;

  if first.name then cumulative=0;

  cumulative+value;

run;

FriedEgg
SAS Employee

data have;

format date eurdfde9.;

input date eurdfde9. type $ value type2 name $;

cards;

01-Jan-11 a1 10 1 ae

01-Feb-11 a1 10 1 ae

01-Mar-11 a1 10 2 ae

01-Apr-11 a1 10 2 ae

01-Jan-11 A2 5 1 at

01-Feb-11 A2 50 1 at

01-Mar-11 A2 5 2 at

01-Apr-11 A2 50 2 at

01-Jan-11 a3 20 1 au

01-Feb-11 a3 60 1 au

01-Mar-11 a3 20 2 au

01-Apr-11 a3 60 2 au

;

run;

proc sql;

select type, type2, name, sum(value) as total_value

   from have

  group by type, type2, name;

quit;

                                             type         type2  name      total_value

                                             -----------------------------------------

                                             A2               1  at                 55

                                             A2               2  at                 55

                                             a1               1  ae                 20

                                             a1               2  ae                 20

                                             a3               1  au                 80

                                             a3               2  au                 80

coba
Calcite | Level 5

I tried this, but how do I manage when the data does not start in Jan but e.g. in Feb. I always would like to have Jan, Jan+Feb, Jan+Feb+Mar ,etc (even if there is a cero.

data work.cum;

set work.data;

by type2 code1 type date;

if first.type then cumwght=0;

cumval + value;

output;

run;

example data: data:

type 1   name1  value     date       type2

E              3              40           01-Jan-11             A1

E              3              10           01-Feb-11            A1

E              3              49           01-Mar-11           A1

E              3              8              01-Apr-11            A1

E              3              56           01-May-11          A1

E              3              33           01-Jun-11            A1

E              3              60           01-Jul-11              A1

E              3              700         01-Feb-11            a2

E              3              145         01-Jan-11             a2

E              3              190         01-Feb-11            a2

E              3              100         01-Mar-11           a2

E              3              80           01-Apr-11            a2

art297
Opal | Level 21

Your data fields don't match your code and, since you didn't show the result that you want, we can only guess.

My guess is that you don't want to include date as a by variable but, rather, year, but with the data having been sorted by date.

That way, it would be irrelevant which month any of your data really begin with.  And, why do you initialize cumwght, but then use cumval in your sum statement?

FriedEgg
SAS Employee

proc format;

value myfmt (multilabel)

  1 = 'Jan'

  2 = 'Feb'

  3 = 'Mar'

  4 = 'Apr'

  5 = 'May'

  6 = 'Jun'

  7 = 'Jul'

  8 = 'Aug'

  9 = 'Sep'

10 = 'Oct'

11 = 'Nov'

12 = 'Dec'

  1 , 2 = 'Jan + Feb'

  1 , 2 , 3 = 'Jan + Feb + Mar'

  /* etc... */

;

run;

data have;

input type1 $ name1 value date eurdfde9. type2 $;

month=month(date);

year=year(date);

cards;

E 3  40 01-Jan-11 A1

E 3  10 01-Feb-11 A1

E 3  49 01-Mar-11 A1

E 3   8 01-Apr-11 A1

E 3  56 01-May-11 A1

E 3  33 01-Jun-11 A1

E 3  60 01-Jul-11 A1

E 3 700 01-Feb-11 a2

E 3 145 01-Jan-11 a2

E 3 190 01-Feb-11 a2

E 3 100 01-Mar-11 a2

E 3  80 01-Apr-11 a2

;

run;

proc means data=have sum nonobs  nway noprint;

class year;

class month /mlf;

class type1 type2;

var value;

format month myfmt.;

output out=want sum=total_value;

run;

proc print data=want; run;

                                                                                                                                total_

                           Obs    year    month              type1    type2    _TYPE_    _FREQ_     value

                             1    2011    Apr                  E       A1        15         1          8

                             2    2011    Apr                  E       a2        15         1         80

                             3    2011    Feb                  E       A1        15         1         10

                             4    2011    Feb                  E       a2        15         2        890

                             5    2011    Jan                  E       A1        15         1         40

                             6    2011    Jan                  E       a2        15         1        145

                             7    2011    Jan + Feb            E       A1        15         2         50

                             8    2011    Jan + Feb            E       a2        15         3       1035

                             9    2011    Jan + Feb + Mar      E       A1        15         3         99

                            10    2011    Jan + Feb + Mar      E       a2        15         4       1135

                            11    2011    Jul                  E       A1        15         1         60

                            12    2011    Jun                  E       A1        15         1         33

                            13    2011    Mar                  E       A1        15         1         49

                            14    2011    Mar                  E       a2        15         1        100

                            15    2011    May                  E       A1        15         1         56

Ksharp
Super User

How about:

data have;
input type1 $ name1 value date eurdfde9. type2 $;
format date date9.;
cards;
E 3  40 01-Jan-11 A1
E 3  10 01-Feb-11 A1
E 3  49 01-Mar-11 A1
E 3   8 01-Apr-11 A1
E 3  56 01-May-11 A1
E 3  33 01-Jun-11 A1
E 3  60 01-Jul-11 A1
E 3 700 01-Feb-11 a2
E 3 145 01-Jan-11 a2
E 3 190 01-Feb-11 a2
E 3 100 01-Mar-11 a2
E 3  80 01-Apr-11 a2
;
run;
proc sql noprint;
 create table temp as 
  select * 
   from (select distinct * from have(keep=type1 type2 name1) ) ,
        (select distinct date from have)
  order by type1,type2,name1,date;
quit;
proc sort data=have;
 by type1 type2 name1 date;
run;
data want;
 merge have temp;
 by type1 type2 name1 date;
 output;
 call missing(value);
run;
data want(drop=value);
 set want;
 by type1 type2 name1 date;
 if first.name1 then sum_value=0;
 sum_value+value;
 if last.date then output;
run;

Ksharp

Message was edited by: xia keshan

coba
Calcite | Level 5

Thank you for your help. I tried this but get this response

NOTE: The execution of this query involves performing one or more Cartesian product joins

that can not be optimized.

NOTE: Table WORK.TEMP created, with 14 rows and 4 columns.

But in my real data, what I need is that if Jan11 there is data, and in Feb11 no data and again in Mar11 is data again.

Jan value is 10, feb no value, Mar value is 20.

What I would like to see is

Jan 10, Feb 10 and Mar 20

I know there is a solution, but I can't find it yet.

Thanks, I really appreciate this forum, it really helps people that are less skilled in SAS.

art297
Opal | Level 21

Did you look at the results of Ksharp's code?  It appears to be doing what you have asked for.

The messageabout the cartesian product is NOT an error.  He simply used the capability of proc sql to provide all possible combinations.

coba
Calcite | Level 5

I looked at the ersults of Ksharp, but when I applied to my real data, the missing months were given a "0" instead of taking the previous month.

Let's say there was a value in Jan11, then in Mar11, nothing in Feb11. So what I would like to see is Feb11 taking the value of Jan11, instead of a cero.

Is this possible with SAS?

art297
Opal | Level 21

That is precisely what his code does!  I don't believe you ever answered my question from a few days ago.  Your initial example code used different fields than your data actually had.  Specifically, a field called code1.

If your codes match the ones in Ksharps program, I think that you'll find that the results are what you expect.

Ksharp
Super User

Can you post some more data, So I can check where the problem is.

Thanks Art.

Ksharp

coba
Calcite | Level 5

Please see an data example below:

and thank you for your time in helping me out. As you can see the first line is Jan11 and the second is Mar11. So I would like the data to appear:

Jan11 is 458, Feb11 is 458, Mar11 is 950 (458+492), etc. until the end of  the code.

type 1 type 2 value date type 3

A 1 458 01-Jan-11 ba

A 1 492 01-Mar-11 ba

A 1 9 01-Apr-11 ba

A 1 1 01-May-11 ba

A 1 333 01-Jun-11 ba

A 1 1 01-Jul-11 ba

A 2 14 01-Jan-11 ba

A 2 20 01-Feb-11 ba

A 2 18 01-Mar-11 ba

A 2 13 01-Apr-11 ba

A 2 4605 01-May-11 ba

A 2 1599 01-Jun-11 ba

A 2 58 01-Jul-11 ba

A 3 7 01-Jan-11 ba

A 3 3 01-Feb-11 ba

A 3 4 01-Mar-11 ba

A 3 3 01-Apr-11 ba

A 3 4 01-May-11 ba

A 3 2 01-Jun-11 ba

B 1 5 01-Feb-11 ba

B 1 2 01-Mar-11 ba

B 1 7 01-Apr-11 ba

B 1 1 01-May-11 ba

B 1 4 01-Jul-11 ba

B 1 20 01-Mar-11 ba

B 2 11 01-Apr-11 ba

B 2 7 01-Jun-11 ba

B 2 70 01-May-11 ba

B 2 1 01-Jan-11 ba

B 2 1 01-Feb-11 ba

B 2 1 01-Mar-11 ba

B 2 1 01-Apr-11 ba

B 3 1 01-May-11 ba

B 3 1 01-Jun-11 ba

B 3 1 01-Jul-11 ba

B 3 1 01-Jan-11 ba

B 3 1 01-Apr-11 ba

Ksharp
Super User

You change the variable name.

How about this:

data have;
input type1 $ type2 value date date9. type3 $;
format date date9.;
cards;
A 1 458 01-Jan-11 ba
A 1 492 01-Mar-11 ba
A 1 9 01-Apr-11 ba
A 1 1 01-May-11 ba
A 1 333 01-Jun-11 ba
A 1 1 01-Jul-11 ba
A 2 14 01-Jan-11 ba
A 2 20 01-Feb-11 ba
A 2 18 01-Mar-11 ba
A 2 13 01-Apr-11 ba
A 2 4605 01-May-11 ba
A 2 1599 01-Jun-11 ba
A 2 58 01-Jul-11 ba
A 3 7 01-Jan-11 ba
A 3 3 01-Feb-11 ba
A 3 4 01-Mar-11 ba
A 3 3 01-Apr-11 ba
A 3 4 01-May-11 ba
A 3 2 01-Jun-11 ba
B 1 5 01-Feb-11 ba
B 1 2 01-Mar-11 ba
B 1 7 01-Apr-11 ba
B 1 1 01-May-11 ba
B 1 4 01-Jul-11 ba
B 1 20 01-Mar-11 ba
B 2 11 01-Apr-11 ba
B 2 7 01-Jun-11 ba
B 2 70 01-May-11 ba
B 2 1 01-Jan-11 ba
B 2 1 01-Feb-11 ba
B 2 1 01-Mar-11 ba
B 2 1 01-Apr-11 ba
B 3 1 01-May-11 ba
B 3 1 01-Jun-11 ba
B 3 1 01-Jul-11 ba
B 3 1 01-Jan-11 ba
B 3 1 01-Apr-11 ba
;
run;
proc sql noprint;
 create table temp as 
  select * 
   from (select distinct * from have(keep=type1 type2 type3) ) ,
        (select distinct date from have)
  order by type1,type2,type3,date;
quit;
proc sort data=have;
 by type1 type2 type3 date;
run;
data want;
 merge have temp;
 by type1 type2 type3 date;
 output;
 call missing(value);
run;
data want(drop=value);
 set want;
 by type1 type2 type3 date;
 if first.type3 then sum_value=0;
 sum_value+value;
 if last.date then output;
run;

Ksharp

art297
Opal | Level 21

Given that all of the type3s have the same value, can we assume that it is really either irrelevant or has the highest priority.  The following revision of Ksharp's code produces the desired result:

proc sql noprint;

create table temp as

  select *

   from (select distinct * from have(keep=type3 type1 type2) ) ,

        (select distinct date from have)

  order by type3,type1,type2,date;

quit;

proc sort data=have;

by type3 type1 type2 date;

run;

data want;

merge have temp;

by type3 type1 type2 date;

output;

call missing(value);

run;

data want(drop=value);

set want;

by type3 type1 type2 date;

if first.type2 then sum_value=0;

sum_value+value;

if last.date then output;

run;

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 16 replies
  • 3792 views
  • 0 likes
  • 4 in conversation