BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
NewUsrStat
Lapis Lazuli | Level 10

Hi guys, 

suppose to have the following: 

 

data DB1;
input cat1 cat2 cat3 cat4 cat5 place value;
cards;
0 0 0 0 0 1 125
0 0 0 0 0 2 33
0 0 0 0 0 3 44
0 0 1 0 0 1 2
0 0 1 0 0 2 .
0 0 1 0 0 3 1
0 1 0 0 0 1 .
0 1 0 0 0 2 .
0 1 0 0 0 3 .
0 1 0 1 0 1 65
0 1 0 1 0 2 36
0 1 0 1 0 3 43
0 1 1 0 0 1 .
0 1 1 0 0 2 .
0 1 1 1 0 1 4
0 1 1 1 0 2 2
1 0 0 0 0 1 164
1 0 0 0 0 2 516
1 0 0 0 0 3 619
1 0 1 0 0 1 5
1 0 1 0 0 2 1
1 0 1 0 0 3 2
;

 

Is there a way to generate this desired output? 

data DB2;
input CAT Place Value ;
cards;
0 1 125

0 2 33

0 3 44

1 1 169

1 2 517

1 3 621

2 1 69

2 2 38

2 3 43

3 1 7

3 2 1

3 3 3

4 1 69

4 2 38

4 3 43

5 1 0

5 2 0

5 3 0
;

 

In other words, I would like to sum all values in column value for each cat* by place 1,2,3. It doesn't matter if cat* are not mutually exclusive. Moreover, the output should be presented as a single column of cat stratified by "place". Can you help me please? 

 

Edit: 

looking at the cat5: it has all "0s". As a rule, if all the other cat*(s) in DB1 have 0 (i.e., no "1" index), in DB2 "CAT" will be number "0". This means that cat5 specific values will be 0 because no "1" index is found for that cat* in DB1. Conversely it will be in CAT "0" for values 125, 33 and 44 because all the other cat* are 0 (no "1" index in DB1). Based on this, in DB2 we will see CAT 5 = 0 for place 1,2 and 3 because no specific values are found for cat5 (corresponding to "1" in DB1). For all the other cases, simply a sum of "value" is required for each "CAT*" in DB1 by "place".

Example: let's consider cat1 column in DB1 that will become CAT = 1 in DB2. If we look at all times we see "1"index, for place 1 we will have values in DB1 equal to: 164 and 5. I just need to sum them to get 169. For place 2 we have: 516 and 1 that after the sum will become 517 and for place 3 we have 619 and 2 that after the sum will become 621. This is the output I need. 

Note: the sum must be performed REGARDLESS what happens in the other "cat*" in DB1 except for 0 as explained above.

1 ACCEPTED SOLUTION

Accepted Solutions
PaigeMiller
Diamond | Level 26

Okay, I think I understand now. Thanks.

 

This is a brute force solution, but it seems to work.

 

%macro do_this;
    proc delete data=db2; run; quit;
    %do i=1 %to 5;
        proc summary data=db1 nway;
            class place;
            var value;
            weight cat&i;
            output out=out sum=;
        run;
        data out;
            set out;
            cat=&i;
        run;
        proc append base=db2 new=out;
        run;
        proc delete data=out; run; quit;
    %end;
%mend;
%do_this
--
Paige Miller

View solution in original post

18 REPLIES 18
PaigeMiller
Diamond | Level 26

I don't understand the relationship of CAT in DB2 (which has values 1 to 5) to the data in DB1. Please explain further how CAT=1 in DB2 is derived from DB1. Please explain further how CAT=2 in DB2 is derived from DB1. And so on.

--
Paige Miller
NewUsrStat
Lapis Lazuli | Level 10

So, basically there are 5 cat* that in DB1 are on columns while in DB2 will be in one column with indexes from 1 to 5 based on the column name cat1, cat2, cat3, cat4, cat5 in DB1

PaigeMiller
Diamond | Level 26

I'm afraid this really doesn't explain the issue. The values for CAT5 are all zeros. How is there a CAT = 5 in the output?

--
Paige Miller
NewUsrStat
Lapis Lazuli | Level 10

So: there are no patients in category 5 for all the three places (1,2,3) with a value in value column (otherwise you should see "1" in the column cat5)regardless any value in cat*(1-4). Despite this, cat5 must appear in DB2 to say "we cannot find patients in cat5 with a value different from 0 for all the three places". In DB1 categories are listed in columns as: cat*(1-5) while in DB2 from 1 to 5

PaigeMiller
Diamond | Level 26

Here's the problem. In order to write code, you have to know the inputs, which you have provided; and you have to know the outputs from those inputs (which you have not provided because your outputs have values for CAT = 5 and so this does not correspond to the inputs), and you have to know the logic to get from inputs to outputs. So I'm stuck, and I'm afraid I still don't understand what the method is and I still don't understand the method to get from inputs to outputs.

 

And even if you explain the CAT5 issue, its still not clear to me how to get the other categories either. Please explain in words the whole thing. Step by step examples are good. Do not make me guess what to do. Do not make me guess what the steps are.

--
Paige Miller
NewUsrStat
Lapis Lazuli | Level 10
Ok, I understand the point. I will edit the post
PaigeMiller
Diamond | Level 26

@NewUsrStat wrote:
Ok, I understand the point. I will edit the post

Please do not modify the original post. Please explain in a reply. Otherwise, if you modify the original post, then the thread makes no sense to anyone.

 

Or start a brand new thread.

--
Paige Miller
NewUsrStat
Lapis Lazuli | Level 10
I just added an explanation. Hope it is more clear
PaigeMiller
Diamond | Level 26

Okay, I think I understand now. Thanks.

 

This is a brute force solution, but it seems to work.

 

%macro do_this;
    proc delete data=db2; run; quit;
    %do i=1 %to 5;
        proc summary data=db1 nway;
            class place;
            var value;
            weight cat&i;
            output out=out sum=;
        run;
        data out;
            set out;
            cat=&i;
        run;
        proc append base=db2 new=out;
        run;
        proc delete data=out; run; quit;
    %end;
%mend;
%do_this
--
Paige Miller
NewUsrStat
Lapis Lazuli | Level 10
Really thank you very much!!!!! It works perfectly!
NewUsrStat
Lapis Lazuli | Level 10
Sorry but why "0" disappears in the output. I need
Tom
Super User Tom
Super User

Sounds like CAT1 to CAT5 are BOOLEAN flags.  Perhaps meaning that Category 1 (or 2 or ...) was present.

Sounds like the goal is to find the maximum category that has present?

data have;
  set want;
  array cats cat1-cat5;
  cat=0;
  do index=1 to dim(cats);
     if cats[index] then cat=index;
  end;
  drop index;
run;
NewUsrStat
Lapis Lazuli | Level 10
Not exactly Tom. Just to sum values by place (1-3) for each category with the exception of all "0" in all categories in DB1 as I explained in the post
Tom
Super User Tom
Super User

@NewUsrStat wrote:
Not exactly Tom. Just to sum values by place (1-3) for each category with the exception of all "0" in all categories in DB1 as I explained in the post

To SUM a set of variable in a single observation just use the SUM() function.

data want ;
  set have;
  cat = sum(of cat1-cat5);
run;

To SUM a series of variables just use PROC SUMMARY (aka PROC MEANS).

proc summary data=have;
   output out=want sum(cat1-cat5)= ;
run;

If you want separate sums for each value of PLACE then you could add PLACE as a CLASS variable.

But you will still have five variables named CAT1 to CAT5 that now have the sums. 

I do not see how that would get you a single variable named CAT.  Unless you wanted to TRANSPOSE the data so that you have 5 observations.

proc summary nway data=have;
  class place;
  var cat1-cat5 ;
  output out=wide sum= ;
run;

proc transpose data=wide out=tall;
  by place;
  var cat1-cat5 ;
run;

 

Tom_0-1691503877396.png

 

 

And you have not explained what role that last variable VALUE has in this at all.

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 18 replies
  • 5068 views
  • 1 like
  • 4 in conversation