BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
michtka
Fluorite | Level 6

Hi everyone, I hope you can help me with this. Thanks in advance.V

I have this dataset HAVE:

data HAVE;

length nvalue nintensity nitem 8.;

length value intensity item $20.;

input  value $ nvalue  intensity $  nintensity  item $  nitem;

datalines;

DRINK 2 high   3 orange 1 

DRINK 2 medium 2 orange .

DRINK .  low    1 green  2

DRINK .  high   3 green  .

DRINK .  medium 2 green  .

FOOD  5 low    2 cheese 3

FOOD  5 high   4 cheese .

FOOD  .  medium 1 meat   4

FOOD  5 low    2 meat   4

FOOD  .  high   4 meat   .

FOOD  5 medium 1 vegetable .

;

run;

And I want to obtain the WANT dataset: As you can see the order  are involved 4 rules:

, FOOD and DRINK are families > subfamilies.

  FOOD  (5)> DRINK(2)

  FOOD:  meat(4)>cheese(3)>Vegetanle (0)

  DRINK:green(2)>orange(1)

  High>medium>low

dataset WANT:

FOOD      high       4 

FOOD      medium  1 

FOOD      low        2 

meat        high         4   

meat        medium   1   

meat        low          2   

cheese    high        4   

cheese    low         2   

vegetable Medium  1   

DRINK     high         3  

DRINK     medium  2 

DRINK     low        1 

green     high        3   

green     medium  2   

green     low         1   

orange   high        3   

orange   medium   2 

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

So use the PROC SQL as I explained to create both the detailed rows and the summary rows and UNION them into one table. Make sure to include proper values for the sorting variables so that the order is what you want.  Because you have multiple observations per VALUE/INTENSITY combination you will need to so some sort of grouping/summary for those rows to prevent duplicates.  Since you are sorting by descending FREQITEM use a large number for that variable on the records generated at the VALUE (family) level.

data prewant;

input freqvalue value $ flag item $ intensity $ nitem freqitem  ;

cards;

5 FOOD 3 meat high 0 4

5 FOOD 2 meat medium 4 4

5 FOOD 1 meat low 4 4

5 FOOD 3 cheese high 0 3

5 FOOD 1 cheese low 3 3

5 FOOD 2 vegetable medium 0 0

2 DRINK 3 green high 0 2

2 DRINK 2 green medium 0 2

2 DRINK 1 green low 2 2

2 DRINK 3 orange high 1 1

2 DRINK 2 orange medium 0 1

run;


proc sql noprint ;

  create table display as

     select freqvalue, freqitem, flag

          , item as level

          , intensity

          , nitem as count

     from prewant

  union

     select freqvalue,99999 as freqitem, flag

          , upcase(value) as level

          , intensity

          , max(nitem) as nitem   /* This is the value that makes no sense */

     from prewant group by 1,2,3,4,5


  order by freqvalue descending

         , freqitem descending

         , flag descending

;

quit;


Obs   freqvalue    freqitem    flag    level       intensity    count

1        5          99999       3     FOOD         high          0

2        5          99999       2     FOOD         medium        4

3        5          99999       1     FOOD         low           4

4        5              4       3     meat         high          0

5        5              4       2     meat         medium        4

6        5              4       1     meat         low           4

7        5              3       3     cheese       high          0

8        5              3       1     cheese       low           3

9        5              0       2     vegetabl     medium        0

10       2          99999       3     DRINK        high          1

11       2          99999       2     DRINK        medium        0

12       2          99999       1     DRINK        low           2

13       2              2       3     green        high          0

14       2              2       2     green        medium        0

15       2              2       1     green        low           2

16       2              1       3     orange       high          1

17       2              1       2     orange       medium        0

View solution in original post

21 REPLIES 21
art297
Opal | Level 21

: If one change can be made to one of your data records (namely removing the nvalue from

FOOD  5 low    2 meat   4 ), you could use something like the following to rearrange your data:

data HAVE;

length value intensity item $20.;

length nvalue nintensity nitem 8.;

input  value $ nvalue  intensity $  nintensity  item $  nitem;

datalines;

DRINK 2 high   3 orange 1

DRINK 2 medium 2 orange .

DRINK .  low    1 green  2

DRINK .  high   3 green  .

DRINK .  medium 2 green  .

FOOD  5 low    2 cheese 3

FOOD  5 high   4 cheese .

FOOD  .  medium 1 meat   4

FOOD  . low    2 meat   4

FOOD  .  high   4 meat   .

FOOD  5 medium 1 vegetable .

;

run;

proc format;

  value $intensity

  "high"=1

  "medium"=2

  "low"=3

  ;

run;

data want (keep=value intensity nintensity order);

  set have;

  if missing(nvalue) then do;

    if value="DRINK" then do;

      order=12+put(intensity,$intensity.);

      output;

      value=item;

      order=15+put(intensity,$intensity.);

      output;

    end;

    else do;

      order=put(intensity,$intensity.);

      output;

      value=item;

      order=3+put(intensity,$intensity.);

      output;

    end;

  end;

  else do;

    value=item;

    select(item);

       when ("cheese") order=6+put(intensity,$intensity.);

       when ("vegetable") order=9+put(intensity,$intensity.);

       when ("orange") order=18+put(intensity,$intensity.);

       otherwise;

    end;

    output;

  end;

run;

proc sort data=want;

  by order;

run;

data want;

  set want (drop=order);

run;

michtka
Fluorite | Level 6

Thanks Arthur for this, you right the 4 value

is the right one.

Any possibility to do it using PROC SQL

using subqueries?

I am thinking about it too.

It would be great.

Thanks.

michtka
Fluorite | Level 6

Hi guys, thank you very much for your contribution, but rather than specify the categories, I would like to do something like this:

because in my real problem I have 50 Families, and every individual family have 50 subfamilies...

Then, I am trying to do something like MAX() function as Tom Suggest:

proc sql noprint;

create table fillvalitem as

select max(nvalue) as freqvalue, max(nitem) as freqitem, value, flag, intensity, nintensity,item,nitem

from have

group by item

order by freqvalue descending, freqitem descending,flag descending;

run;

But still is not right, for example DRINK is missing the value 2, and is not sorting out in the descending order, should be 2, and 1,

I think the idea is to make a first query of VALUE (group by value), and subquery of ITEM (group by value,item), but still need

to think How I can do it. If you can help me with this: Thanks.

          5            4       FOOD       3      high             4        meat           .

         5            4       FOOD       2      medium           1        meat           4

         5            4       FOOD       1      low              2        meat           4

         5            3       FOOD       3      high             4        cheese         .

         5            3       FOOD       1      low              2        cheese         3

         5            .       FOOD       2      medium           1        vegetable      .

         2            1       DRINK      3      high             3        orange         1

         2            1       DRINK      2      medium           2        orange         .

         .            2       DRINK      3      high             3        green          .

         .            2       DRINK      2      medium           2        green          .

         .            2       DRINK      1      low              1        green          2

michtka
Fluorite | Level 6

sorry, I missed the dataset with the flag variable, and the BY group (value,item) rather than item, to generate the above dataset:

data have;

  set have;

  if intensity="high" then flag=3;

  else if intensity="medium" then flag=2;

  else flag=1;

run;

proc sql noprint;

create table fillvalitem as

select max(nvalue) as freqvalue, max(nitem) as freqitem, value, flag, intensity, nintensity,item,nitem

from have

group by value, item

order by freqvalue descending, freqitem descending,flag descending;

run;

Tom
Super User Tom
Super User

Since the output you displayed is two different summaries you could do it in SQL as two queries with a UNION.

In terms of the ordering you will need variables that you can use to sort. You could use FORMATs or CASE statements or have separate code list tables that you link in.

proc sql ;

  create table want as

  select .... from have group by value, intensity

  UNION

  select .... from have group by value, item, intensity

  ;

quit;

michtka
Fluorite | Level 6

Thanks Tom,

I still thinking  need to be a query and a  subquery, because two queries  that you mention will add  more records that I really need I think.

michtka
Fluorite | Level 6

at least I found the first step (first sorting) of my dataset WANT...I used three PROC SQL, still thinking that is possible to do it with only one PROC SQL, but

i dounno how...the thing is, that this works better...I need to think to get last final dataset WANT.

proc sql noprint;

create table fillvalue as

select max(nvalue) as freqvalue,value,flag,item,intensity

from have

group by value

order by freqvalue descending,flag descending;

quit;

proc sql noprint;

create table fillitem as

select max(nitem) as freqitem,value,flag,item,intensity

from have

group by value,item

order by freqitem descending,value,flag descending;

quit;

proc sql noprint;

create table prewant as

select a.*,b.freqitem

from fillvalue a left join fillitem b

on a.value=b.value and a.item=b.item and a.flag=b.flag

order by freqvalue descending,freqitem descending,flag descending;

quit;

PREWANT dataset:

  freqvalue    value    flag    item         intensity    freqitem

                     5        FOOD       3     meat          high            4

                     5        FOOD       2     meat          medium          4

                     5        FOOD       1     meat          low             4

                     5        FOOD       3     cheese        high            3

                     5        FOOD       1     cheese        low             3

                     5        FOOD       2     vegetable     medium          0

                     2        DRINK      3     green         high            2

                     2        DRINK      2     green         medium          2

                     2        DRINK      1     green         low             2

                     2        DRINK      3     orange        high            1

                     2        DRINK      2     orange        medium          1

Finallly looking for this dataset WANT:

FOOD      high       4

FOOD      medium  1

FOOD      low        2

meat        high         4  

meat        medium   1  

meat        low          2  

cheese    high        4  

cheese    low         2  

vegetable Medium  1  

DRINK     high         3 

DRINK     medium  2

DRINK     low        1

green     high        3  

green     medium  2  

green     low         1  

orange   high        3  

orange   medium   2

Tom
Super User Tom
Super User

It would help a lot if you could explain what you are summarizing.  I really have no idea where you are getting expected numbers from.


Plus I am not sure what the input records represent.  The variable VALUE seems to the food/drink category. The variable ITEM seems to be type of food or drink.  Intensity seems to be a ranking of something from low to high.  But the numbers do not make sense.  Perhaps NITEM is a a summary of the counts for that item already?  Perhaps NVALUE is counts for that level of the VALUE?  But then why are there missing values of NVALUE?

Why would the second row in your expected output have a 2?  There were no foods with a value of NVALUE or NITEM in the input data with a value of 2.

michtka
Fluorite | Level 6

Hi Tom, my apologize if I am not been clear here.

Nitem, Nvalue is just a proper  summary of number of subjects. The missing values, is because no subjects.

This 2 you mean from PREWANT dataset?....5        FOOD      2     meat          medium          4

it is a flag variable...corresponding to the medium intensity.

I  get that from manipulating my original dataset have, before use it to PROC SQL:


data have;

  set have;

  if intensity="high" then flag=3;

  else if intensity="medium" then flag=2;

  else flag=1;

run;

This dataset finally go to the PROC SQL, I did that because with : descending frequency by this flag variable, I obtain  the descending frequency in that order (High>Medium>Low)

The task of this problem is the SORTING following 4 rules:

I am not interesting to summarize anything, only to sorting out family (FOOD/DRINKS) and subfamily (items corresponding to every family) by descending frequency,

and finally sorting out by (high>medium>low).

I hope it is more clear now.

Tom
Super User Tom
Super User

I think I know what you want to display, but I cannot see how the numbers you are providing map to the numbers you want. It sounds like you have ITEM (meat/cheese/green/...) nested within VALUE (FOOD/DRINK) and you want to display overall counts before you display the individual item counts.  You have not given the variables in your output names so lets just call them ROW_LABEL, INTENSITY and COUNT.  ROW_LABEL is either the value of VALUE or the value of ITEM depending on whether the row is for the overall summary or for the detailed summary.

It sounds like you have figured out how to sort the intensity levels by adding the variable FLAG.  You can do something similar to sort the items.  If you really only have two higher level categories of DRINK and FOOD then you can sort by that variable and use descending option to reverse it from the alphabetical order if you want.

But I still have no idea how you get the three numbers for the COUNT column in the overall summary rows? (FOOD -> 4,1,2 ; DRINK -> 3,2,1)

Where did the COUNT of 4 for HIGH level of FOOD come from in the source data?  Is it just copied from one of the rows? Calculated? If so how?  Similarly for the other 5 numbers.

Similarly how did you get the numbers for the detailed rows?  Are they just copied forward?  Summarized?

It would probably be easier if you displayed the raw values.  It sounds like your raw values would look something like:

SUBJECT VALUE ITEM INTENSITY

1 FOOD meat high

1 DRINK green low

2 FOOD vegetable medium

....

Tom
Super User Tom
Super User

Let's go back to your "RULES" and use the numbers in there to make some data that make sense?

, FOOD and DRINK are families > subfamilies.

  FOOD  (5)> DRINK(2)

  FOOD:  meat(4)>cheese(3)>Vegetanle (0)

  DRINK:green(2)>orange(1)

  High>medium>low

To put FOOD > DRINK we need to know the frequency (5) is larger than (2).

So you need a variable FAMILY with values of FOOD and DRINK and ANOTHER variable FAMILY_FREQ with values of 5 and 2, on every observation.

FAMILY FAMILY_FREQ

FOOD 5

DRINK 2

Now for the SUBFAMILY level since there is no overlap for drinks and foods it really doesn't matter if we calculate them within FAMILY or not, but it makes more sense for later to do it within the FAMILY.

FAMILY SUBFAMILY SUBFAMILY_FREQ

FOOD MEAT 4

FOOD CHEESE 3

FOOD VEGETABLE 0

DRINK GREEN 2

DRINK ORANGE 1

Combine these on value of FAMILY and you will get:

FAMILY FAMILY_FREQ SUBFAMILY SUBFAMILY_FREQ

FOOD 5 MEAT 4

FOOD 5 CHEESE 3

FOOD 5 VEGETABLE 0

DRINK 2 GREEN 2

DRINK 2 ORANGE 1

Now you can combine with your real data and use the FAMILY_FREQ and SUBFAMILY_FREQ (and your FLAG variable) to order the rows.

Does that help?

I still do not see how your are getting the NUMBERS your want the display. 

michtka
Fluorite | Level 6

Hi Tom, yes, forget about if the values of nitems are right or wrong, doesn't mattter (it become of a summary process that I just solved from the RAW of data, and I just got this numbers as the final count.)

then.. my final dataset WANT will be something like:

level        intensity   nitem

FOOD      high        x

FOOD      medium   x

FOOD      low         x

meat        high        x   

meat        medium   x  

meat        low         x   

cheese    high         x 

cheese    low          x 

vegetable Medium   x 

DRINK     high        x 

DRINK     medium  x

DRINK     low         x

green     high         x

green     medium    x

green     low          x 

orange   high         x 

orange   medium   x

Then,

STEP 1) Remember my dataset PREWANT:

            freqvalue    value    flag    item         intensity    nitem    freqitem

                5        FOOD       3     meat          high          0          4

                5        FOOD       2     meat          medium        4          4

                5        FOOD       1     meat          low           4          4

                5        FOOD       3     cheese        high          0          3

                5        FOOD       1     cheese        low           3          3

                5        FOOD       2     vegetable     medium        0          0

                2        DRINK      3     green         high          0          2

                2        DRINK      2     green         medium        0          2

                2        DRINK      1     green         low           2          2

                2        DRINK      3     orange        high          1          1

                2        DRINK      2     orange        medium        0          1

Doing some transformation:

data base_want (keep=level);

set prewant;

by descending freqvalue descending freqitem;

if first.freqvalue then do;

level=upcase(VALUE);

count=nitem;         

output;

end;

if first.freqitem then do;

level=item;

count=nitem;

output;

end;

run;

STEP 2) I get this one, called BASE_WANT:

                                            level

                                           FOOD

                                            meat

                                            cheese

                                            vegetable

                                            DRINK

                                            green

                                            orange

Mi idea is,  combining this dataset 2) BASE_WANT with the previous datase t1 )  PREWANT to get the finalk dataset 3)  WANT.

I think is is more clear now, btw  (Thank you for your big interest)....I hope you can help me with this last step 3,  FROM STEP 1 AND 2 , I still dont get it.

michtka
Fluorite | Level 6

OK Tom, tomorrow I will define better my problem.

Thank you for your time.

Tom
Super User Tom
Super User

So use the PROC SQL as I explained to create both the detailed rows and the summary rows and UNION them into one table. Make sure to include proper values for the sorting variables so that the order is what you want.  Because you have multiple observations per VALUE/INTENSITY combination you will need to so some sort of grouping/summary for those rows to prevent duplicates.  Since you are sorting by descending FREQITEM use a large number for that variable on the records generated at the VALUE (family) level.

data prewant;

input freqvalue value $ flag item $ intensity $ nitem freqitem  ;

cards;

5 FOOD 3 meat high 0 4

5 FOOD 2 meat medium 4 4

5 FOOD 1 meat low 4 4

5 FOOD 3 cheese high 0 3

5 FOOD 1 cheese low 3 3

5 FOOD 2 vegetable medium 0 0

2 DRINK 3 green high 0 2

2 DRINK 2 green medium 0 2

2 DRINK 1 green low 2 2

2 DRINK 3 orange high 1 1

2 DRINK 2 orange medium 0 1

run;


proc sql noprint ;

  create table display as

     select freqvalue, freqitem, flag

          , item as level

          , intensity

          , nitem as count

     from prewant

  union

     select freqvalue,99999 as freqitem, flag

          , upcase(value) as level

          , intensity

          , max(nitem) as nitem   /* This is the value that makes no sense */

     from prewant group by 1,2,3,4,5


  order by freqvalue descending

         , freqitem descending

         , flag descending

;

quit;


Obs   freqvalue    freqitem    flag    level       intensity    count

1        5          99999       3     FOOD         high          0

2        5          99999       2     FOOD         medium        4

3        5          99999       1     FOOD         low           4

4        5              4       3     meat         high          0

5        5              4       2     meat         medium        4

6        5              4       1     meat         low           4

7        5              3       3     cheese       high          0

8        5              3       1     cheese       low           3

9        5              0       2     vegetabl     medium        0

10       2          99999       3     DRINK        high          1

11       2          99999       2     DRINK        medium        0

12       2          99999       1     DRINK        low           2

13       2              2       3     green        high          0

14       2              2       2     green        medium        0

15       2              2       1     green        low           2

16       2              1       3     orange       high          1

17       2              1       2     orange       medium        0

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 21 replies
  • 1360 views
  • 6 likes
  • 3 in conversation