Solved: sorting dataset following 4 rules - Page 2

michtka · Posted 05-25-2013 11:54 PM

My apologize Tom, I forgot the counting variable for FOOD and DRINK, I will called nsoc, and I modify your code, and now It get sense.

Btw, in your code, I dont understand 99999, and group by 1,2,3,4,5

Would you like to explain me it.

Thanks.

dataset HAVE:

nvalue nintensity nitem nsoc value intensity item flag

2 3 1 5 DRINK high orange 3

2 2 0 3 DRINK medium orange 2

. 1 2 2 DRINK low green 1

. 3 0 5 DRINK high green 3

. 2 0 3 DRINK medium green 2

5 2 3 4 FOOD low cheese 1

5 4 0 2 FOOD high cheese 3

. 1 4 3 FOOD medium meat 2

5 2 4 4 FOOD low meat 1

. 4 0 2 FOOD high meat 3

5 1 0 3 FOOD medium vegetable 2

The dataset would be:

proc sql noprint ;

create table display as

select freqvalue, freqitem, flag

, item as level

, intensity

, nitem as count

from prewant

union

select freqvalue,99999 as freqitem, flag

, upcase(value) as level

, intensity

, nsoc as count /* it makes sense now */

from prewant group by 1,2,3,4,5

order by freqvalue descending

, freqitem descending

, flag descending

;

quit;

michtka · Posted 05-26-2013 12:00 AM

OK, I understood the 99999, but the by group 1,2,3,4,5 ? Thanks.

michtka · Posted 05-26-2013 12:18 AM

Finally , I ridd off the 99999, by max(nsoc) as freqitem in (FAMILIES category), and the numbers 1,2,3,4,5...still dont know why you ise that.

You code is brilliant.

Thank you very much for all your help.

V

proc sql noprint ;

create table display as

select freqvalue, freqitem, flag

, item as level

, intensity

, nitem as count

from prewant

union

select freqvalue, max(nsoc) as freqitem, flag

, upcase(value) as level

, intensity

, nsoc as count /* This is the value that makes no sense */

from prewant

order by freqvalue descending

, freqitem descending

, flag descending

;

quit;

Tom · Posted 05-26-2013 12:45 AM

The 99999 is just an arbitrary large number so that the sorting variable works. That is SQL syntax for assigning a constant value to a variable. Choose a value that is larger than any value of FREQITEM to make sure that the rows for FOOD and DRINK sort out before the detailed rows for the items. You could also add another sorting variable instead to would allow you to sort the summary rows before the detail rows.

The GROUP BY is to eliminate the duplicates caused by your data structure. You have the same combination of VALUE*INTENSITY listed many times because you have merge VALUE and FREQVALUE onto every distinct value of ITEM, but and you only want to include them once in the final table. The 1,2,3,4,5 is just SQL shorthand for the first 5 columns listed in the SELECT clause. I could have re-typed the names, but using the column number is less typing. Note that if you had a separate summary table for the VALUE level then you would not need the group by to eliminate the duplicate rows.

You might be able to get the same thing by adding the DISTINCT keyword after the SELECT instead of using the GROUP BY clause. For that to work the NSOC value that you want to assign to the new COUNT variable must be a constant over all rows with the same combination of the other variables.

select distinct

freqvalue,99999 as freqitem, flag

, upcase(value) as level

, intensity

, nsoc as count /* it makes sense now */

from prewant

michtka · Posted 05-26-2013 10:12 AM

Hi Tom, if The families have the same Freqvlaue, for example FOOD, DRINK =5, the code it doesn.t work right, and the level variable shows something like:

level

5FOOD

5 FOOD

5DRINK

They put two categories together.

Tom · Posted 05-26-2013 05:50 PM

You need to keep another variable to break the ties, such as the original text variables, along with the frequency variables so that you can include them in the sort order. So if you have FOOD=5 and DRINK=5 then FOOD will come after DRINK because F comes after D. Similarly for the ties between MEAT and VEGETABLES.

order by freqvalue descending

, value

, freqitem descending

, item

, flag descending

Tom · Posted 05-25-2013 01:59 PM

You seem to have two operations here. One is setting order for the text in your variables. The other appears to be some type of summary calculation. Not sure what you are summarizing, but it appears to be consistent with take the MAX() of Nintensity variable.

You can use PROC SUMMARY to summarize by FOOD/DRINK and by individual items.

proc summary data=have ;

class value item intensity ;

types value*intensity value*item*intensity;

var nintensity;

output out=summary max= ;

run;

You can create INFORMATs to convert the text strings to integers so that you can order them.

proc format ;

invalue value (upcase) 'FOOD'=1 'DRINK'=2 ;

invalue intensity (upcase) 'HIGH'=1 'MEDIUM'=2 'LOW'=3 ;

invalue item (upcase)

'FOOD','DRINK'=0

'MEAT'=1 'CHEESE'=2 'VEGETABLE'=3

'GREEN'=1 'ORANGE'=2

;

run;

You then use a data step to collapse the FOOD/DRINK and ITEM into a single variable for display purposes and use the INFORMATS to create something that can be used for ordering.

data want;

set summary;

value_order = input(value,value.);

intensity_order = input(intensity,intensity.);

if _type_ = '101'B then do;

row_order = input(value,item.);

row_label = value ;

end;

if _type_ = '111'B then do;

row_order = input(item,item.);

row_label = item ;

end;

run;

proc sort;

by value_order row_order intensity_order;

run;

proc print;

var row_label intensity nintensity ;

run;

Re: sorting dataset following 4 rules

Re: sorting dataset following 4 rules

Re: sorting dataset following 4 rules

Re: sorting dataset following 4 rules

Re: sorting dataset following 4 rules

Re: sorting dataset following 4 rules

Re: sorting dataset following 4 rules

Re: sorting dataset following 4 rules

Re: sorting dataset following 4 rules

Re: sorting dataset following 4 rules

Re: sorting dataset following 4 rules

Re: sorting dataset following 4 rules

Re: sorting dataset following 4 rules

Re: sorting dataset following 4 rules

The 2025 SAS Hackathon has begun!

SAS Training: Just a Click Away