Solved: selecting subsets of a data set

geneshackman · Posted 09-08-2020 02:54 PM

Hi, I have a data set that looks like the set below.

There are a set of variables (v1, v2, v3). Each variable covers the same set of counties (counties a1, a2, a3). I did a kurtosis analysis and got kurtosis for each variable (third column). The variables have the same kurtosis values for each county, but each variable has different values. So for example, V1 has a kurtosis of 44 for all the counties, V2 has a kurtosis of 24 for all counties, and so on.

I don't know beforehand what the value of the kurtosis will be. Also, I will be sorting the data set by the kurtosis values, so I won't know, beforehand, the order of the variables. When I run the data, v1 could have the largest kurtosis, or v3 could have the largest kurtosis."

I want to select the variable and all it's counties, with the largest kurtosis, save it as a subset, then select the variable and all it's counties, with the second largest kurtosis, save that as a subset, and so on.

My initial idea was to make another column (VarOrder) that would assign 1, 2, 3, to show which variable was largest, second largest, third largest. I'm not sure how to do that, and would also welcome suggestions on any other way to do this.

variable	county	Kurtosis	v_other	VarOrder
v1	a1	44	3
v1	a2	44	4
v1	a3	44	2
v2	a1	24	5
v2	a2	24	2
v2	a3	24	4
v3	a1	36	1
v3	a2	36	5
v3	a3	36	2

Thanks

Gene

RichardDeVen · Posted 09-08-2020 03:41 PM

You really don't need to separate the original data set into subsets. Instead, compute the kurtosis and sort the data by descending kurtosis and variable. After that work, assign a 'sequence' number to the sorted rows and then use WHERE and BY statements to restrict and independently process groups of rows that you were planning to place in different data sets.

Example:

You don't show how the original kurtosis analysis is done.

proc sort data=kurtoses;
  by descending kurtosis variable county;
run;

data kurtoses;
  set kurtoses;
  by descending kurtosis variable county;
  if first.variable then kurtorder+1;       * group kurtorder=1 means variable has highest kurtosis;
run;

proc print data=kurtoses;
  by kurtorder;
  where kurtorder <= 3;
  var kurtosis variable county;
run;

View solution in original post

PaigeMiller · Posted 09-08-2020 03:14 PM

All of this was discussed in your other thread. There, you were asked to provide an example of the output you want. Please provide an example of the output. If you want to move forward on this, we need to see the desired output.

Also, in your other thread we explained how there's no need to separate the data for the largest value, do an analysis; then pick the second largest value, separate that out, and do the same analysis. In your other thread, you said PROC SUMMARY and then sort won't work, but using you have explained, it will work, and it will be a lot simpler than the method you are proposing.

--
Paige Miller

RichardDeVen · Posted 09-08-2020 03:41 PM

You really don't need to separate the original data set into subsets. Instead, compute the kurtosis and sort the data by descending kurtosis and variable. After that work, assign a 'sequence' number to the sorted rows and then use WHERE and BY statements to restrict and independently process groups of rows that you were planning to place in different data sets.

Example:

You don't show how the original kurtosis analysis is done.

proc sort data=kurtoses;
  by descending kurtosis variable county;
run;

data kurtoses;
  set kurtoses;
  by descending kurtosis variable county;
  if first.variable then kurtorder+1;       * group kurtorder=1 means variable has highest kurtosis;
run;

proc print data=kurtoses;
  by kurtorder;
  where kurtorder <= 3;
  var kurtosis variable county;
run;

geneshackman · Posted 09-08-2020 04:06 PM

Richard,

Thanks very much for your suggestion. That was exactly what I was looking for.

Thanks

Gene

selecting subsets of a data set

Re: selecting subsets of a data set

Re: selecting subsets of a data set

Re: selecting subsets of a data set

Re: selecting subsets of a data set

selecting subsets of a data set

Re: selecting subsets of a data set

Re: selecting subsets of a data set

Re: selecting subsets of a data set

Re: selecting subsets of a data set

SAS Innovate 2025: Call for Content

Click image to register for webinar

Classroom Training Available!