Hi, I have a data set that looks like the set below.
There are a set of variables (v1, v2, v3). Each variable covers the same set of counties (counties a1, a2, a3). I did a kurtosis analysis and got kurtosis for each variable (third column). The variables have the same kurtosis values for each county, but each variable has different values. So for example, V1 has a kurtosis of 44 for all the counties, V2 has a kurtosis of 24 for all counties, and so on.
I don't know beforehand what the value of the kurtosis will be. Also, I will be sorting the data set by the kurtosis values, so I won't know, beforehand, the order of the variables. When I run the data, v1 could have the largest kurtosis, or v3 could have the largest kurtosis."
I want to select the variable and all it's counties, with the largest kurtosis, save it as a subset, then select the variable and all it's counties, with the second largest kurtosis, save that as a subset, and so on.
My initial idea was to make another column (VarOrder) that would assign 1, 2, 3, to show which variable was largest, second largest, third largest. I'm not sure how to do that, and would also welcome suggestions on any other way to do this.
variable | county | Kurtosis | v_other | VarOrder |
v1 | a1 | 44 | 3 | |
v1 | a2 | 44 | 4 | |
v1 | a3 | 44 | 2 | |
v2 | a1 | 24 | 5 | |
v2 | a2 | 24 | 2 | |
v2 | a3 | 24 | 4 | |
v3 | a1 | 36 | 1 | |
v3 | a2 | 36 | 5 | |
v3 | a3 | 36 | 2 |
Thanks
Gene
You really don't need to separate the original data set into subsets. Instead, compute the kurtosis and sort the data by descending kurtosis and variable. After that work, assign a 'sequence' number to the sorted rows and then use WHERE and BY statements to restrict and independently process groups of rows that you were planning to place in different data sets.
Example:
You don't show how the original kurtosis analysis is done.
proc sort data=kurtoses; by descending kurtosis variable county; run; data kurtoses; set kurtoses; by descending kurtosis variable county; if first.variable then kurtorder+1; * group kurtorder=1 means variable has highest kurtosis; run; proc print data=kurtoses; by kurtorder; where kurtorder <= 3; var kurtosis variable county; run;
All of this was discussed in your other thread. There, you were asked to provide an example of the output you want. Please provide an example of the output. If you want to move forward on this, we need to see the desired output.
Also, in your other thread we explained how there's no need to separate the data for the largest value, do an analysis; then pick the second largest value, separate that out, and do the same analysis. In your other thread, you said PROC SUMMARY and then sort won't work, but using you have explained, it will work, and it will be a lot simpler than the method you are proposing.
You really don't need to separate the original data set into subsets. Instead, compute the kurtosis and sort the data by descending kurtosis and variable. After that work, assign a 'sequence' number to the sorted rows and then use WHERE and BY statements to restrict and independently process groups of rows that you were planning to place in different data sets.
Example:
You don't show how the original kurtosis analysis is done.
proc sort data=kurtoses; by descending kurtosis variable county; run; data kurtoses; set kurtoses; by descending kurtosis variable county; if first.variable then kurtorder+1; * group kurtorder=1 means variable has highest kurtosis; run; proc print data=kurtoses; by kurtorder; where kurtorder <= 3; var kurtosis variable county; run;
Richard,
Thanks very much for your suggestion. That was exactly what I was looking for.
Thanks
Gene
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.