Hi! I need help combining multiple values of a variable into 1 and averaging out the indexes associated with each of them. For example, imagine I have four fruit baskets and in each one I have associated index values of apples, oranges, and cherries.
FruitBasket AppleIndex OrangeIndex CherryIndex
1 3 5 7
2 4 4 4
3 2 4 6
4 5 8 5
Now, I would like to rename the FruitBasket values 'A', 'B', 'C', etc. Here's what I need help with. Say that I'm happy with FruitBasket=1 and I'll simply rename it A [If FruitBasket=1 then FruitBasket=A]. But let's say I want to combine FruitBasket=1 and FruitBasket=2 and make it FruitBasket=B. I need the indexes averaged out and reassigned to B. So then my data would look like this (4th basket is alone and becomes C):
FruitBasket AppleIndex OrangeIndex CherryIndex
A 3 5 7
B 3 4 5
C 5 8 5
Thank you in advance!
Diana
Okay, go back to what @ballardw said in message 4 of this thread. You can turn these into formats and then run PROC SUMMARY to get the means. Formats will cause PROC SUMMARY to compute means of all the ONET values in a given IPUMS.
So, for example, working on the first two lines of your Excel file.
proc format;
value ipumsf '11-1021.00'='111021' '11-1011.00','11-1011.03','11-1031.00' = '110XX';
run;
proc summary data=have;
class ipums_occsoc;
format ipums_occsoc ipumsf.;
var cognitiveindex ...;
output out=means mean=;
run;
There are automated ways to have the format created from the data read in from Excel, or you could just type it in. If you need to do one of the automated ways, just ask.
What rule do you use to identify which collections of rows are to be replaced by a new row of averages? Is there a second table specify which rows are to be averaged?
@Diana_AdventuresinSAS wrote:
Hi! I need help combining multiple values of a variable into 1 and averaging out the indexes associated with each of them. For example, imagine I have four fruit baskets and in each one I have associated index values of apples, oranges, and cherries.
FruitBasket AppleIndex OrangeIndex CherryIndex
1 3 5 7
2 4 4 4
3 2 4 6
4 5 8 5
Now, I would like to rename the FruitBasket values 'A', 'B', 'C', etc. Here's what I need help with. Say that I'm happy with FruitBasket=1 and I'll simply rename it A [If FruitBasket=1 then FruitBasket=A]. But let's say I want to combine FruitBasket=1 and FruitBasket=2 and make it FruitBasket=B. I need the indexes averaged out and reassigned to B. So then my data would look like this (4th basket is alone and becomes C):
FruitBasket AppleIndex OrangeIndex CherryIndex
A 3 5 7
B 3 4 5
C 5 8 5
Thank you in advance!
Diana
In addition to the question asked by @mkeintz, it seems to me you have a typo (or I don't understand your question).
Do you mean to say "let's say I want to combine FruitBasket=2 and FruitBasket=3 and make it FruitBasket=B"?? If that's what you mean (it's not what you wrote), then I can see how you get the results shown. If that's not what you mean, then none of this makes sense.
Also, you can't change Fruitbasket=1 to Fruitbasket='A' because you cannot change a numeric variable to a character variables. You could however create a new variable called (for example) NewFruitBasket='A'.
Yes, it is a typo. I do mean 2 and 3. Sorry about that. (I'm very sleep deprived). And regarding the char and num, I forgot the rule, and for the purpose of my actual project, it's all numerical so that won't be an issue.
Regarding the rules, I'm not sure how to answer that other than provide you with an Excel sheet of the values on one column that will end up being the new names ("baskets") and on the other column the list that will be combined. I need to merge two sets of occupation data (ONET that has some detailed occupations that will be grouped together such as many types of doctors under one code "Physician" and the other set IPUMS, the destination data, where I will insert ONET variables into). In IPUMS, the occupation codes are sometimes matching the ONET and sometimes they are broad and require that ONET occupations be grouped to match it. The other issue is that in ONET, the occupation is coded for example 11-1020.00 and in IPUMS it's 111020. So I need to changed the ONET values to reflect those in IPUMS, as well as average out the indexes for all the occupations that I'm grouping.
SAS Output ONET data
Obs | OCCSOC | cognitiveindex | manuphysindex | workvaluesindex | workstylesindex | technicalindex | manageindex | environindex | jobhazardindex | bodypositindex |
---|---|---|---|---|---|---|---|---|---|---|
1 | 11-1011.00 | 73.74 | 17.75 | 24.66 | 13.26 | 16.99 | 16.25 | 12.21 | 9.56 | 12.33 |
2 | 11-1011.03 | 67.02 | 18.35 | 24.00 | 12.96 | 17.22 | 11.76 | 9.26 | 7.25 | 10.55 |
3 | 11-1021.00 | 64.87 | 26.25 | 23.33 | 11.67 | 19.26 | 13.62 | 11.39 | 10.56 | 13.03 |
4 | 11-1031.00 | . | . | 20.33 | . | . | . | . | . | . |
5 | 11-2011.00 | 68.02 | 19.00 | 20.99 | 12.75 | 16.00 | 12.38 | 8.85 | 6.74 | 9.98 |
6 | 11-2011.01 | . | . | 20.16 | . | . | . | . | . | . |
7 | 11-2021.00 | 67.99 | 17.49 | 23.17 | 12.03 | 16.64 | 12.38 | 9.00 | 7.09 | 10.12 |
8 | 11-2022.00 | 67.62 | 19.74 | 19.50 | 12.57 | 16.24 | 13.13 | 9.86 | 7.19 | 9.79 |
9 | 11-2031.00 | 68.61 | 17.26 | 22.50 | 12.51 | 15.49 | 12.74 | 9.03 | 6.82 | 10.27 |
One way to create groups and assign different display values at one time, without changing underlying data, is use of custom formats.
data work.fruits; input FruitBasket AppleIndex OrangeIndex CherryIndex; datalines; 1 3 5 7 2 4 4 4 3 2 4 6 4 5 8 5 ; run; proc format library=work; value fb 1= 'A' 2,3 = 'B' 4 = 'C' ; run; proc summary data=work.fruits nway; class fruitbasket ; format fruitbasket fb.; var AppleIndex OrangeIndex CherryIndex; output out=work.fruitsummary (drop=_type_ _freq_) mean=; run;
Since you appear to have a very arbitrary rule about combining data and not quite matching the narrative to displayed values I picked an approach that mimics the displayed data.
Note the better way to show your data values as data step code. Then people can actually test code against values.
Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the {i} icon or attached as text to show exactly what you have and that we can test code against.
You seem to be hitting at what I need although I'm not savvy enough to confirm if that's exactly it. Have you seen my reply to the previous post where I include my data and an Excel attachment? Maybe that would answer your question regarding the rule. The Excel sheet shows exactly what needs to be combined, and as you can tell, it is quite arbitrary.
Many of us will not download and open Microsoft Office documents, as they are a security risk.
Can you convert the Excel file to a .txt (or .csv) file, or just show us the relevant parts?
Oh okay, no problem. That makes sense. Here's a copy/paste of the first 30 lines; it goes up to a final count of about 460 total occupations.
IPUMS OCCSOC | Recode ONET to match |
111021 | 11-1021.00 |
1110XX | 11-1011.00, 11-1011.03,11-1031.00 |
112011 | 11-2011.00, 11-2011.01 |
112020 | 11-2021.00, 11-2022.00 |
112031 | 11-2031.00 |
113011 | 11-3011.00 |
113021 | 11-3021.00 |
113031 | 11-3021.01, 11-3021.02 |
113051 | 11-3051.00, 11-3051.01, 11-3051.02, 11-3051.03, 11-3051.04, 11-3051.05, 11-3051.06 |
113061 | 11-3061.00 |
113071 | 11-3071.00, 11-3071.01, 11-3071.02, 11-3071.03 |
113111 | 11-3111.00 |
113121 | 11-3121.00 |
113131 | 11-3131.00 |
119013 | 11-9013.01, 11-9013.02, 11-9013.03 |
119021 | 11-9021.00 |
119030 | 11-9031.00, 11-9032.00, 11-9033.00, 11-9039.01, 11-9039.02 |
119041 | 11-9041.00, 11-9041.01 |
119051 | 11-9051.00 |
119071 | 11-9071.00 |
119081 | 11-9081.00 |
119111 | 11-9111.00 |
119121 | 11-9121.00, 11-9121.01, 11-9121.02 |
119141 | 11-9041.00 |
119151 | 11-9151.00 |
119161 | 11-9161.00 |
119XXX | 11-9199.00, 11-9199.01, 11-9199.02, 11-9199.03, 11-9199.04, 11-9199.07, 11-9199.08, 11-9199.09, 11-9199.10, 11-9199.11 |
131011 | 13-1011.00 |
131021 | 13-1021.00 |
Okay, go back to what @ballardw said in message 4 of this thread. You can turn these into formats and then run PROC SUMMARY to get the means. Formats will cause PROC SUMMARY to compute means of all the ONET values in a given IPUMS.
So, for example, working on the first two lines of your Excel file.
proc format;
value ipumsf '11-1021.00'='111021' '11-1011.00','11-1011.03','11-1031.00' = '110XX';
run;
proc summary data=have;
class ipums_occsoc;
format ipums_occsoc ipumsf.;
var cognitiveindex ...;
output out=means mean=;
run;
There are automated ways to have the format created from the data read in from Excel, or you could just type it in. If you need to do one of the automated ways, just ask.
@PaigeMiller thanks for your help so far. I tried your method but I'm stuck. I'm new to formats so bear with me.
Question: Does a data set need to exist for proc format to create values? It seems to me that some catalog gets created that simply says all these values will be called this? I believe your last post you were trying to have me import the Excel file from which these formats would be created.
Here's the code I tried and the log. I only want to do the 3 occupations until I can see it working. I was hoping to create a new data set with one observation (occupation 110XX and the averaged indexes). Is there any other information I can provide you with that will help?
proc format library=work;
value $occf
'11-1011.00', '11-1011.03', '11-1031.00' = '110XX';
run;
proc summary data=onetlib.onet;
class occsoc;
format occsoc occf.;
var bodypositindex cognitiveindex environindex jobhazardindex manageindex manuphysindex technicalindex workstylesindex workvaluesindex;
output out=onetlib.newonet (drop=_type_ _freq_) mean=;
run;
1
NOTE: Libref ONETLIB was successfully assigned as follows:
.
3
4 proc format library=work;
5 value $occf
6 '11-1011.00', '11-1011.03', '11-1031.00' = '110XX';
NOTE: Format $OCCF has been output.
7 run;
NOTE: PROCEDURE FORMAT used (Total process time):
real time 0.02 seconds
cpu time 0.01 seconds
8
9 proc summary data=onetlib.onet;
10 class occsoc;
11 format occsoc occf.;
ERROR: The format OCCF was not found or could not be loaded.
12 var bodypositindex cognitiveindex environindex jobhazardindex manageindex
13 manuphysindex technicalindex workstylesindex workvaluesindex;
14 output out=onetlib.newonet (drop=_type_ _freq_) mean=;
15 run;
NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set ONETLIB.NEWONET may be incomplete. When this step was stopped there were
0 observations and 0 variables.
NOTE: PROCEDURE SUMMARY used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
No, a dataset doesn't have to exist for a format to be created. Your problem was that you created a character format (i.e., one that begins with a $ and applies to character values), but then tried to apply a numeric format with the same name (which you never created).
Just change:
11 format occsoc occf.;
to
format occsoc $occf.;
Art, CEO, AnalystFinder.com
Thanks @art297! Seems to be working, however, there is an empty obs=1 which I can't figure out. Any idea what that is and can I expect to see it when I do the formats for all the occupations? (If anything, I'll just delete the first observation).
SAS Output
Obs | OCCSOC | cognitiveindex | manuphysindex | workvaluesindex | workstylesindex | technicalindex | manageindex | environindex | jobhazardindex | bodypositindex |
---|---|---|---|---|---|---|---|---|---|---|
1 | 62.7240 | 30.7268 | 16.3287 | 11.2875 | 20.9301 | 9.6196 | 13.6965 | 11.1576 | 12.9805 | |
2 | 110XX | 70.3800 | 18.0500 | 22.9967 | 13.1100 | 17.1050 | 14.0050 | 10.7350 | 8.4050 | 11.4400 |
3 | 11-10 | 64.8700 | 26.2500 | 23.3300 | 11.6700 | 19.2600 | 13.6200 | 11.3900 | 10.5600 | 13.0300 |
4 | 11-20 | 68.0600 | 18.3725 | 21.2640 | 12.4650 | 16.0925 | 12.6575 | 9.1850 | 6.9600 | 10.0400 |
5 | 11-30 | 67.4521 | 23.9993 | 20.1900 | 11.7486 | 21.2250 | 13.2214 | 12.6186 | 10.2921 | 13.0050 |
6 | 11-31 | 65.2467 | 17.9100 | 22.1100 | 11.9133 | 16.1167 | 13.0767 | 7.7633 | 6.8167 | 9.9433 |
7 | 11-90 | 68.1787 | 26.5507 | 20.6980 | 11.7727 | 19.8987 | 12.9013 | 12.1907 | 9.9053 | 11.9520 |
8 | 11-91 | 67.4129 | 21.3559 | 20.1441 | 11.6224 | 17.6735 | 12.3988 | 10.9771 | 9.1276 | 11.9012 |
9 | 13-10 | 65.6645 | 22.2930 | 17.7835 | 11.5360 | 17.2485 | 10.5700 | 11.5190 | 8.8400 | 11.8075 |
10 | 13-11 | 65.8975 | 19.8050 | 19.2508 | 11.7808 | 18.1442 | 10.9083 | 9.6892 | 7.2650 | 10.2525 |
I haven't seen your data, or all of the specifics of this thread but, if that is the result of your proc summary, that top line may just be a total set of averages. Rerun it again with the way option and see if that gets rid of that top line of data. i.e.:
proc summary data=onetlib.onet nway; class occsoc; format occsoc occf.; var bodypositindex cognitiveindex environindex jobhazardindex manageindex manuphysindex technicalindex workstylesindex workvaluesindex; output out=onetlib.newonet (drop=_type_ _freq_) mean=; run;
Art, CEO, AnalystFinder.com
Observation 1 from PROC SUMMARY contains the global statistics (in other words, using all data, regardless of OCCSOC value).
To remove this from the output, you could use the NWAY option in the PROC SUMMARY statement.
Hello,
I managed to create the desired data set. I have one last question pertaining to a permanent save of outputted data. I thought if I write the libname in "output out=onetlib.onetmerge" in the proc summary that it would create a permanent dataset regardless of whether the formats are permanent (since in proc format I use "library=work;"). When I ask it to proc print the data without a proc format running, it states "ERROR: Format $OCCF not found or couldn't be loaded for variable OCCSOC."
I plan to merge this with other data (IPUMS) and I would like to save permanent sets, and have multiple copies in different locations.
Thanks so much!!!
Diana
Current syntax:
libname ONETlib 'C:\Users\..'
proc format library=work;
value $occf
'11-1021.00'='111021'
'11-1011.00', '11-1011.03', '11-1031.00'='1110XX'
.
.
'53-7111.00', '53-7121.00'='5371XX';
run;
proc summary data=onetlib.onet nway;
class occsoc;
format occsoc $occf.;
var cognitiveindex manuphysindex workvaluesindex workstylesindex technicalindex manageindex environindex jobhazardindex bodypositindex ;
output out=onetlib.onetmerge (drop=_type_ _freq_) mean=;
run;
proc print data=onetlib.onetmerge; run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.