BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.

Hi! I need help combining multiple values of a variable into 1 and averaging out the indexes associated with each of them. For example, imagine I have four fruit baskets and in each one I have associated index values of apples, oranges, and cherries.

 

FruitBasket    AppleIndex   OrangeIndex    CherryIndex

1                          3                     5                      7

2                          4                     4                      4

3                          2                     4                      6

4                          5                     8                      5

 

 

Now, I would like to rename the FruitBasket values 'A', 'B', 'C', etc. Here's what I need help with. Say that I'm happy with FruitBasket=1 and I'll simply rename it A [If FruitBasket=1 then FruitBasket=A]. But let's say I want to combine FruitBasket=1 and FruitBasket=2 and make it FruitBasket=B. I need the indexes averaged out and reassigned to B. So then my data would look like this (4th basket is alone and becomes C):

 

FruitBasket    AppleIndex   OrangeIndex    CherryIndex

A                          3                     5                      7

B                          3                     4                      5

C                          5                     8                      5

 

Thank you in advance!

Diana

1 ACCEPTED SOLUTION

Accepted Solutions
PaigeMiller
Diamond | Level 26

Okay, go back to what @ballardw said in message 4 of this thread. You can turn these into formats and then run PROC SUMMARY to get the means. Formats will cause PROC SUMMARY to compute means of all the ONET values in a given IPUMS.

 

So, for example, working on the first two lines of your Excel file.

 

proc format;
    value ipumsf '11-1021.00'='111021' '11-1011.00','11-1011.03','11-1031.00' = '110XX';
run;

proc summary data=have;
    class ipums_occsoc;
    format ipums_occsoc ipumsf.;
    var cognitiveindex ...;
    output out=means mean=;
run;

There are automated ways to have the format created from the data read in from Excel, or you could just type it in. If you need to do one of the automated ways, just ask.

 

--
Paige Miller

View solution in original post

14 REPLIES 14
mkeintz
PROC Star

What rule do you use to identify which collections of rows are to be replaced by a new row of averages?  Is there a second table specify which rows are to be averaged?

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
PaigeMiller
Diamond | Level 26

@Diana_AdventuresinSAS wrote:

Hi! I need help combining multiple values of a variable into 1 and averaging out the indexes associated with each of them. For example, imagine I have four fruit baskets and in each one I have associated index values of apples, oranges, and cherries.

 

FruitBasket    AppleIndex   OrangeIndex    CherryIndex

1                          3                     5                      7

2                          4                     4                      4

3                          2                     4                      6

4                          5                     8                      5

 

 

Now, I would like to rename the FruitBasket values 'A', 'B', 'C', etc. Here's what I need help with. Say that I'm happy with FruitBasket=1 and I'll simply rename it A [If FruitBasket=1 then FruitBasket=A]. But let's say I want to combine FruitBasket=1 and FruitBasket=2 and make it FruitBasket=B. I need the indexes averaged out and reassigned to B. So then my data would look like this (4th basket is alone and becomes C):

 

FruitBasket    AppleIndex   OrangeIndex    CherryIndex

A                          3                     5                      7

B                          3                     4                      5

C                          5                     8                      5

 

Thank you in advance!

Diana


In addition to the question asked by @mkeintz, it seems to me you have a typo (or I don't understand your question).

 

Do you mean to say "let's say I want to combine FruitBasket=2 and FruitBasket=3 and make it FruitBasket=B"?? If that's what you mean (it's not what you wrote), then I can see how you get the results shown. If that's not what you mean, then none of this makes sense.

 

Also, you can't change Fruitbasket=1 to Fruitbasket='A' because you cannot change a numeric variable to a character variables. You could however create a new variable called (for example) NewFruitBasket='A'.

--
Paige Miller
Diana_AdventuresinSAS
Obsidian | Level 7

Yes, it is a typo. I do mean 2 and 3. Sorry about that. (I'm very sleep deprived). And regarding the char and num, I forgot the rule, and for the purpose of my actual project, it's all numerical so that won't be an issue. 

 

Regarding the rules, I'm not sure how to answer that other than provide you with an Excel sheet of the values on one column that will end up being the new names ("baskets") and on the other column the list that will be combined. I need to merge two sets of occupation data (ONET that has some detailed occupations that will be grouped together such as many types of doctors under one code "Physician" and the other set IPUMS, the destination data, where I will insert ONET variables into). In IPUMS, the occupation codes are sometimes matching the ONET and sometimes they are broad and require that ONET occupations be grouped to match it. The other issue is that in ONET, the occupation is coded for example 11-1020.00 and in IPUMS it's 111020. So I need to changed the ONET values to reflect those in IPUMS, as well as average out the indexes for all the occupations that I'm grouping. 

 

 

SAS Output ONET data

 

Obs OCCSOC cognitiveindex manuphysindex workvaluesindex workstylesindex technicalindex manageindex environindex jobhazardindex bodypositindex
1 11-1011.00 73.74 17.75 24.66 13.26 16.99 16.25 12.21 9.56 12.33
2 11-1011.03 67.02 18.35 24.00 12.96 17.22 11.76 9.26 7.25 10.55
3 11-1021.00 64.87 26.25 23.33 11.67 19.26 13.62 11.39 10.56 13.03
4 11-1031.00 . . 20.33 . . . . . .
5 11-2011.00 68.02 19.00 20.99 12.75 16.00 12.38 8.85 6.74 9.98
6 11-2011.01 . . 20.16 . . . . . .
7 11-2021.00 67.99 17.49 23.17 12.03 16.64 12.38 9.00 7.09 10.12
8 11-2022.00 67.62 19.74 19.50 12.57 16.24 13.13 9.86 7.19 9.79
9 11-2031.00 68.61 17.26 22.50 12.51 15.49 12.74 9.03 6.82 10.27
ballardw
Super User

One way to create groups and assign different display values at one time, without changing underlying data, is use of custom formats.

 

data work.fruits;
 input FruitBasket    AppleIndex   OrangeIndex    CherryIndex;
datalines;
1                          3                     5                      7
2                          4                     4                      4
3                          2                     4                      6
4                          5                     8                      5
;
run;

proc format library=work;
value fb 
1= 'A'
2,3 = 'B'
4 = 'C'
;
run; 

proc summary data=work.fruits nway;
   class fruitbasket ;
   format fruitbasket fb.;
   var  AppleIndex   OrangeIndex    CherryIndex;
   output out=work.fruitsummary (drop=_type_ _freq_) mean=;
run;

Since you appear to have a very arbitrary rule about combining data and not quite matching the narrative to displayed values I picked an approach that mimics the displayed data.

 

 

Note the better way to show your data values as data step code. Then people can actually test code against values.

Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the {i} icon or attached as text to show exactly what you have and that we can test code against.

Diana_AdventuresinSAS
Obsidian | Level 7

You seem to be hitting at what I need although I'm not savvy enough to confirm if that's exactly it. Have you seen my reply to the previous post where I include my data and an Excel attachment? Maybe that would answer your question regarding the rule. The Excel sheet shows exactly what needs to be combined, and as you can tell, it is quite arbitrary. 

PaigeMiller
Diamond | Level 26

Many of us will not download and open Microsoft Office documents, as they are a security risk.

 

Can you convert the Excel file to a .txt (or .csv) file, or just show us the relevant parts?

--
Paige Miller
Diana_AdventuresinSAS
Obsidian | Level 7

Oh okay, no problem. That makes sense. Here's a copy/paste of the first 30 lines; it goes up to a final count of about 460 total occupations. 

 

IPUMS OCCSOC Recode ONET to match
111021 11-1021.00
1110XX 11-1011.00, 11-1011.03,11-1031.00
112011 11-2011.00, 11-2011.01
112020 11-2021.00, 11-2022.00
112031 11-2031.00 
113011 11-3011.00
113021 11-3021.00
113031 11-3021.01, 11-3021.02
113051 11-3051.00, 11-3051.01, 11-3051.02, 11-3051.03, 11-3051.04, 11-3051.05, 11-3051.06
113061 11-3061.00
113071 11-3071.00, 11-3071.01, 11-3071.02, 11-3071.03
113111 11-3111.00
113121 11-3121.00
113131 11-3131.00
119013 11-9013.01, 11-9013.02, 11-9013.03
119021 11-9021.00
119030 11-9031.00, 11-9032.00, 11-9033.00, 11-9039.01, 11-9039.02
119041 11-9041.00, 11-9041.01
119051 11-9051.00
119071 11-9071.00
119081 11-9081.00
119111 11-9111.00
119121 11-9121.00, 11-9121.01, 11-9121.02
119141 11-9041.00
119151 11-9151.00
119161 11-9161.00
119XXX 11-9199.00, 11-9199.01, 11-9199.02, 11-9199.03, 11-9199.04, 11-9199.07, 11-9199.08, 11-9199.09, 11-9199.10, 11-9199.11
131011 13-1011.00
131021 13-1021.00
PaigeMiller
Diamond | Level 26

Okay, go back to what @ballardw said in message 4 of this thread. You can turn these into formats and then run PROC SUMMARY to get the means. Formats will cause PROC SUMMARY to compute means of all the ONET values in a given IPUMS.

 

So, for example, working on the first two lines of your Excel file.

 

proc format;
    value ipumsf '11-1021.00'='111021' '11-1011.00','11-1011.03','11-1031.00' = '110XX';
run;

proc summary data=have;
    class ipums_occsoc;
    format ipums_occsoc ipumsf.;
    var cognitiveindex ...;
    output out=means mean=;
run;

There are automated ways to have the format created from the data read in from Excel, or you could just type it in. If you need to do one of the automated ways, just ask.

 

--
Paige Miller
Diana_AdventuresinSAS
Obsidian | Level 7

@PaigeMiller thanks for your help so far. I tried your method but I'm stuck. I'm new to formats so bear with me. 

 

Question: Does a data set need to exist for proc format to create values? It seems to me that some catalog gets created that simply says all these values will be called this? I believe your last post you were trying to have me import the Excel file from which these formats would be created.

 

Here's the code I tried and the log. I only want to do the 3 occupations until I can see it working. I was hoping to create a new data set with one observation (occupation 110XX and the averaged indexes). Is there any other information I can provide you with that will help?

 

proc format library=work;
   value $occf
   '11-1011.00', '11-1011.03', '11-1031.00' = '110XX';
run;

 

proc summary data=onetlib.onet;
   class occsoc;
   format occsoc occf.;
   var bodypositindex cognitiveindex environindex jobhazardindex manageindex manuphysindex technicalindex workstylesindex                     workvaluesindex;
   output out=onetlib.newonet (drop=_type_ _freq_) mean=;
run;

 

1
NOTE: Libref ONETLIB was successfully assigned as follows:
.
3
4 proc format library=work;
5 value $occf
6 '11-1011.00', '11-1011.03', '11-1031.00' = '110XX';
NOTE: Format $OCCF has been output.
7 run;

NOTE: PROCEDURE FORMAT used (Total process time):
real time 0.02 seconds
cpu time 0.01 seconds


8
9 proc summary data=onetlib.onet;
10 class occsoc;
11 format occsoc occf.;
ERROR: The format OCCF was not found or could not be loaded.
12 var bodypositindex cognitiveindex environindex jobhazardindex manageindex
13 manuphysindex technicalindex workstylesindex workvaluesindex;
14 output out=onetlib.newonet (drop=_type_ _freq_) mean=;
15 run;

NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set ONETLIB.NEWONET may be incomplete. When this step was stopped there were
0 observations and 0 variables.
NOTE: PROCEDURE SUMMARY used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds

 

 

art297
Opal | Level 21

No, a dataset doesn't have to exist for a format to be created. Your problem was that you created a character format (i.e., one that begins with a $ and applies to character values), but then tried to apply a numeric format with the same name (which you never created).

 

Just change:

11 format occsoc occf.;

to

format occsoc $occf.;

Art, CEO, AnalystFinder.com

 

 

Diana_AdventuresinSAS
Obsidian | Level 7

 

Thanks @art297! Seems to be working, however, there is an empty obs=1 which I can't figure out. Any idea what that is and can I expect to see it when I do the formats for all the occupations? (If anything, I'll just delete the first observation).

 

 

SAS Output

Obs OCCSOC cognitiveindex manuphysindex workvaluesindex workstylesindex technicalindex manageindex environindex jobhazardindex bodypositindex
1   62.7240 30.7268 16.3287 11.2875 20.9301 9.6196 13.6965 11.1576 12.9805
2 110XX 70.3800 18.0500 22.9967 13.1100 17.1050 14.0050 10.7350 8.4050 11.4400
3 11-10 64.8700 26.2500 23.3300 11.6700 19.2600 13.6200 11.3900 10.5600 13.0300
4 11-20 68.0600 18.3725 21.2640 12.4650 16.0925 12.6575 9.1850 6.9600 10.0400
5 11-30 67.4521 23.9993 20.1900 11.7486 21.2250 13.2214 12.6186 10.2921 13.0050
6 11-31 65.2467 17.9100 22.1100 11.9133 16.1167 13.0767 7.7633 6.8167 9.9433
7 11-90 68.1787 26.5507 20.6980 11.7727 19.8987 12.9013 12.1907 9.9053 11.9520
8 11-91 67.4129 21.3559 20.1441 11.6224 17.6735 12.3988 10.9771 9.1276 11.9012
9 13-10 65.6645 22.2930 17.7835 11.5360 17.2485 10.5700 11.5190 8.8400 11.8075
10 13-11 65.8975 19.8050 19.2508 11.7808 18.1442 10.9083 9.6892 7.2650 10.2525
art297
Opal | Level 21

I haven't seen your data, or all of the specifics of this thread but, if that is the result of your proc summary, that top line may just be a total set of averages. Rerun it again with the way option and see if that gets rid of that top line of data. i.e.:

proc summary data=onetlib.onet nway;
   class occsoc;
   format occsoc occf.;
   var bodypositindex cognitiveindex environindex jobhazardindex manageindex manuphysindex technicalindex workstylesindex                     workvaluesindex;
   output out=onetlib.newonet (drop=_type_ _freq_) mean=;
run;

Art, CEO, AnalystFinder.com

 

PaigeMiller
Diamond | Level 26

Observation 1 from PROC SUMMARY contains the global statistics (in other words, using all data, regardless of OCCSOC value).

 

To remove this from the output, you could use the NWAY option in the PROC SUMMARY statement.

--
Paige Miller
Diana_AdventuresinSAS
Obsidian | Level 7

Hello,

 

I managed to create the desired data set. I have one last question pertaining to a permanent save of outputted data. I thought if I write the libname in "output out=onetlib.onetmerge" in the proc summary that it would create a permanent dataset regardless of whether the formats are permanent (since in proc format I use "library=work;"). When I ask it to proc print the data without a proc format running, it states "ERROR: Format $OCCF not found or couldn't be loaded for variable OCCSOC."

 

I plan to merge this with other data (IPUMS) and I would like to save permanent sets, and have multiple copies in different locations.

 

Thanks so much!!!

Diana

 

Current syntax:

 

libname ONETlib 'C:\Users\..'

 

proc format library=work;
value $occf
'11-1021.00'='111021'
'11-1011.00', '11-1011.03', '11-1031.00'='1110XX'

.

.

'53-7111.00', '53-7121.00'='5371XX';

run;

 

proc summary data=onetlib.onet nway;
   class occsoc;
   format occsoc $occf.;
   var cognitiveindex manuphysindex workvaluesindex workstylesindex technicalindex manageindex environindex jobhazardindex         bodypositindex ;
output out=onetlib.onetmerge (drop=_type_ _freq_) mean=;
run;

 

proc print data=onetlib.onetmerge; run;

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 14 replies
  • 1401 views
  • 2 likes
  • 5 in conversation