Statistical Procedures

geneshackman · Posted 10-19-2022 07:00 PM

I'd like to know if it's possible to add in variables during proc summary. At present, I have two steps.

First is getting a summary for ROS (Rest Of State), that is, adding up all the values for all the ROS counties (cos means county), that is, adding up number of thingies in each county, so I can get total number of thingies for all the ROS counties together.

Second, in a data step, for the row of total of all these, I'm assigning cos to be 888.

Can I get this summary row cos to = 888 in proc summary?

/* calculating ROS */
proc summary data=black nway;
class region;
var total underpov;
output out=BlackRos0 (drop=_:) sum=;
where cos in (1:57);
run;

/* assigning "888" to the cos */

data blackros; set blackros0;
cos = 888;
run;

Tom · Posted 10-21-2022 10:30 AM

I would recommend using the multilabel format method. Ignore the pre calculated summary rows (or at least recalculate them and QC the result against the published numbers).

proc format ;
  value cos (multilabel)
  1-62 = 'NYS'
  1-57 = 'ROS'
  58-62 = 'NYC'
  ;
run;
proc summary data=have nway ;
  class cos / mlf ;
  format cos cos.;
  var total underpov;
  output out=want sum=;
run;
proc print data=want;
run;

View solution in original post

PaigeMiller · Posted 10-19-2022 07:22 PM

On the one hand, you say you want the summary row to have cos=888, but then your code assigns 888 to all rows. Could you please clarify?

PROC SUMMARY does not allow assignment of specific values to specific fields. Could you explain more why you want 888 in the output (either in one row or all rows)??? WHere does the 888 come from, anyway? Is it somehow in the data set named black?

--
Paige Miller

geneshackman · Posted 10-19-2022 07:42 PM

HI Paige

The proc summary creates a summary data set called BlackRos0. That working file has one row, which is the summary of adding together all the counties. I want to assign "888" to that row.

However, "PROC SUMMARY does not allow assignment of specific values to specific fields." Okay, that's what I wanted to know. I wasn't sure whether it did.

Thanks

PaigeMiller · Posted 10-20-2022 08:45 AM

@geneshackman wrote:

The proc summary creates a summary data set called BlackRos0. That working file has one row, which is the summary of adding together all the counties. I want to assign "888" to that row.

Unless there is only one value for the CLASS variable named REGION, then BlackRos0 has more than one row. But this does not explain what you are doing or why you need 888 and where does the 888 come from (all things I asked you to provide), so that perhaps we can figure out another way to get what you want.

Most people here will not download and open Microsoft Office files, as they can be security threats. Please provide (a portion of) your data as working SAS data step code, which you can type in yourself or follow these instructions.

--
Paige Miller

geneshackman · Posted 10-20-2022 08:56 AM

Here are a few lines from the data set. See the explanation of these columns in my response to Reeza. The bottom of this sample has one row for New York State (cos = 999) and one row for New York City (cos = 102). I want to add one row for ROS (cos = 888)

cos	total	underpov	region
1	8913	2910	ROS
2	15	2	ROS
58	120347	43223	NYC
3	2923	1423	ROS
4	317	106	ROS
5	338	148	ROS
22	1434	459	ROS
59	167871	43213	NYC
23	66	8	ROS
24	206	114	ROS
999	647954	182762	NYS
102	420784	117128

PaigeMiller · Posted 10-20-2022 09:30 AM

Why is COS even needed if you have Region? Doesn't one imply the other?

--
Paige Miller

geneshackman · Posted 10-20-2022 09:49 AM

I am going to use cos as the primary code for county. cos is unique to each county. As you can see, all counties in NYC have "NYC" for region, and all counties not in NYC have "ROS" for region. So region is not unique to each county. And ROS and NYC does not uniquely identify the rows which are summaries for the state and for each region. The cos codes do.

ballardw · Posted 10-19-2022 08:15 PM

Define the values that are "ROS" and for which variable.

I suspect that you can use a format to get that, possibly applied to County, which you would add to the Class statement and remove from the Where so you can see the formatted values.

Maybe this example with a data set you should have available will demonstrate:

proc format ;
value ages
11,12,13 ='11-13'
14-high = '>14';
run;

proc summary data=sashelp.class nway;
   class sex age;
   format age ages.;
   var height;
   output out=example max=;
run;

The format groups all the ages 14 and greater into one report category leaving other single ages in the output. This would be similar to what I think you are doing with "cos". The 14 and older are the "rest of the ages" and the 11 -13 like your Where values.

I suspect if the above example comes close that if you

1)remove Nway

2)leave the _type_ variable in the data you will see a row that has the desired summary. I can't tell which you might want.

Your data step code would then assign that value to COS based on the _type_ value. You might also only keep some of the _type_ values.

Or maybe a Multilabel format:

proc format ;
value otherages (multilabel)
low-high ='Everything'
11 - 13 ='11-13'
14 - high='>14';
run;

proc summary data=sashelp.class nway;
   class sex age/mlf;
   format age otherages.;
   var height;
   output out=example2 max=;
run;

Proc Means/summary is one of the few procedures that can use the multilabel formats to differentiate overlapping groups.

Really, actual data and the desired result would make this go much faster instead of making us guess your intent.

geneshackman · Posted 10-20-2022 08:42 AM

Please see my response to Reeza, with more details and an attached data set.

geneshackman · Posted 10-20-2022 10:13 AM

Here is the data set in csv if that's more acceptable.

ballardw · Posted 10-20-2022 10:22 AM

@geneshackman wrote:

Here is the data set in csv if that's more acceptable.

Not really. WE have to make decisions as to how to read the file and our choices may not match yours.

So you should include the code used to read that file into a data set.

OR:

Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the </> icon or attached as text to show exactly what you have and that we can test code against.

Reeza · Posted 10-19-2022 08:31 PM

My question is why?

To have the sort order correct when you add it into a table is my guess but there are other ways to get that table/output generated if you show what you're trying to do. But the way you have above definitely works.

geneshackman · Posted 10-20-2022 08:40 AM

Hi all. Thanks for the many suggestions and questions. The original data was from the ACS, b17020 a-i, poverty status by age, one table for each race. This is public domain data. I'm attaching here a modified data set for b17020b, Black or African American Alone. The columns in this data set are:

cos = the code we use for county

total = total number of children under 18 (sum of ages under 6 + age 6-11 + ages 12-17

underpov = total number of children under 18 below poverty level

region = I assigning counties to either NYC or ROS (Rest Of State - New York State excluding New York City)

I had downloaded one data file with data by county, and another file for NY State, and another file for NY City, so this data set has all three. But the Census has no category for ROS, so I wanted to make a row for total and underpov for ROS.

Our codes for NYS = 999, code for NYC = 102, and code for ROS = 888.

I hope this helps.

ballardw · Posted 10-20-2022 10:23 AM

@geneshackman wrote:

Hi all. Thanks for the many suggestions and questions. The original data was from the ACS, b17020 a-i, poverty status by age, one table for each race. This is public domain data. I'm attaching here a modified data set for b17020b, Black or African American Alone. The columns in this data set are:

cos = the code we use for county

total = total number of children under 18 (sum of ages under 6 + age 6-11 + ages 12-17

underpov = total number of children under 18 below poverty level

region = I assigning counties to either NYC or ROS (Rest Of State - New York State excluding New York City)

I had downloaded one data file with data by county, and another file for NY State, and another file for NY City, so this data set has all three. But the Census has no category for ROS, so I wanted to make a row for total and underpov for ROS.

Our codes for NYS = 999, code for NYC = 102, and code for ROS = 888.

I hope this helps.

Note that Excel, especially if the file actually started as CSV is worse. If you open a CSV in Excel and then save it to XLSX it will change values. The choices of how to read it still apply.

geneshackman · Posted 10-20-2022 10:40 AM

"If you open a CSV in Excel and then save it to XLSX it will change values."

Really? I didn't know that. What values will change? I'm sure you don't mean it will change a 1 to a 2. Perhaps this is about blanks, missings, like that?

Statistical Procedures

adding variables in proc summary

Re: adding variables in proc summary

Re: adding variables in proc summary

Re: adding variables in proc summary

Re: adding variables in proc summary

Re: adding variables in proc summary

Re: adding variables in proc summary

Re: adding variables in proc summary

Re: adding variables in proc summary

Re: adding variables in proc summary

Re: adding variables in proc summary

Re: adding variables in proc summary

Re: adding variables in proc summary

Re: adding variables in proc summary

Re: adding variables in proc summary

Re: adding variables in proc summary

Follow Us

What is...

Statistical Procedures

Our biggest data and AI event of the year.

Follow Us

What is...