Solved: Getting a sub total for Contact Method

wlierman · Posted 10-30-2020 06:43 PM

I am putting together a table of Preferred contact method (email, text, phone) from a survey of contacts for contact tracing. Then I wish to express the results by the binary result 'Contact Successful'. This is the code

Proc sql noprint;
CREATE TABLE SASCDC_2.Contact_Reached_by_RE AS
   SELECT Contact_Attempt_Successful,
          Preferred_Method_of_Contact,
          Count(American_Indian_and_Alaska_Nativ) AS _American_Indian_Alaska_Native,
          Count(Asian) AS _Asian,
          Count(Black_or_African_American) AS  _Black__or__African_American,
          Count(Hispanic_or_Latino_a_x) AS _Hispanic_or_Latino_a_x,
          Count(Middle_Eastern_North_African) AS _Middle_Eastern__North_African,
          Count(Native_Hawaiian_and_Pacific_Isla) AS _Native_Hawaiian_and_Pacific_Isl,
          Count(White) AS _White       
   FROM SASCDC_2.Arias_age_groupings
   Group By Contact_Attempt_Successful, Preferred_Method_of_Contact;
quit;

The output doesn't look too bad but I would like to insert a subtotal by preferred_method_of_contact for each contact_attempt_successful

The results look like this (this is an abbreviated table):

Contact_Successful Preferred_Method AIAN Asian Hispanic White

No Email 6 11 65 158

No Phone 17 29 1303 467

Would like a subtotal here ========> 23 40 1368 625

Yes Email 16 66 181 656

Yes Phone 39 112 1799 1145

Would like a subtotal here ========> 55 178 1980 1801

Also is there a trick to position the Yes responses first then the No responses below

them?

Thank you for your assistance.

wlierman

ballardw · Posted 11-02-2020 11:26 AM

Proc Tabulate likely won't work as the "race" is in multiple variables and the data would likely need to be reshaped to have a single race variable.

View solution in original post

Kurt_Bremser · Posted 10-31-2020 02:51 AM

SQL is meant for data manipulation, not for reporting. Use PROC REPORT or PROC TABULATE instead, which also allow you to use the untransposed dataset.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

ballardw · Posted 11-02-2020 11:26 AM

Proc Tabulate likely won't work as the "race" is in multiple variables and the data would likely need to be reshaped to have a single race variable.

Kurt_Bremser · Posted 11-02-2020 11:34 AM

That's why I referenced the untransposed dataset, as I suppose the current unfavorable layout of the dataset is the result of a premature transpose.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

wlierman · Posted 11-02-2020 11:59 AM

Hello,

I didn't do a transpose, the output result came from the SQL query set up.

I found that after exporting into excel and spending about 15 minutes adjusting column widths adding a percentage calculation that I had the table I wanted.

IMO Proc report and tabulate produce reports that can be used but neither are very intuitive to use. Could use a little revamp to really be useful "on the fly".

Thanks for your response.

wlierman

ballardw · Posted 11-03-2020 01:39 PM

@wlierman wrote:

Hello,

I didn't do a transpose, the output result came from the SQL query set up.

I found that after exporting into excel and spending about 15 minutes adjusting column widths adding a percentage calculation that I had the table I wanted.

IMO Proc report and tabulate produce reports that can be used but neither are very intuitive to use. Could use a little revamp to really be useful "on the fly".

Thanks for your response.

wlierman

The source data you had shown as the "from" in the SQL code makes is more difficult to create reports when an idea like "race" should be a single variable.

Proc tabulate data =something;

class catvar1 catvar2 catvar3;

table catvar1*catvar2,

catvar3 *n

;

run;

would create rows of a table with the catvar1 and catvar2 variable values as row labels and the values of catvar3 as columns.

However when you get to multiple variables that are actually one idea then the user has to provide a lot more work to get things.

Typically the presence of multiple variables such as a half-dozen "race" variables comes from inexperience and I often blame it on too much initial training with spreadsheets. Databases, where SQL reigns, and most analysis software works much better with more normalized data, such as a variable containing "race" information. Or a variable with a date value instead of 100's of columns with the same meaning for different dates.

wlierman · Posted 11-03-2020 04:39 PM

I agree with you wholeheartedly. This data is downloaded daily from Opera which is an epidemiological data base. I was able to winnow the race down a bit. For example, for White there are probably 14 responses that can be given. Don't get me started on the choices under Hispanic Latino/a/x.

Well contact tracing must go on.

wlierman

ballardw · Posted 11-03-2020 05:22 PM

@wlierman wrote:
I agree with you wholeheartedly. This data is downloaded daily from Opera which is an epidemiological data base. I was able to winnow the race down a bit. For example, for White there are probably 14 responses that can be given. Don't get me started on the choices under Hispanic Latino/a/x.

Well contact tracing must go on.

wlierman

Sounds like a project in need of better data entry rules. I do understand. I have worked with a large nation-wide survey with a "race" question. The first versions had 5 races and an "other" category response. When "other" was the answer then an open-ended response allowing free text came up to capture the "other". Every month for 5 years cleaning data from that survey we had to recode things because the "other" would be "White" or "Black" or one of the other categories. But we had some that indicated people have no clue about "race" including "Catholic" "Irish" and such. My personal favorite was "Heinz 57".

Not my contract but someone did not spend enough time up front designing data entry. And then likely compounded it with an approach with "multiple race" entry. I would trace your data back to see if you have individuals marking multiple race responses. In which case that Proc SQl is completely garbage and is resulting in way more individuals than actual responses.

Look at the data and see if that is what is going on. Then talk to who ever expects to use the data whether you maybe should be recoding some of the data into a "more than one race" and/or "non-Hispanic White" "non-Hispanic Black" and such.

wlierman · Posted 11-03-2020 06:39 PM

Thank you ballardw. Good to hear from someone who has experienced this "free-form" survey response.

Thanks again.

wlierman

Getting a sub total for Contact Method

Re: Getting a sub total for Contact Method

Re: Getting a sub total for Contact Method

Re: Getting a sub total for Contact Method

Re: Getting a sub total for Contact Method

Re: Getting a sub total for Contact Method

Re: Getting a sub total for Contact Method

Re: Getting a sub total for Contact Method

Re: Getting a sub total for Contact Method

Re: Getting a sub total for Contact Method

Registration is open

SAS Training: Just a Click Away