I am putting together a table of Preferred contact method (email, text, phone) from a survey of contacts for contact tracing. Then I wish to express the results by the binary result 'Contact Successful'. This is the code
Proc sql noprint; CREATE TABLE SASCDC_2.Contact_Reached_by_RE AS SELECT Contact_Attempt_Successful, Preferred_Method_of_Contact, Count(American_Indian_and_Alaska_Nativ) AS _American_Indian_Alaska_Native, Count(Asian) AS _Asian, Count(Black_or_African_American) AS _Black__or__African_American, Count(Hispanic_or_Latino_a_x) AS _Hispanic_or_Latino_a_x, Count(Middle_Eastern_North_African) AS _Middle_Eastern__North_African, Count(Native_Hawaiian_and_Pacific_Isla) AS _Native_Hawaiian_and_Pacific_Isl, Count(White) AS _White FROM SASCDC_2.Arias_age_groupings Group By Contact_Attempt_Successful, Preferred_Method_of_Contact; quit;
The output doesn't look too bad but I would like to insert a subtotal by preferred_method_of_contact for each contact_attempt_successful
The results look like this (this is an abbreviated table):
Contact_Successful Preferred_Method AIAN Asian Hispanic White
No Email 6 11 65 158
No Phone 17 29 1303 467
Would like a subtotal here ========> 23 40 1368 625
Yes Email 16 66 181 656
Yes Phone 39 112 1799 1145
Would like a subtotal here ========> 55 178 1980 1801
Also is there a trick to position the Yes responses first then the No responses below
them?
Thank you for your assistance.
wlierman
Proc Tabulate likely won't work as the "race" is in multiple variables and the data would likely need to be reshaped to have a single race variable.
SQL is meant for data manipulation, not for reporting. Use PROC REPORT or PROC TABULATE instead, which also allow you to use the untransposed dataset.
Proc Tabulate likely won't work as the "race" is in multiple variables and the data would likely need to be reshaped to have a single race variable.
That's why I referenced the untransposed dataset, as I suppose the current unfavorable layout of the dataset is the result of a premature transpose.
Hello,
I didn't do a transpose, the output result came from the SQL query set up.
I found that after exporting into excel and spending about 15 minutes adjusting column widths adding a percentage calculation that I had the table I wanted.
IMO Proc report and tabulate produce reports that can be used but neither are very intuitive to use. Could use a little revamp to really be useful "on the fly".
Thanks for your response.
wlierman
@wlierman wrote:
Hello,
I didn't do a transpose, the output result came from the SQL query set up.
I found that after exporting into excel and spending about 15 minutes adjusting column widths adding a percentage calculation that I had the table I wanted.
IMO Proc report and tabulate produce reports that can be used but neither are very intuitive to use. Could use a little revamp to really be useful "on the fly".
Thanks for your response.
wlierman
The source data you had shown as the "from" in the SQL code makes is more difficult to create reports when an idea like "race" should be a single variable.
Proc tabulate data =something;
class catvar1 catvar2 catvar3;
table catvar1*catvar2,
catvar3 *n
;
run;
would create rows of a table with the catvar1 and catvar2 variable values as row labels and the values of catvar3 as columns.
However when you get to multiple variables that are actually one idea then the user has to provide a lot more work to get things.
Typically the presence of multiple variables such as a half-dozen "race" variables comes from inexperience and I often blame it on too much initial training with spreadsheets. Databases, where SQL reigns, and most analysis software works much better with more normalized data, such as a variable containing "race" information. Or a variable with a date value instead of 100's of columns with the same meaning for different dates.
@wlierman wrote:
I agree with you wholeheartedly. This data is downloaded daily from Opera which is an epidemiological data base. I was able to winnow the race down a bit. For example, for White there are probably 14 responses that can be given. Don't get me started on the choices under Hispanic Latino/a/x.
Well contact tracing must go on.
wlierman
Sounds like a project in need of better data entry rules. I do understand. I have worked with a large nation-wide survey with a "race" question. The first versions had 5 races and an "other" category response. When "other" was the answer then an open-ended response allowing free text came up to capture the "other". Every month for 5 years cleaning data from that survey we had to recode things because the "other" would be "White" or "Black" or one of the other categories. But we had some that indicated people have no clue about "race" including "Catholic" "Irish" and such. My personal favorite was "Heinz 57".
Not my contract but someone did not spend enough time up front designing data entry. And then likely compounded it with an approach with "multiple race" entry. I would trace your data back to see if you have individuals marking multiple race responses. In which case that Proc SQl is completely garbage and is resulting in way more individuals than actual responses.
Look at the data and see if that is what is going on. Then talk to who ever expects to use the data whether you maybe should be recoding some of the data into a "more than one race" and/or "non-Hispanic White" "non-Hispanic Black" and such.
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.