BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
wlierman
Lapis Lazuli | Level 10

I am putting together a table of Preferred contact method (email, text, phone) from a survey of contacts for contact tracing.  Then I wish to express the results by the binary result 'Contact Successful'.  This is the code

Proc sql noprint;
CREATE TABLE SASCDC_2.Contact_Reached_by_RE AS
   SELECT Contact_Attempt_Successful,
          Preferred_Method_of_Contact,
          Count(American_Indian_and_Alaska_Nativ) AS _American_Indian_Alaska_Native,
          Count(Asian) AS _Asian,
          Count(Black_or_African_American) AS  _Black__or__African_American,
          Count(Hispanic_or_Latino_a_x) AS _Hispanic_or_Latino_a_x,
          Count(Middle_Eastern_North_African) AS _Middle_Eastern__North_African,
          Count(Native_Hawaiian_and_Pacific_Isla) AS _Native_Hawaiian_and_Pacific_Isl,
          Count(White) AS _White       
   FROM SASCDC_2.Arias_age_groupings
   Group By Contact_Attempt_Successful, Preferred_Method_of_Contact;
quit;

The output doesn't look too bad but I would like to insert a subtotal by preferred_method_of_contact for each contact_attempt_successful

 

The results look like this (this is an abbreviated table):

Contact_Successful  Preferred_Method      AIAN        Asian     Hispanic     White

No                                    Email                      6             11              65            158

No                                    Phone                    17            29          1303            467

 

Would like a subtotal here ========>       23            40          1368            625

 

Yes                                   Email                     16             66           181             656

Yes                                   Phone                    39           112         1799           1145   

 

Would like a subtotal here ========>       55          178         1980           1801     

 

Also is there a trick to position the Yes responses first then the No responses below

them?

 

Thank you for your assistance.

 

wlierman

 

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

Proc Tabulate likely won't work as the "race" is in multiple variables and the data would likely need to be reshaped to have a single race variable.

View solution in original post

8 REPLIES 8
ballardw
Super User

Proc Tabulate likely won't work as the "race" is in multiple variables and the data would likely need to be reshaped to have a single race variable.

wlierman
Lapis Lazuli | Level 10

Hello,

I didn't do a transpose, the output result came from the SQL query set up.

 

I found that after exporting into excel and spending about 15 minutes adjusting column widths adding a percentage calculation that I had the table I wanted.

 

IMO Proc report and tabulate produce reports that can be used but neither are very intuitive to use.  Could use a little revamp to really be useful "on the fly".

 

Thanks for your response.

 

wlierman

ballardw
Super User

@wlierman wrote:

Hello,

I didn't do a transpose, the output result came from the SQL query set up.

 

I found that after exporting into excel and spending about 15 minutes adjusting column widths adding a percentage calculation that I had the table I wanted.

 

IMO Proc report and tabulate produce reports that can be used but neither are very intuitive to use.  Could use a little revamp to really be useful "on the fly".

 

Thanks for your response.

 

wlierman


The source data you had shown as the "from" in the SQL code makes is more difficult to create reports when an idea like "race" should be a single variable.

 

Proc tabulate data =something;

   class catvar1 catvar2 catvar3;

  table catvar1*catvar2,

          catvar3 *n

  ;

run;

 

would create rows of a table with the catvar1 and catvar2 variable values as row labels and the values of catvar3 as columns.

However when you get to multiple variables that are actually one idea then the user has to provide a lot more work to get things.

Typically the presence of multiple variables such as a half-dozen "race" variables comes from inexperience and I often blame it on too much initial training with spreadsheets. Databases, where SQL reigns, and most analysis software works much better with more normalized data, such as a variable containing "race" information. Or a variable with a date value instead of 100's of columns with the same meaning for different dates.

wlierman
Lapis Lazuli | Level 10
I agree with you wholeheartedly. This data is downloaded daily from Opera which is an epidemiological data base. I was able to winnow the race down a bit. For example, for White there are probably 14 responses that can be given. Don't get me started on the choices under Hispanic Latino/a/x.

Well contact tracing must go on.

wlierman

ballardw
Super User

@wlierman wrote:
I agree with you wholeheartedly. This data is downloaded daily from Opera which is an epidemiological data base. I was able to winnow the race down a bit. For example, for White there are probably 14 responses that can be given. Don't get me started on the choices under Hispanic Latino/a/x.

Well contact tracing must go on.

wlierman


Sounds like a project in need of better data entry rules. I do understand. I have worked with a large nation-wide survey with a "race" question. The first versions had 5 races and an "other" category response. When "other" was the answer then an open-ended response allowing free text came up to capture the "other". Every month for 5 years cleaning data from that survey we had to recode things because the "other" would be "White" or "Black" or one of the other categories. But we had some that indicated people have no clue about "race" including "Catholic" "Irish" and such. My personal favorite was "Heinz 57".

 

Not my contract but someone did not spend enough time up front designing data entry. And then likely compounded it with an approach with "multiple race" entry. I would trace your data back to see if you have individuals marking multiple race responses. In which case that Proc SQl is completely garbage and is resulting in way more individuals than actual responses.

 

Look at the data and see if that is what is going on. Then talk to who ever expects to use the data whether you maybe should be recoding some of the data into a "more than one race" and/or "non-Hispanic White" "non-Hispanic Black" and such.

 

 

wlierman
Lapis Lazuli | Level 10
Thank you ballardw. Good to hear from someone who has experienced this "free-form" survey response.

Thanks again.

wlierman

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 8 replies
  • 1503 views
  • 5 likes
  • 3 in conversation