Solved: Re: Using proc freq to create cross tab table with many headings

wklierman · Posted 11-23-2020 02:35 PM

I have a data set and I have 36 counties and 15 CCOs. Would like to use proc freq efficiently. I want a count of covid contacts by county and for the requisite CCO (Coordinated Care Organization). Several counties utilize the same CCO so that is why there are fewer CCOs than counties. This is how I renamed the generic CCO1.....CCO15 fields

Proc Sql noprint;
CREATE TABLE SASCDC_2.Arias_skinny_dataset_for_viz AS
    SELECT DISTINCT Contact_Person_ID,
	                Monitoring_Status,
					Status,
					ZIP_Code,
					County_,
					Designation,
					CCO_1 AS Advanced_Health label='Advanced_Health',
					CCO_2 AS Allcare_CCO label='Allcare_CCO',
					CCO_3 AS Cascade_Health_Alliance label='Cascade_Health_Alliance',
					CCO_4 AS Columbia_Pacific_CCO label='Columbia_Pacific_CCO',
					CCO_5 AS EOCCO_ label='EOCCO_',
                    CCO_6 AS Health_Share_Oregon label='Health_Share_Oregon',
					CCO_7 AS Intercommunity_Health_Network label='Intercommunity_Health_Network',
					CCO_8 AS Jackson_Care_Connect label='Jackson_Care_Connect',
					CCO_9 AS Pac_Source_Central_OR label='Pac_Source_Central_OR',
					CCO_10 AS Pac_Source_Columbia_Gorge label='Pac_Source_Columbia_Gorge',
					CCO_11 AS Pac_Source_Lane label='Pac_Source_Lane',
					CCO_12 AS Pac_Source_Marion_Polk label='Pac_Source_Marion_Polk',
					CCO_13 AS Trillium_Comm_Health_Plan label='Trillium_Comm_Health_Plan',
					CCO_14 AS Umpqua_Health_Alliance label='Umpqua_Health_Alliance',
					CCO_15 AS Yamhill_Comm_Care_Org label='Yamhill_Comm_Care_Org'
	FROM SASCDC_2.Arias_contact_zip_cco_zip
	WHERE Monitoring_Status ne "Marked for deduplication";
quit;

So the proc freq looks like this (with only a few CCos listed)

proc freq  Data = SASCDC_2.Arias_skinny_dataset_for_viz order=data;;
   Tables County_ * Advanced_Health Allcare_CCO Cascade_Health_Alliance      Columbia_Pacific_CCO EOCCO_;
run;

But the problem is I don't want a table for each CCO. I want the count of contacts occurring in a specific county:CCO cell. So ideally I would like my resulting table to look like (illustrative)

Advanced Health CCO …………………. Health Share Oregon …. Yamhill Comm Health

Baker 23 0 0

Hood River 0 100 0

Multnomah 0 1200 0

Polk 0 0 200

….

Washington 0 1550 0

How can I create that table using proc freq? Or can I. This table is a beta version so I don't want to spent a lot of time

(this time around - like proc report or tabulate). This is to give leadership a new slice of information and to assess how much

value there is.

Thank you.

wlierman

ballardw · Posted 11-23-2020 05:51 PM

Text without program means I don't have your data. Only enough records to get two or three of your "cco" would be needed. I am guessing that your data continues to have the text of the name. So perhaps:

Proc Sql noprint;
CREATE TABLE SASCDC_2.Arias_skinny_dataset_for_viz AS
    SELECT DISTINCT Contact_Person_ID,
	                Monitoring_Status,
					Status,
					ZIP_Code,
					County_,
					Designation,
					Coalesce (CCO_1,CCO_2,CCO_3,CCO_4,CCO_5,CCO_6,CCO_7
                         ,CCO_8,CCO_9,CCO_10,CCO_11,CCO_12,CCO_13
                         ,CCO_14,CCO_15) as CCO
	FROM SASCDC_2.Arias_contact_zip_cco_zip
	WHERE Monitoring_Status ne "Marked for deduplication";
quit;

proc report data=SASCDC_2.Arias_skinny_dataset_for_viz ;
   columns county_ CC0;
   define county_ / group;
   define cco / across;
run;

Will give a chance to see what I mean. You could add a nice label for the CCO simply by adding a quoted string after the across such as "CCO Name" or similar.

There are a number of approaches to addressing the way missing values in the body will appear. Easiest might be to add the line: Options missing='0'; just before the proc report and then reset the missing charact to . with Options missing='.'; after. Or if you prefer a blank instead of zero Options missing=' ';

We are getting a default N statistic with this approach.

View solution in original post

ballardw · Posted 11-23-2020 03:15 PM

Some example data would be nice.

One issue is that your data is poorly structured. You would be better off with a single variable like 'CCO' with a VALUE of the text, or similar, that you are assigning to variable names and a value like "count" (may not be needed if you have one record per contact). It is often a bad idea to store actual data in the name of a variable.

Data structured as I suggest would be easy with either Proc Report or Proc Tabulate.

Maybe something like:

Proc report data=SASCDC_2.Arias_skinny_dataset_for_viz;
   columns county Advanced_health Allcare_cco <list all those cco>; 
   define county /group; 
run;

If those CCO are numeric values then you will get a sum per county.

If they are character and one per conact then you will get a count. If something else, then we need to know more of what your actual data looks like.

wklierman · Posted 11-23-2020 05:23 PM

Thank you for the reply. I think I know what you mean. If I had done something like go by field name for each CCO and substitute a 1 each time the name appeared then I could use your suggestion with Proc report. Would that have worked?

I want to be sure that I understand your technique and suggestion, since this is going to come up again.

I have attached 15K rows of a csv file that contains the dataset that I am working with. I would appreciate seeing your steps to clean the data to get it ready for the Proc report code that you suggest.

Thank you for your help.

wlierman

ballardw · Posted 11-23-2020 05:51 PM

Text without program means I don't have your data. Only enough records to get two or three of your "cco" would be needed. I am guessing that your data continues to have the text of the name. So perhaps:

Proc Sql noprint;
CREATE TABLE SASCDC_2.Arias_skinny_dataset_for_viz AS
    SELECT DISTINCT Contact_Person_ID,
	                Monitoring_Status,
					Status,
					ZIP_Code,
					County_,
					Designation,
					Coalesce (CCO_1,CCO_2,CCO_3,CCO_4,CCO_5,CCO_6,CCO_7
                         ,CCO_8,CCO_9,CCO_10,CCO_11,CCO_12,CCO_13
                         ,CCO_14,CCO_15) as CCO
	FROM SASCDC_2.Arias_contact_zip_cco_zip
	WHERE Monitoring_Status ne "Marked for deduplication";
quit;

proc report data=SASCDC_2.Arias_skinny_dataset_for_viz ;
   columns county_ CC0;
   define county_ / group;
   define cco / across;
run;

Will give a chance to see what I mean. You could add a nice label for the CCO simply by adding a quoted string after the across such as "CCO Name" or similar.

There are a number of approaches to addressing the way missing values in the body will appear. Easiest might be to add the line: Options missing='0'; just before the proc report and then reset the missing charact to . with Options missing='.'; after. Or if you prefer a blank instead of zero Options missing=' ';

We are getting a default N statistic with this approach.

Reeza · Posted 11-23-2020 06:00 PM

COALESCE() won't work as it seems there can be multiple CCO per record. It needs to be a transpose and then sql or freq.

wklierman · Posted 11-25-2020 10:58 AM

Thank you for the guidance.

The coalesce function did work. I checked everything and the CCO service areas mapped correctly to county and zipcodes.

Again thank you both ballardw and Reeza for your help.

wlierman

ballardw · Posted 11-25-2020 11:39 AM

@wklierman wrote:

Thank you for the guidance.

The coalesce function did work. I checked everything and the CCO service areas mapped correctly to county and zipcodes.

Again thank you both ballardw and Reeza for your help.

wlierman

The Coalesce function returns the First value of the list that is populated. You may want to double check to see if the result is correct for any record that had 2 CCO.

My data I would probably have transposed to have one record with the text value as CCO originally. Which would then get a more accurate count if there are two or more CCO per record.

Reeza · Posted 11-25-2020 11:40 AM

It will give you the answer if you only want to count the first CCO in each county. Try using CATX() instead and see if the results differ.

CREATE TABLE SASCDC_2.Arias_skinny_dataset_for_viz AS
SELECT DISTINCT Contact_Person_ID,
Monitoring_Status,
Status,
ZIP_Code,
County_,
Designation,
catx(" - ", CCO_1,CCO_2,CCO_3,CCO_4,CCO_5,CCO_6,CCO_7
,CCO_8,CCO_9,CCO_10,CCO_11,CCO_12,CCO_13
,CCO_14,CCO_15) as CCO
FROM SASCDC_2.Arias_contact_zip_cco_zip
WHERE Monitoring_Status ne "Marked for deduplication";
quit;

Reeza · Posted 11-23-2020 03:26 PM

Agreed with BallardW - sample data that shows your structure and expected output would be helpful and this is easier to do if you restructure your data. The benefit of that is that it will become a fully dynamic solution so even if you add CCO or counties it should still work.

Classroom Training Available!