BookmarkSubscribeRSS Feed
Dhana18
Obsidian | Level 7

i have a data like this 

data client;
input pid county$ disease$;
cards;
123 ABC Chlam
124 DEF Chlam
134 EFG Syph
135 HIJ Gon
209 KLM Chlam
103 ABC Chlam
120 DEF Chlam
131 EFG Syph
133 HIJ Gon
201 KLM Chlam
123 ABC Chlam
101 ABC Chlam
129 ABC Chlam
;
proc freq data=client;
table county;
where disease='Chlam';
run;

i got this after proc freq;

county Frequency Percent Cumulative
Frequency
Cumulative
Percent
ABC 5 55.56 5 55.56
DEF 2 22.22 7 77.78
KLM 2 22.22 9 100.00

But i want like this:

County #of Case range
ABC 3-5
DEF 1-2
KLM 1-2

 Please please get back to me. I am waiting.

7 REPLIES 7
ballardw
Super User

@Dhana18 wrote:

i have a data like this 

data client;
input pid county$ disease$;
cards;
123 ABC Chlam
124 DEF Chlam
134 EFG Syph
135 HIJ Gon
209 KLM Chlam
103 ABC Chlam
120 DEF Chlam
131 EFG Syph
133 HIJ Gon
201 KLM Chlam
123 ABC Chlam
101 ABC Chlam
129 ABC Chlam
;
proc freq data=client;
table county;
where disease='Chlam';
run;

i got this after proc freq;

county Frequency Percent Cumulative
Frequency
Cumulative
Percent
ABC 5 55.56 5 55.56
DEF 2 22.22 7 77.78
KLM 2 22.22 9 100.00

But i want like this:

County #of Case range
ABC 3-5
DEF 1-2
KLM 1-2

 Please please get back to me. I am waiting.


The first thing is to supply a description of all expected possible ranges. For example you only show up to 5. What if the Count is actual 10, 15, 500 or larger in a real case. You should also describe what you might want for no values.

To display different text for a numeric value SAS uses custom formats that you write. Then you apply the format to an appropriate variable as needed.

 

Here a brief example of creating a format and then using a procedure that gives a bit more control of displayed output than Proc Freq where you can specify the format for the statistic.

roc format;
value mynrange
1 - 2 = ' 1-2'
3 - 5 = ' 3-5'
6 -10 = ' 6-10'
11 - high ='11+'
;

proc tabulate data=client;
   class county;
   table county=' ',
         n="# case range"*f=mynrange. 
         /box=County
   ;
run;

In the Value statement of the Proc Format code the -  left of the = is a range indicator, not the hyphen of the displayed values. There are options that let you indicate whether the end points are included or not in the range plus some other keywords allowed for the value list.

Proc tabulate is another reporting procedure that has more control over displayed tables but a different choice of statistics than Proc Freq. The statistic, such as n, followed by ="something" is the label that the statistic will use for the procedure output, the *f=<some format> allows you to specify a format for the specific use of the statistic in that use.

Dhana18
Obsidian | Level 7
Hi,

Thank you for sending me this code. I tried it but did not give me accurate result. Exact range for Syphilis disease is:

0

1-3

4-19

20-49

50-300


jimbarbour
Meteorite | Level 14

OK, so here's some code.  The results are below the code.  Is this what you need?  In the results I have two tables.  The first is not categorized into ranges.  I just put the first table there so you can see the difference that the Proc Format makes when I assign the ranges.  The TITLE function that I used should be replaced by a TITLE (without a %) that is suitable for your SAS environment.

 

Jim

TITLE;
FOOTNOTE;
OPTIONS	VALIDVARNAME	=	ANY;

DATA	Client;
	INPUT	pid	County	$	Disease	$;
DATALINES;
123 ABC Chlam
124 DEF Chlam
134 EFG Syph
135 HIJ Gon
209 KLM Chlam
103 ABC Chlam
120 DEF Chlam
131 EFG Syph
133 HIJ Gon
201 KLM Chlam
123 ABC Chlam
101 ABC Chlam
129 ABC Chlam
;
RUN;

%TITLE(Data - Client);
PROC	FREQ	DATA=Client	NOPRINT;
	TABLE	County	/	Out=Client_Freqs;
	WHERE	Disease='Chlam';
RUN;

%TITLE(Data - Client_Freqs);
PROC	PRINT	DATA=Client_Freqs	NOOBS;
	VAR	County	Count;
RUN;

PROC	FORMAT;
	VALUE	Chlam_Range
		0	 		= 	'0'
		1  - 3 		= 	'1-3'
		4  - 19		= 	'4-19'
		20 - 49		= 	'20-49'
		50 - 300	= 	'50-300'
		301 - HIGH	= 	'>300'
		.			=	'Missing'
		OTHER 	  	= 	'Invalid'
		;
RUN;

%TITLE(Data - Client_Freqs);
OPTIONS	LABEL;
PROC	PRINT	DATA=Client_Freqs	(RENAME=COUNT='# of Case Range'n)	NOOBS;
	FORMAT	'# of Case Range'n	Chlam_Range6.;
	VAR	County	'# of Case Range'n;
RUN;

Results:

jimbarbour_0-1628901822351.png

 

Reeza
Super User
How do you know what the ranges should be? What are the rules for 1-2 3-5? Is it then 6-7 and 8-10 or 6 to 10 or 6+?
Dhana18
Obsidian | Level 7
It is by disease specific range the range I gave just an example.


ballardw
Super User

@Dhana18 wrote:
It is by disease specific range the range I gave just an example.



If you want an actual solution you have to provide the actual rules. I hope my example is enough to get started.

However, SAS formats apply to a variable. So the range using formats would apply regardless of "disease". If the display range has to vary by disease then you must show the rules because that is going to require data step coding and creation of character variable(s) to hold the result.

Reeza
Super User
Ok. Then create a data set in the CNTLIN structure to create a dynamic format for each disease and then apply that as needed.

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 1868 views
  • 0 likes
  • 4 in conversation