- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi, I'm working with a table in which i have many visits, and I'm trying to fill in NA (NA) (for mean (sd)) and NA (for median and n) for cell values that are blank.
What is the most efficient way to do this?
data have;
infile datalines dsd dlm=",";
input visit $ stat $ trt1 $ trt2 $ trt3 $;
datalines;
001, mean_std, 5 (3), 4 (2), 5 (3)
001, median, 3, 3, 3
001, n, 2, 4, 5
002, mean_std, , ,
002, median, , ,
002, n, , ,
003, mean_std, 2 (4), 4 (3), 6, (1)
003, median, 1, 1, 1
003, n, 2, 2, 9
;
run;
desired output:
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Just use an ARRAY.
data have;
infile datalines dsd ;
input visit $ stat $ trt1 $ trt2 $ trt3 $;
datalines;
001, mean_std, 5 (3), 4 (2), 5 (3)
001, median, 3, 3, 3
001, n, 2, 4, 5
002, mean_std, , ,
002, median, , ,
002, n, , ,
003, mean_std, 2 (4), 4 (3), 6, (1)
003, median, 1, 1, 1
003, n, 2, 2, 9
;
data want;
set have ;
array trt [3] ;
do index=1 to dim(trt);
if missing(trt[index]) then do;
if stat in ('median' 'n') then trt[index]='NA';
if stat in ('mean_std') then trt[index]='NA (NA)';
end;
end;
run;
proc print;
run;
Obs visit stat trt1 trt2 trt3 index 1 001 mean_std 5 (3) 4 (2) 5 (3) 4 2 001 median 3 3 3 4 3 001 n 2 4 5 4 4 002 mean_std NA (NA) NA (NA) NA (NA) 4 5 002 median NA NA NA 4 6 002 n NA NA NA 4 7 003 mean_std 2 (4) 4 (3) 6 4 8 003 median 1 1 1 4 9 003 n 2 2 9 4
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
More context is needed. Why are you doing this? Is it because you want NA in a data set so that you can perform additional programming tasks on it (what programming tasks), or you want NA in a report, or something else?
Is this really the raw data you are working from, or did you run PROC MEANS on some actual data and want this re-arranged as you show?
Context, please!
Paige Miller
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I ran proc means and i would like it rearranged as shown.
After i had transposed it, there were lots of blank cells that need to be filled in.
For future reference i guess it would be best to specify that this is for a table/report instead of needed in a data set.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Please show us the output from PROC MEANS. Or is that what you are showing us already in your first message?
Paige Miller
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
This is my personal computer, so I don't have access to the actual output. I make these mock examples that are representative of the nature of my problem. I make up the data and hardcode it in a data step with the datalines statement.
The issue I'm having is coming up with the most efficient way to fill in the blank cells w/ "NA (NA) "for when stat = "mean_std" and trt1 or trt2 or tr3 is blank
and also fill in "NA" for when stat=median or stat=n and either trt1 or trt2 or trt3 is blank.
I wish i could give you more context, but this is more a data manipulation problem than anything else. Perhaps there's a keyword in PROC Means i don't know that fills 0 for all combinations of treatment and visit, but it appears that proc only counts non missing values in the VAR statement.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
It's just a data manipulation exercise. I got it.
So important data manipulation concept: just because you want NAs in a report doesn't mean you have to have NAs in a data set.
That's so important, I will say it again: just because you want NAs in a report doesn't mean you have to have NAs in a data set.
You can create this table in any of a number of ways (PROC TABULATE, PROC REPORT, maybe even PROC PRINT and the %TABLEN macro), and by setting the proper option, your missings will appear as NA. This can also be done with custom formats. So ... there's no need to create a DATA set with NAs in place of missing values.
options missing='NA';
Paige Miller
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
But this is a unique situation where the value can take 1 of 2 possible values for a blank cell, either "NA" for n or median, or "NA (NA)" for mean_std. So this is not quite i was looking for, although this is a useful trick
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@PaigeMiller wrote:
It's just a data manipulation exercise. I got it.
So important data manipulation concept: just because you want NAs in a report doesn't mean you have to have NAs in a data set.
That's so important, I will say it again: just because you want NAs in a report doesn't mean you have to have NAs in a data set.
You can create this table in any of a number of ways (PROC TABULATE, PROC REPORT, maybe even PROC PRINT and the %TABLEN macro), and by setting the proper option, your missings will appear as NA. This can also be done with custom formats. So ... there's no need to create a DATA set with NAs in place of missing values.
options missing='NA';
That cannot work. The MISSING option can only accept a single character.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@Tom wrote:
@PaigeMiller wrote:
It's just a data manipulation exercise. I got it.
So important data manipulation concept: just because you want NAs in a report doesn't mean you have to have NAs in a data set.
That's so important, I will say it again: just because you want NAs in a report doesn't mean you have to have NAs in a data set.
You can create this table in any of a number of ways (PROC TABULATE, PROC REPORT, maybe even PROC PRINT and the %TABLEN macro), and by setting the proper option, your missings will appear as NA. This can also be done with custom formats. So ... there's no need to create a DATA set with NAs in place of missing values.
options missing='NA';
That cannot work. The MISSING option can only accept a single character.
Okay, I stand corrected, but custom formats would work fine here.
Paige Miller
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Proc tabulate table option MISSTEXT does work though. The Sashelp.class data set has no females age 16 so when you run
proc tabulate data=sashelp.class; class sex age; var height; table age * height*(max min), sex /misstext='NA' ; run;
The result is
Sex | ||||
---|---|---|---|---|
F | M | |||
Age | 51.30 | 57.50 | ||
11 | Height | Max | ||
Min | 51.30 | 57.50 | ||
12 | Height | Max | 59.80 | 64.80 |
Min | 56.30 | 57.30 | ||
13 | Height | Max | 65.30 | 62.50 |
Min | 56.50 | 62.50 | ||
14 | Height | Max | 64.30 | 69.00 |
Min | 62.80 | 63.50 | ||
15 | Height | Max | 66.50 | 67.00 |
Min | 62.50 | 66.50 | ||
16 | Height | Max | NA | 72.00 |
Min | NA | 72.00 |
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@Hello_there wrote:
Do you know if it's possible to output this into a data set? I try use "ods output out=xyz" in a separate line of code, but doesn't work. T
There is no ODS OUTPUT option for Proc Tabulate. Possibly because the procedure was designed a long time before ODS and the interface to ODS Output might be difficult to add.
Proc Tabulate supports an OUT=datasetname option. However the structure will be different than the displayed output and may be quite confusing for relative new comers to SAS data sets. Part of the difficulty with Tabulate output is that Proc Tabulate will accept multiple Table statements and it isn't required for any of the tables to have any variables or statistics in common. This will demonstrate what I mean in that regard:
proc tabulate data=sashelp.class out=work.tabout; class sex age; var height weight; table sex, height*(mean std) ; table age, weight*(max q3) ; table sex*(age all='all sex'), (height weight)*(n median) ; run;
You will see that there is a variable indicating which table a combination of class and Var variable statistics occurs in, the var variables have the statistic appended to the variable name (odd results with very long variable names from this) and a _type_ variable to indicate the combinations of Class variables. Trying to display that resulting data set in other forms is often a fair amount of work. I'm not saying don't create sets from tabulate, I do when the need arises but that is an experience thing about when it may be useful.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thanks!
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Just use an ARRAY.
data have;
infile datalines dsd ;
input visit $ stat $ trt1 $ trt2 $ trt3 $;
datalines;
001, mean_std, 5 (3), 4 (2), 5 (3)
001, median, 3, 3, 3
001, n, 2, 4, 5
002, mean_std, , ,
002, median, , ,
002, n, , ,
003, mean_std, 2 (4), 4 (3), 6, (1)
003, median, 1, 1, 1
003, n, 2, 2, 9
;
data want;
set have ;
array trt [3] ;
do index=1 to dim(trt);
if missing(trt[index]) then do;
if stat in ('median' 'n') then trt[index]='NA';
if stat in ('mean_std') then trt[index]='NA (NA)';
end;
end;
run;
proc print;
run;
Obs visit stat trt1 trt2 trt3 index 1 001 mean_std 5 (3) 4 (2) 5 (3) 4 2 001 median 3 3 3 4 3 001 n 2 4 5 4 4 002 mean_std NA (NA) NA (NA) NA (NA) 4 5 002 median NA NA NA 4 6 002 n NA NA NA 4 7 003 mean_std 2 (4) 4 (3) 6 4 8 003 median 1 1 1 4 9 003 n 2 2 9 4
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content