SAS Programming

Hello_there · Posted 12-05-2022 03:17 PM

Hi, I'm working with a table in which i have many visits, and I'm trying to fill in NA (NA) (for mean (sd)) and NA (for median and n) for cell values that are blank.

What is the most efficient way to do this?

data have;
infile datalines dsd dlm=",";
	input visit $ stat $ trt1 $ trt2 $ trt3 $;
datalines;
001, mean_std, 5 (3), 4 (2), 5 (3)
001, median, 3, 3, 3
001, n, 2, 4, 5
002, mean_std, , , 
002, median, , ,
002, n, , , 
003, mean_std, 2 (4), 4 (3), 6, (1)
003, median, 1, 1, 1
003, n, 2, 2, 9
; 
run;

desired output:

Tom · Posted 12-05-2022 03:49 PM

Just use an ARRAY.

data have;
  infile datalines dsd ;
	input visit $ stat $ trt1 $ trt2 $ trt3 $;
datalines;
001, mean_std, 5 (3), 4 (2), 5 (3)
001, median, 3, 3, 3
001, n, 2, 4, 5
002, mean_std, , , 
002, median, , ,
002, n, , , 
003, mean_std, 2 (4), 4 (3), 6, (1)
003, median, 1, 1, 1
003, n, 2, 2, 9
; 

data want;
  set have ;
  array trt [3] ;
  do index=1 to dim(trt);
    if missing(trt[index]) then do;
      if stat in ('median' 'n') then trt[index]='NA';
      if stat in ('mean_std') then trt[index]='NA (NA)';
    end;
  end;
run;

proc print;
run;

Obs    visit    stat        trt1       trt2       trt3       index

 1      001     mean_std    5 (3)      4 (2)      5 (3)        4
 2      001     median      3          3          3            4
 3      001     n           2          4          5            4
 4      002     mean_std    NA (NA)    NA (NA)    NA (NA)      4
 5      002     median      NA         NA         NA           4
 6      002     n           NA         NA         NA           4
 7      003     mean_std    2 (4)      4 (3)      6            4
 8      003     median      1          1          1            4
 9      003     n           2          2          9            4

View solution in original post

PaigeMiller · Posted 12-05-2022 03:22 PM

More context is needed. Why are you doing this? Is it because you want NA in a data set so that you can perform additional programming tasks on it (what programming tasks), or you want NA in a report, or something else?

Is this really the raw data you are working from, or did you run PROC MEANS on some actual data and want this re-arranged as you show?

Context, please!

--
Paige Miller

Hello_there · Posted 12-05-2022 03:24 PM

Hi PaigeMiller,
I ran proc means and i would like it rearranged as shown.

After i had transposed it, there were lots of blank cells that need to be filled in.

For future reference i guess it would be best to specify that this is for a table/report instead of needed in a data set.

PaigeMiller · Posted 12-05-2022 03:29 PM

Please show us the output from PROC MEANS. Or is that what you are showing us already in your first message?

--
Paige Miller

Hello_there · Posted 12-05-2022 03:36 PM

This is my personal computer, so I don't have access to the actual output. I make these mock examples that are representative of the nature of my problem. I make up the data and hardcode it in a data step with the datalines statement.

The issue I'm having is coming up with the most efficient way to fill in the blank cells w/ "NA (NA) "for when stat = "mean_std" and trt1 or trt2 or tr3 is blank
and also fill in "NA" for when stat=median or stat=n and either trt1 or trt2 or trt3 is blank.

I wish i could give you more context, but this is more a data manipulation problem than anything else. Perhaps there's a keyword in PROC Means i don't know that fills 0 for all combinations of treatment and visit, but it appears that proc only counts non missing values in the VAR statement.

PaigeMiller · Posted 12-05-2022 03:40 PM

It's just a data manipulation exercise. I got it.

So important data manipulation concept: just because you want NAs in a report doesn't mean you have to have NAs in a data set.

That's so important, I will say it again: just because you want NAs in a report doesn't mean you have to have NAs in a data set.

You can create this table in any of a number of ways (PROC TABULATE, PROC REPORT, maybe even PROC PRINT and the %TABLEN macro), and by setting the proper option, your missings will appear as NA. This can also be done with custom formats. So ... there's no need to create a DATA set with NAs in place of missing values.

options missing='NA';

--
Paige Miller

Hello_there · Posted 12-05-2022 03:42 PM

Thanks for the reply and the explanation.

But this is a unique situation where the value can take 1 of 2 possible values for a blank cell, either "NA" for n or median, or "NA (NA)" for mean_std. So this is not quite i was looking for, although this is a useful trick

Tom · Posted 12-05-2022 03:45 PM

@PaigeMiller wrote:

It's just a data manipulation exercise. I got it.

So important data manipulation concept: just because you want NAs in a report doesn't mean you have to have NAs in a data set.

That's so important, I will say it again: just because you want NAs in a report doesn't mean you have to have NAs in a data set.

You can create this table in any of a number of ways (PROC TABULATE, PROC REPORT, maybe even PROC PRINT and the %TABLEN macro), and by setting the proper option, your missings will appear as NA. This can also be done with custom formats. So ... there's no need to create a DATA set with NAs in place of missing values.
options missing='NA';

That cannot work. The MISSING option can only accept a single character.

PaigeMiller · Posted 12-05-2022 03:45 PM

@Tom wrote:
@PaigeMiller wrote:

It's just a data manipulation exercise. I got it.

So important data manipulation concept: just because you want NAs in a report doesn't mean you have to have NAs in a data set.

That's so important, I will say it again: just because you want NAs in a report doesn't mean you have to have NAs in a data set.

You can create this table in any of a number of ways (PROC TABULATE, PROC REPORT, maybe even PROC PRINT and the %TABLEN macro), and by setting the proper option, your missings will appear as NA. This can also be done with custom formats. So ... there's no need to create a DATA set with NAs in place of missing values.
options missing='NA';
That cannot work. The MISSING option can only accept a single character.

Okay, I stand corrected, but custom formats would work fine here.

--
Paige Miller

ballardw · Posted 12-05-2022 03:54 PM

Proc tabulate table option MISSTEXT does work though. The Sashelp.class data set has no females age 16 so when you run

proc tabulate data=sashelp.class;
   class sex age;
   var height;
   table age * height*(max min),
         sex
         /misstext='NA'
   ;
run;

The result is

			Sex
			F	M
Age			51.30	57.50
11	Height	Max	51.30	57.50
11	Height	Min	51.30	57.50
12	Height	Max	59.80	64.80
12	Height	Min	56.30	57.30
13	Height	Max	65.30	62.50
13	Height	Min	56.50	62.50
14	Height	Max	64.30	69.00
14	Height	Min	62.80	63.50
15	Height	Max	66.50	67.00
15	Height	Min	62.50	66.50
16	Height	Max	NA	72.00
16	Height	Min	NA	72.00

Hello_there · Posted 12-05-2022 05:31 PM

Do you know if it's possible to output this into a data set? I try use "ods output out=xyz" in a separate line of code, but doesn't work.

ballardw · Posted 12-05-2022 06:19 PM

@Hello_there wrote:
Do you know if it's possible to output this into a data set? I try use "ods output out=xyz" in a separate line of code, but doesn't work. T

There is no ODS OUTPUT option for Proc Tabulate. Possibly because the procedure was designed a long time before ODS and the interface to ODS Output might be difficult to add.

Proc Tabulate supports an OUT=datasetname option. However the structure will be different than the displayed output and may be quite confusing for relative new comers to SAS data sets. Part of the difficulty with Tabulate output is that Proc Tabulate will accept multiple Table statements and it isn't required for any of the tables to have any variables or statistics in common. This will demonstrate what I mean in that regard:

proc tabulate data=sashelp.class out=work.tabout;
   class sex age;
   var height weight;
   table sex,
            height*(mean std)
   ;
   table age,
            weight*(max q3)
   ;
   table sex*(age all='all sex'),
           (height weight)*(n median)
   ;
run;

You will see that there is a variable indicating which table a combination of class and Var variable statistics occurs in, the var variables have the statistic appended to the variable name (odd results with very long variable names from this) and a _type_ variable to indicate the combinations of Class variables. Trying to display that resulting data set in other forms is often a fair amount of work. I'm not saying don't create sets from tabulate, I do when the need arises but that is an experience thing about when it may be useful.

Hello_there · Posted 12-05-2022 06:48 PM

Thanks!

Tom · Posted 12-05-2022 03:49 PM

Just use an ARRAY.

data have;
  infile datalines dsd ;
	input visit $ stat $ trt1 $ trt2 $ trt3 $;
datalines;
001, mean_std, 5 (3), 4 (2), 5 (3)
001, median, 3, 3, 3
001, n, 2, 4, 5
002, mean_std, , , 
002, median, , ,
002, n, , , 
003, mean_std, 2 (4), 4 (3), 6, (1)
003, median, 1, 1, 1
003, n, 2, 2, 9
; 

data want;
  set have ;
  array trt [3] ;
  do index=1 to dim(trt);
    if missing(trt[index]) then do;
      if stat in ('median' 'n') then trt[index]='NA';
      if stat in ('mean_std') then trt[index]='NA (NA)';
    end;
  end;
run;

proc print;
run;

Obs    visit    stat        trt1       trt2       trt3       index

 1      001     mean_std    5 (3)      4 (2)      5 (3)        4
 2      001     median      3          3          3            4
 3      001     n           2          4          5            4
 4      002     mean_std    NA (NA)    NA (NA)    NA (NA)      4
 5      002     median      NA         NA         NA           4
 6      002     n           NA         NA         NA           4
 7      003     mean_std    2 (4)      4 (3)      6            4
 8      003     median      1          1          1            4
 9      003     n           2          2          9            4

Hello_there · Posted 12-05-2022 03:52 PM

Thanks, Tom! This is perfect.

SAS Programming

What is the most efficient way to fill in blank cells w/ NA?

Re: What is the most efficient way to fill in blank cells w/ NA?

Re: What is the most efficient way to fill in blank cells w/ NA?

Re: What is the most efficient way to fill in blank cells w/ NA?

Re: What is the most efficient way to fill in blank cells w/ NA?

Re: What is the most efficient way to fill in blank cells w/ NA?

Re: What is the most efficient way to fill in blank cells w/ NA?

Re: What is the most efficient way to fill in blank cells w/ NA?

Re: What is the most efficient way to fill in blank cells w/ NA?

Re: What is the most efficient way to fill in blank cells w/ NA?

Re: What is the most efficient way to fill in blank cells w/ NA?

Re: What is the most efficient way to fill in blank cells w/ NA?

Re: What is the most efficient way to fill in blank cells w/ NA?

Re: What is the most efficient way to fill in blank cells w/ NA?

Re: What is the most efficient way to fill in blank cells w/ NA?

Re: What is the most efficient way to fill in blank cells w/ NA?

Fill in Blanks with 'NA'

Filling blank cells with data till the next filled cell using do loops

Let Copilot be your guide you to efficient coding

Unlock Seamless Efficiency: SAS Viya’s No-Code/Low-Code Experience to ...

Maxims of Maximally Efficient SAS Programmers

Follow Us

What is...

SAS Programming

Our biggest data and AI event of the year.

SAS Training: Just a Click Away

Follow Us

What is...