BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Hello_there
Lapis Lazuli | Level 10

Hi, I'm working with a table in which i have many visits, and I'm trying to fill in NA (NA) (for mean (sd)) and NA (for median and n) for cell values that are blank.

 

What is the most efficient way to do this?

data have;
infile datalines dsd dlm=",";
	input visit $ stat $ trt1 $ trt2 $ trt3 $;
datalines;
001, mean_std, 5 (3), 4 (2), 5 (3)
001, median, 3, 3, 3
001, n, 2, 4, 5
002, mean_std, , , 
002, median, , ,
002, n, , , 
003, mean_std, 2 (4), 4 (3), 6, (1)
003, median, 1, 1, 1
003, n, 2, 2, 9
; 
run;

desired output:

Hello_there_0-1670271416215.png

 

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

Just use an ARRAY.

data have;
  infile datalines dsd ;
	input visit $ stat $ trt1 $ trt2 $ trt3 $;
datalines;
001, mean_std, 5 (3), 4 (2), 5 (3)
001, median, 3, 3, 3
001, n, 2, 4, 5
002, mean_std, , , 
002, median, , ,
002, n, , , 
003, mean_std, 2 (4), 4 (3), 6, (1)
003, median, 1, 1, 1
003, n, 2, 2, 9
; 

data want;
  set have ;
  array trt [3] ;
  do index=1 to dim(trt);
    if missing(trt[index]) then do;
      if stat in ('median' 'n') then trt[index]='NA';
      if stat in ('mean_std') then trt[index]='NA (NA)';
    end;
  end;
run;

proc print;
run;
Obs    visit    stat        trt1       trt2       trt3       index

 1      001     mean_std    5 (3)      4 (2)      5 (3)        4
 2      001     median      3          3          3            4
 3      001     n           2          4          5            4
 4      002     mean_std    NA (NA)    NA (NA)    NA (NA)      4
 5      002     median      NA         NA         NA           4
 6      002     n           NA         NA         NA           4
 7      003     mean_std    2 (4)      4 (3)      6            4
 8      003     median      1          1          1            4
 9      003     n           2          2          9            4

View solution in original post

21 REPLIES 21
PaigeMiller
Diamond | Level 26

More context is needed. Why are you doing this? Is it because you want NA in a data set so that you can perform additional programming tasks on it (what programming tasks), or you want NA in a report, or something else?


Is this really the raw data you are working from, or did you run PROC MEANS on some actual data and want this re-arranged as you show?


Context, please!

--
Paige Miller
Hello_there
Lapis Lazuli | Level 10
Hi PaigeMiller,
I ran proc means and i would like it rearranged as shown.

After i had transposed it, there were lots of blank cells that need to be filled in.

For future reference i guess it would be best to specify that this is for a table/report instead of needed in a data set.
PaigeMiller
Diamond | Level 26

Please show us the output from PROC MEANS. Or is that what you are showing us already in your first message?

--
Paige Miller
Hello_there
Lapis Lazuli | Level 10

This is my personal computer, so I don't have access to the actual output. I make these mock examples that are representative of the nature of my problem. I make up the data and hardcode it in a data step with the datalines statement.

The issue I'm having is coming up with the most efficient way to fill in the blank cells w/ "NA (NA) "for when stat = "mean_std" and trt1 or trt2 or tr3 is blank
and also fill in "NA" for when stat=median or stat=n and either trt1 or trt2 or trt3 is blank.

I wish i could give you more context, but this is more a data manipulation problem than anything else. Perhaps there's a keyword in PROC Means i don't know that fills 0 for all combinations of treatment and visit, but it appears that proc only counts non missing values in the VAR statement.

PaigeMiller
Diamond | Level 26

It's just a data manipulation exercise. I got it.

 

So important data manipulation concept: just because you want NAs in a report doesn't mean you have to have NAs in a data set.

 

That's so important, I will say it again: just because you want NAs in a report doesn't mean you have to have NAs in a data set.

 

You can create this table in any of a number of ways (PROC TABULATE, PROC REPORT, maybe even PROC PRINT and the %TABLEN macro), and by setting the proper option, your missings will appear as NA. This can also be done with custom formats. So ... there's no need to create a DATA set with NAs in place of missing values.

 

options missing='NA';

 

--
Paige Miller
Hello_there
Lapis Lazuli | Level 10
Thanks for the reply and the explanation.

But this is a unique situation where the value can take 1 of 2 possible values for a blank cell, either "NA" for n or median, or "NA (NA)" for mean_std. So this is not quite i was looking for, although this is a useful trick
Tom
Super User Tom
Super User

@PaigeMiller wrote:

It's just a data manipulation exercise. I got it.

 

So important data manipulation concept: just because you want NAs in a report doesn't mean you have to have NAs in a data set.

 

That's so important, I will say it again: just because you want NAs in a report doesn't mean you have to have NAs in a data set.

 

You can create this table in any of a number of ways (PROC TABULATE, PROC REPORT, maybe even PROC PRINT and the %TABLEN macro), and by setting the proper option, your missings will appear as NA. This can also be done with custom formats. So ... there's no need to create a DATA set with NAs in place of missing values.

 

options missing='NA';

 


That cannot work.  The MISSING option can only accept a single character.

PaigeMiller
Diamond | Level 26

@Tom wrote:

@PaigeMiller wrote:

It's just a data manipulation exercise. I got it.

 

So important data manipulation concept: just because you want NAs in a report doesn't mean you have to have NAs in a data set.

 

That's so important, I will say it again: just because you want NAs in a report doesn't mean you have to have NAs in a data set.

 

You can create this table in any of a number of ways (PROC TABULATE, PROC REPORT, maybe even PROC PRINT and the %TABLEN macro), and by setting the proper option, your missings will appear as NA. This can also be done with custom formats. So ... there's no need to create a DATA set with NAs in place of missing values.

 

options missing='NA';

 


That cannot work.  The MISSING option can only accept a single character.


Okay, I stand corrected, but custom formats would work fine here.

--
Paige Miller
ballardw
Super User

Proc tabulate table option MISSTEXT does work though. The Sashelp.class data set has no females age 16 so when you run

proc tabulate data=sashelp.class;
   class sex age;
   var height;
   table age * height*(max min),
         sex
         /misstext='NA'
   ;
run;

The result is

 
  Sex
F M
Age     51.30 57.50
11 Height Max
Min 51.30 57.50
12 Height Max 59.80 64.80
Min 56.30 57.30
13 Height Max 65.30 62.50
Min 56.50 62.50
14 Height Max 64.30 69.00
Min 62.80 63.50
15 Height Max 66.50 67.00
Min 62.50 66.50
16 Height Max NA 72.00
Min NA 72.00

 

Hello_there
Lapis Lazuli | Level 10
Do you know if it's possible to output this into a data set? I try use "ods output out=xyz" in a separate line of code, but doesn't work.
ballardw
Super User

@Hello_there wrote:
Do you know if it's possible to output this into a data set? I try use "ods output out=xyz" in a separate line of code, but doesn't work. T

There is no ODS OUTPUT option for Proc Tabulate. Possibly because the procedure was designed a long time before ODS and the interface to ODS Output might be difficult to add.

Proc Tabulate supports an OUT=datasetname option. However the structure will be different than the displayed output and may be quite confusing for relative new comers to SAS data sets. Part of the difficulty with Tabulate output is that Proc Tabulate will accept multiple Table statements and it isn't required for any of the tables to have any variables or statistics in common. This will demonstrate what I mean in that regard:

proc tabulate data=sashelp.class out=work.tabout;
   class sex age;
   var height weight;
   table sex,
            height*(mean std)
   ;
   table age,
            weight*(max q3)
   ;
   table sex*(age all='all sex'),
           (height weight)*(n median)
   ;
run;
            

You will see that there is a variable indicating which table a combination of class and Var variable statistics occurs in, the var variables have the statistic appended to the variable name (odd results with very long variable names from this) and a _type_ variable to indicate the combinations of Class variables. Trying to display that resulting data set in other forms is often a fair amount of work. I'm not saying don't create sets from tabulate, I do when the need arises but that is an experience thing about when it may be useful.

Tom
Super User Tom
Super User

Just use an ARRAY.

data have;
  infile datalines dsd ;
	input visit $ stat $ trt1 $ trt2 $ trt3 $;
datalines;
001, mean_std, 5 (3), 4 (2), 5 (3)
001, median, 3, 3, 3
001, n, 2, 4, 5
002, mean_std, , , 
002, median, , ,
002, n, , , 
003, mean_std, 2 (4), 4 (3), 6, (1)
003, median, 1, 1, 1
003, n, 2, 2, 9
; 

data want;
  set have ;
  array trt [3] ;
  do index=1 to dim(trt);
    if missing(trt[index]) then do;
      if stat in ('median' 'n') then trt[index]='NA';
      if stat in ('mean_std') then trt[index]='NA (NA)';
    end;
  end;
run;

proc print;
run;
Obs    visit    stat        trt1       trt2       trt3       index

 1      001     mean_std    5 (3)      4 (2)      5 (3)        4
 2      001     median      3          3          3            4
 3      001     n           2          4          5            4
 4      002     mean_std    NA (NA)    NA (NA)    NA (NA)      4
 5      002     median      NA         NA         NA           4
 6      002     n           NA         NA         NA           4
 7      003     mean_std    2 (4)      4 (3)      6            4
 8      003     median      1          1          1            4
 9      003     n           2          2          9            4
Hello_there
Lapis Lazuli | Level 10
Thanks, Tom! This is perfect.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 21 replies
  • 1305 views
  • 5 likes
  • 4 in conversation