BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
mayasak
Quartz | Level 8

I have a dataset with a big number of variables and I need to find counts of values of 'NT', '0', and ' ', in each of these variables. I've used this so far for one variable at a time :

proc freq data = Gn;
tables /*hospital * amikacin_it/norow nocol nopercent;
where amikacin_it in ('NT', '0', ' ');
run;

proc freq data = Gn;
tables /*hospital * ampicillin_it/norow nocol nopercent;
where ampicillin_it in ('NT', '0', ' ');
run;

......so on and so forth using other variables. 

But this takes forever and gives a huge number of tables.

I'm trying to do this using a macro but have no idea how to proceed.

I really appreciate any help.

Thank you

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

It might help to show what you expect the output for a given example input set (hint hint) would look like.

 

Since you don't show anything resembling an output I am not sure what you may expect.

 

My approach to creating a report about such:

data example;
   infile datalines dlm=',' dsd missover;
   input hosp $ var1 $ var2 $ var3 $ ;
datalines;
GG,NT,A,B
GG,NT,0,
GG,NT,,0
CC,0,NT,0
CC,B,NT
GG,0,NT,0
GG,B,NT
CC,0,NT,0
CC,B,NT
CC,NT,A,B
GG,,NT
CC,,NT,0
CC,B,NT
CC,NT,A,B
CC,0,A,B
CC,c,,B
CC,c,A,
;

data reshape;
   set example;
   array v (*) var1 var2 var3;
   do i=1 to dim(v);
      varname = vname(v[i]);
      value = v[i];
      if value in ('NT' '0' '') then output;
   end;
   keep hosp varname value;
run;

Proc tabulate data=reshape;
   class hosp varname;
   class value/missing;
   table hosp,
         varname=' '*value=' '*n='Count'
         /misstext='0'
   ;
run;

The first data step is to provide something with a variety of values.

The second reshapes the data so that you only have three variables to worry about, the hosp, varname containing the name of the original variable and the value of the variable. The second data step besides reshaping only writes out the values stated to be of interest. Some uncommon items: Use of the Vname function to return the name of a variable from an array element. If you have not seen the IN operator it is a way to examine a variable to see if it has any of the listed values.

 

Note: If your variables are a mix of data types you can't mix them in an array definition and a different approach would be needed depending on what the final result needs to look like.

View solution in original post

9 REPLIES 9
Astounding
PROC Star
If you are willing to add a 4th category (call it "other"), you can do this with a short program that doesn't require macros. Does that sound promising?
mayasak
Quartz | Level 8

I'm not sure how this works. Does this cover all other variables? I'm open to any short program, doesn't have to be Macro.

Astounding
PROC Star
It would cover any variables you list in the TABLES statement. You would have the option to use this trick if you want all character variables:
tables _character_;

There are other tricks that might apply if you want nearly all character variables.
mayasak
Quartz | Level 8

This sounds good but how to get only freqs for only "NT", "0" and " " values (since there are lots of values in each variable) I only want those with no tests done.

ballardw
Super User

It might help to show what you expect the output for a given example input set (hint hint) would look like.

 

Since you don't show anything resembling an output I am not sure what you may expect.

 

My approach to creating a report about such:

data example;
   infile datalines dlm=',' dsd missover;
   input hosp $ var1 $ var2 $ var3 $ ;
datalines;
GG,NT,A,B
GG,NT,0,
GG,NT,,0
CC,0,NT,0
CC,B,NT
GG,0,NT,0
GG,B,NT
CC,0,NT,0
CC,B,NT
CC,NT,A,B
GG,,NT
CC,,NT,0
CC,B,NT
CC,NT,A,B
CC,0,A,B
CC,c,,B
CC,c,A,
;

data reshape;
   set example;
   array v (*) var1 var2 var3;
   do i=1 to dim(v);
      varname = vname(v[i]);
      value = v[i];
      if value in ('NT' '0' '') then output;
   end;
   keep hosp varname value;
run;

Proc tabulate data=reshape;
   class hosp varname;
   class value/missing;
   table hosp,
         varname=' '*value=' '*n='Count'
         /misstext='0'
   ;
run;

The first data step is to provide something with a variety of values.

The second reshapes the data so that you only have three variables to worry about, the hosp, varname containing the name of the original variable and the value of the variable. The second data step besides reshaping only writes out the values stated to be of interest. Some uncommon items: Use of the Vname function to return the name of a variable from an array element. If you have not seen the IN operator it is a way to examine a variable to see if it has any of the listed values.

 

Note: If your variables are a mix of data types you can't mix them in an array definition and a different approach would be needed depending on what the final result needs to look like.

mayasak
Quartz | Level 8
data example;
   infile datalines dlm=',' dsd missover;
   input hosp $ org $ ampicillin $ penem $ amikacin $ ;
datalines;
GG,Strep,NT,10,35
GG,Acine,NT,0,
GG,bacter,,0
CC,bacter,NT,0
CC,kleb,34,NT
GG,kleb,0,NT,0
GG,ecoli,12,NT
CC,strep,0,NT,0
CC,ecoli,377,NT
CC,acine,NT,4,29
GG,citro,,NT
CC,citro,NT,0
CC,amin,45,NT
CC,protius,NT,99,55
GG,protius,0,67,188
CC,serratia,4,,74
GG,serratia,8,102,
; 

Hi ballardw,

I added a sample of the dataset. The code that you provided works perfectly but I was just wondering if I can get variable values for each hospital and each org as well. For example count of 'NT' in 'penem' for 'CC' and 'ecoli', count of 'NT' in 'penem' for 'CC' and 'Strep'...

Thanks a lot for your help.  

  

Ksharp
Super User
data example;
   infile datalines dlm=',' dsd missover;
   input hosp $ org $ ampicillin $ penem $ amikacin $ ;
datalines;
GG,Strep,NT,10,35
GG,Acine,NT,0,
GG,bacter,,0
CC,bacter,NT,0
CC,kleb,34,NT
GG,kleb,0,NT,0
GG,ecoli,12,NT
CC,strep,0,NT,0
CC,ecoli,377,NT
CC,acine,NT,4,29
GG,citro,,NT
CC,citro,NT,0
CC,amin,45,NT
CC,protius,NT,99,55
GG,protius,0,67,188
CC,serratia,4,,74
GG,serratia,8,102,
; 

proc sql;
create table want as
select hosp ,org,
sum(ampicillin in ('NT', '0', ' ')) as ampicillin,
sum(penem in ('NT', '0', ' ')) as penem,
sum(amikacin in ('NT', '0', ' ')) as amikacin
 from example
  group by  hosp ,org;
quit;
ballardw
Super User

@mayasak wrote:
data example;
   infile datalines dlm=',' dsd missover;
   input hosp $ org $ ampicillin $ penem $ amikacin $ ;
datalines;
GG,Strep,NT,10,35
GG,Acine,NT,0,
GG,bacter,,0
CC,bacter,NT,0
CC,kleb,34,NT
GG,kleb,0,NT,0
GG,ecoli,12,NT
CC,strep,0,NT,0
CC,ecoli,377,NT
CC,acine,NT,4,29
GG,citro,,NT
CC,citro,NT,0
CC,amin,45,NT
CC,protius,NT,99,55
GG,protius,0,67,188
CC,serratia,4,,74
GG,serratia,8,102,
; 

Hi ballardw,

I added a sample of the dataset. The code that you provided works perfectly but I was just wondering if I can get variable values for each hospital and each org as well. For example count of 'NT' in 'penem' for 'CC' and 'ecoli', count of 'NT' in 'penem' for 'CC' and 'Strep'...

Thanks a lot for your help.  

  


If you want hosp  and org in the output pretty much  every place my example has "hosp" you want "hosp org" except in the table statement where you want Hosp*org to nest values of org inside Hosp.

 

Hint: try stuff. Get errors. Try to fix the errors. When you can't then show us what you tried and where you need help.

 

mayasak
Quartz | Level 8
Thank you ballardw

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 9 replies
  • 2009 views
  • 0 likes
  • 4 in conversation