BookmarkSubscribeRSS Feed
apolitical
Obsidian | Level 7

I have a dataset with 2 variables, VAR1 and VAR2. Say I want to do a proc freq or proc means on VAR2, conditional on VAR1 being a particular value ("where VAR1=XXX" statement). I want to do this process by looping through mean, minimum, maximum and median values of VAR1 found in the data (I already have all these key values saved in a dataset "var1_key_values" below). I also want to be able to grab the "mean/median/maximum/minimum" text and use them as excel tab names for outputting. How do I write a code for this? VAR1 and VAR2 are both numeric if that matters. Thanks.

 

dataset "var1_key_values" looks like this:

 

stat               var1

mean            x

median         y

maximum     z

minimum      w

 

 

 

16 REPLIES 16
Reeza
Super User

I think you need to explain more clearly what you have and what you want. 

 

If you already have the stats, what does the first part of the question have to do with the rest?

apolitical
Obsidian | Level 7
I apologize for the lack of clarity. So I already have the dataset "var1_key_values" as shown above. I would like to do a:

proc freq data=dataset;
table VAR2 / out=freq_out_mean(median,maximum,minimum);
where VAR1=x/y/z/w;
run;

Instead of doing this 4 times by specifying mean/median/maximum/minimum and x/y/z/w, I want to do them in a loop. Does that clarify things a bit?
Reeza
Super User

BY variables.

I'm assuming your code is a sketch because it wouldn't work as posted. 

I'm also betting that whatever you're trying to do, you're doing it in a very difficult and roundabout manner. It may be worth starting from scratch and explaining what you're trying to do.

 

Otherwise, look into BY groups. To export to Excel with specific tab names, look at ODS TAGSETS.EXCELXP and/or ODS EXCEL depending on your version. You can create the tab names based on a variable value. See the examples

 

proc freq data=dataset;

BY VAR1;
table VAR2 / out=freq_out_mean(median,maximum,minimum); *<-wrong, but not sure what you're trying to do;

run;

apolitical
Obsidian | Level 7

Ok let me restart from sratch. I have a "raw" dataset like this:

VAR1   VAR2

1          5

1          7

2          9

2          0

3          2

3         5

 

At the end, I would like to have an excel file that has 3 tabs:

tab "VAR1=1 (min)" containing the proc freq output of VAR2 on the subset of the data where VAR1=1 (first 2 obs of raw);

tab "VAR1=2 (medium)" containing the proc freq output of VAR2 on the subset of the data where VAR1=2 (middle 2 obs of raw);

tab "VAR1=3 (max)" containing the proc freq output of VAR2 on the subset of the data where VAR1=3 (last 2 obs of raw);

 

How can i do this by not having to specifying the numeric values taken on by VAR1 3 times? As you suggested, using a by statement could get me what I want, but I would like to be able to separate the output into 3 tabs when outputting, and naming the 3 tabs with the associated values of VAR1. Basically my trouble is how do you grab the value of VAR1 and feed that into the excel tab name?

Reeza
Super User

Here's a starting point, I'm using SAS 9.4M3 which has ODS EXCEL. If you don't, switch it to ODS TAGSETS.EXCELXP. 

 

proc format;
	value group_fmt
		1 = 'Min'
		2 = 'Med'
		3 = 'Max';
run;

data have;
	input VAR1   VAR2;
	cards;
1          5
1          7
2          9
2          0
3          2
3         5
;

proc sort data=have;
	by var1;
run;

ods excel file='multisheet.xlsx' style=statistical
	options( sheet_interval='bygroup');

proc freq data=have;
	by var1;
	format var1 group_fmt.;
run;

ods excel close;

Original code modified from the documentation here, Data Driven Worksheets

https://support.sas.com/rnd/base/ods/odsmarkup/excelxp_demo.html#data

apolitical
Obsidian | Level 7
that works. but, is there a way to do the first part (proc format) automatically instead of spelling everything out? let's say i don't even need the min/median/max any more, i just need to loop through all unique values of VAR1, and have those values appear in the excel tabs. VAR1 has only 3 unique values in this case, but what if I have a large dataset with hundreds of unique values that it's infeasible to do it all by hand?
Reeza
Super User

The proc format only controls the display of the variable as max/medium/min. If you don't want that remove that step entirely and the corresponding FORMAT statement, assuming your data is representative. It doesn't matter how many levels there are as long as it's under the limit of the excel spreadsheet. 

 

If you have a dataset that drives the labeling of 1-min, 2-median and 3-max then you can also automate that step using a CNTLIN data set with PROC FORMAT. 

 

Other than the format you'll notice NOTHING is hardcoded with a value.

 

apolitical
Obsidian | Level 7
That works. What if, instead of doing a simple proc freq, I'm interested in finding a number of percentile values and outputting them? I tried the following piece of code using your structure, it doesn't seem to work. What am I doing wrong?

ods excel file="multisheet.xlsx" style=statistical
options( sheet_interval='bygroup');

proc stdize data=have
pctlpts = 0, 1, 5, 10, 25, 50, 75, 90, 95, 99, 100;
by var1;
var var2;
run;

ods excel close;
Reeza
Super User

Proc STDIZE doesn't create any output. 

 

ODS EXCEL redirects any output to Excel, but since there's no output there's no file. 

 

It creates a dataset, Data1, that you could then PROC PRINT with a BY statement. You didn't specify an output dataset either, not sure why.

 

Using STDIZE with your sample data also doesn't make sense. 

 

I suggest you read through the resources link I included in the initial answer to see how to control the output and how it works. 

apolitical
Obsidian | Level 7
I tried this by specifying an output file, but still doesn't work.

proc stdize data=have outstat=stats
pctlpts = 0, 1, 5, 10, 25, 50, 75, 90, 95, 99, 100;
by var1;
var var2;
run;

If I don't use proc stdize, how can I achieve what I want to do, i.e. get the percentiles and save them? Proc univariate also gives no success.
Reeza
Super User

Previous answer

It creates a dataset, Data1, that you could then PROC PRINT with a BY statement. You didn't specify an output dataset either, not sure why.

 

 

ods excel file='multisheet.xlsx' style=statistical
      options( sheet_interval='bygroup');

*creates no output to output window, stats stored in WANT dataset instead;
   proc stdize data=have outstat=want
pctlpts = 0, 1, 5, 10, 25, 50, 75, 90, 95, 99, 100;
by var1;
var var2;
run;

*Print WANT dataset with groups as desired; proc print data=want; by var1; run; ods excel close;

 

Reeza
Super User

Here's a well written intro to ODS Tagsets. I'm using ODS Excel because it creates a native Excel file, Tagsets creates an XML file. This has more to do with the changes in SAS and Excel over the last 5 years. Unfortunately not all Tagsets features translate directly to ODS Excel but for what you're doing you should be fine.

 

http://www.lexjansen.com/wuss/2012/149.pdf

 

In general, any feature you're looking for can be found here:

Look up the feature in the first column, then the last column tells you what paper and page has an example related to that topic. 

http://www.sas.com/events/cm/867226/ExcelXPPaperIndex.pdf

apolitical
Obsidian | Level 7

So the following code does what I want to do. Thanks Reeza for your patience.

 

%macro loop(dset);

proc sql;
select distinct var1 into: var1_value separated by '*'
from &dset.;
select count(distinct var1) into: num
from &dset.;
quit;

%do i = 1 %to &num.;
proc stdize data=&dset. outstat=var2_stats
pctlpts = 0, 1, 5, 10, 25, 50, 75, 90, 95, 99, 100;
where var1 = %scan(&var1_value.,&i.,*);
var var2;
run;

proc export 
data=var2_stats
outfile="multisheet" 
dbms=xlsx replace; 
sheet="var1=%scan(&var1_value.,&i.,*)";
run;

%end;

%mend loop;
%loop(dset=have);
Reeza
Super User

@apolitical wrote:

So the following code does what I want to do. Thanks Reeza for your patience.

 

Out of curiosity why did you choose to go with a macro versus a non-macro solution?

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 16 replies
  • 2777 views
  • 1 like
  • 2 in conversation