Help using Base SAS procedures

Proc means summary of "duplicate" values

Reply
Contributor
Posts: 32

Proc means summary of "duplicate" values

I have data (excel file) per country, per product, per month and the quantity. However, I would like to summarize the data per quarter in SAS.

The problem I am having is that if for a country the same quantity is given twice (within the quarter) then PROC MEANS only take one of the two quantities. So I am missing data.
Is there a way to fix this problem with a SAS procedure?
Thank you in advance for your SAS help
coba
Trusted Advisor
Posts: 2,113

Re: Proc means summary of "duplicate" values

The problem is elsewhere in your program. MEANS uses all the data it gets. If the variable in the VAR statement has a missing value, then it does not count toward the mean or other statistics (except N and NMISS).

If you are losing duplicates somewhere, it is often in a FIRST. or LAST. construct in the DATA step or in the DISTINCT option of SQL. Without more information from you (code, log, lst), I'm at a loss to help further.
Contributor
Posts: 32

Re: Proc means summary of "duplicate" values

Probably I am doing something wrong. Before I use the proc means I put my data and quarters (if month in (1,2,3) then qtr=1, etc... Then I summarize them. Should I think of another way to sum my quantities?
Thanks for your help.
Super Contributor
Posts: 281

Re: Proc means summary of "duplicate" values

As Doc said ... unless we see your code, there really isn't a way for us to help figure out why you are having this problem (if you are having a problem at all).
Contributor
Posts: 32

Re: Proc means summary of "duplicate" values

I have something like this:

data work.test1;
set work.test;
if Month in (1,2,3) then m=1;
else if Month in (4,5,6) then m=4;
else if Month in (7,8,9) then m=7;
else if Month in (10,11,12) then m=10;
run;

data work.test2;
set work.test1;
format date date7.;
date=mdy(m,01,year);
run;

proc means nway data=work.test2 missing noprint;
class country date product code ;
var quantity;
output out=work.test3 (drop=_TYPE_ _FREQ_) sum=quantity;
run;
Super Contributor
Posts: 281

Re: Proc means summary of "duplicate" values

I don't see anything in your code that could cause the problem that you spoke of originally: "The problem I am having is that if for a country the same quantity is given twice (within the quarter) then PROC MEANS only take one of the two quantities."

Your code is not causing missing values to appear in the data. So it must be something in the data. Could you post some sample data?

By the way, there is no need for data set test2. It accomplishes little, and could be combined into data set test1.

data work.test1;
set work.test;
if Month in (1,2,3) then m=1;
else if Month in (4,5,6) then m=4;
else if Month in (7,8,9) then m=7;
else if Month in (10,11,12) then m=10;
date=mdy(m,01,year);
format date date7.;
drop m;
run;
N/A
Posts: 0

Re: Proc means summary of "duplicate" values

if taking out steps might remove the problem, try doing the analysis without any of these extra steps that only provide a date at the start of a quarter. PROC MEANS will summarise by quarters, just by providing a suitable format for that class variable[pre] proc means nway data= original missing noprint;
var quantity ;
class country month product code ;
format month yyQ. ;
output out=work.test3( drop= _TYPE_ _FREQ_ ) sum= ;
run ;[/pre] I prefer the yyQ. format but you may prefer alternatives like YYQR and YYQx (with C,D,N,P or S for that x) or QTR without the year, or more exotically, construct your own format, with PROC FORMAT PICTURE...

but I see no reason why proc means would not include all obs it reads (unless your analysis variable QUANTITY is missing). Maybe you should keep _freq_ to check the number of obs in each class group add up to the grand total you expect.

If you must have the values of day, month, year on exactly the start of a quarter, perform that process on the summarised data. It will be quicker because there is less data. A simplistic assignment to transform the month to quarter start date[pre] q_start_date = intnx( 'qtr', month,0 ) ;[/pre]

good luck

PeterC
Contributor
Posts: 32

Re: Proc means summary of "duplicate" values

I do not know what I am doing wrong but the result is for m=1960Q1 for all observations.

I already checked if year and month are numeric values, they are!
I put together this:

data test1;
input Product $ country $ year month quantity;
datalines;
10010000 Brazil 2007 1 100
10010000 Brazil 2007 1 23
10010000 Brazil 2007 3 100
10010000 Brazil 2007 3 360
;

proc means nway data= work.test1 missing noprint;
var quantity ;
class country product year month;
format month YYQ. ; /* with C,D,N,P or S for that x */
output out=work.test2( drop= _TYPE_ _FREQ_ ) sum= ;
run ;

but the result is not what I expect.
Valued Guide
Posts: 2,175

Re: Proc means summary of "duplicate" values

Numbers with values in the range from 1 to 12 (month variable) will format only as 1960Q1 with YYQ. format.
The YYQ format is suitable for a numeric variable which contains a number of days since the beginning of 1960 (a standard SAS date).
I think this standard simplifies data analyses involving dates because a date becomes only one variable and many processes and formats can be (and have been) created to "work with dates".
Unfortunately you have one variable for year and a separate variable for month, which makes your analyses complicated. Easily, you can create a single standard SAS date from these two, with :[pre] anDate = mdy( month,1,year) ;[/pre]That provides a class variable which is suitable for formatting with YYQ.
The anDate class variable replaces month and year.
With some extra data lines to demonstrate more than one output line, here is some alternative code[pre]data test1;
input Product $ country $ year month quantity;
anDate = mdy( month,1,year) ;
datalines;
10010000 Brazil 2007 1 100
10010000 Brazil 2007 1 23
10010000 Brazil 2007 3 100
10010000 Brazil 2007 4 360
10010000 Brazil 2007 6 360
10010000 Brazil 2007 10 360
;

proc means nway data= work.test1 missing noprint;
var quantity ;
class country product anDate;
format anDate YYQ. ;
output out=work.test2 sum= ;
run ;[/pre]
My results from a PROC PRINT like: [pre]ods listing ;
title YYQ 'demo';
option nodate nonumber ls=70 nocenter ;
proc print ;
id country product anDate;
var quantity _freq_ ;
sum quantity _freq_ ;
format anDate YYQ. ;
run;[/pre]follow[pre]YYQ demo

country Product anDate quantity _FREQ_

Brazil 10010000 2007Q1 223 3
Brazil 10010000 2007Q2 720 2
Brazil 10010000 2007Q4 360 1
======== ======
1303 6[/pre]

PeterC
N/A
Posts: 0

Re: Proc means summary of "duplicate" values

coba,

You are formatting a month number (1, 2, etc) using a format designed for serial dates. Serial date 1 is 1st or 2nd Jan 1960, and 3 is 3rd or 4th Jan 1960, both of these are in Q1 of 1960, so SAS is doing what you ask - it doesn't know to join up the year and month variables.

data test1;
input Product $ country $ year month quantity;
/* I've added a date that is a composite of the year and month */
dt = input('01'||put(month, z2.)||put(year, 4.), ddmmyy8.);
/* I've added a couple of lines of data to go into the following quarter */
datalines;
10010000 Brazil 2007 1 100
10010000 Brazil 2007 1 23
10010000 Brazil 2007 3 100
10010000 Brazil 2007 3 360
10010000 Brazil 2007 4 100
10010000 Brazil 2007 5 23
;
run;

proc means nway data= work.test1 missing noprint;
var quantity ;
/* Use the date variable in the class statement */
class country product dt;
/* Format the date variable */
format dt YYQ. ; /* with C,D,N,P or S for that x */
output out=work.test2( drop= _TYPE_ _FREQ_ ) sum= ;
run ;

Output:

Obs country Product dt quantity

1 Brazil 10010000 2007Q1 583
2 Brazil 10010000 2007Q2 123

Is this what you were hoping for?

ProcMe
Contributor
Posts: 32

Re: Proc means summary of "duplicate" values

That's great!
Yes, that is what I was looking for. (I must confess that my sas skills are rather limited!)
I would like to have my date on date7. format I know the long way already (by separating the year and the quarter and then joining then back) , how could I accomplish that in a short way?
Thank you very much.
Contributor
Posts: 32

Re: Proc means summary of "duplicate" values

do not worry about my date format, I managed
Ask a Question
Discussion stats
  • 11 replies
  • 719 views
  • 0 likes
  • 5 in conversation