BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Ned123
Calcite | Level 5

Hello,

 

I have a table of Producer Price Indices (PPI) in excel(attached). I would like to create a macro to move all the rows into one row, so there is one row with each of the months in chronological order. And I would like to add the year next to each month. I cannot for the life of me figure out how to do this. The problem I'm having is adding the year to each month. If you can think of a better macro or correct mine I would really appreciate it. Here's what I have so far.

 

 

%MACRO PPI_NEW(YEAR, PPI);

PROC SQL;

INSERT into PPI_COMB
(
YEAR&YEAR , /* I am not sure this will work*/
Jan&YEAR , /* I want tohave the fields update with the macro */
Feb&YEAR ,
Mar&YEAR ,
April&YEAR,
May&YEAR ,
June&YEAR ,
July&YEAR ,
Aug&YEAR ,
Sept&YEAR ,
Oct&YEAR ,
Nov&YEAR ,
Dec&YEAR
)
SELECT
(
YEAR,
January,
February,
March,
April,
May,
June,
July,
August,
September,
October,
November,
December
)
FROM PPI_&PPI
WHERE YEAR = &YEAR;

QUIT;

%MEND;


PPI TABLE.png
1 ACCEPTED SOLUTION

Accepted Solutions
Patrick
Opal | Level 21

@Ned123

Below some code demonstrating how you could go about this one. It's eventually not the final solution for you but I hope it will give you some ideas.

From a data organization point of view: It would be best to maintain the data in a long normalized structure as this will keep the number of variables low and stable (=you can write code where you don't have to deal with ever changing variable names).

Using a long structure you can always transpose your variables into a denormalized structure as required for reporting or time series procedures. See sample code below.

/* create sample data */
data pp1 pp2;
  length year jan feb mar apr may jun jul aug sep oct nov dec 8;
  format jan feb mar apr may jun jul aug sep oct nov dec f4.1;
  array months (12) jan feb mar apr may jun jul aug sep oct nov dec;
  do year=2010 to 2015;
    do _i=1 to dim(months);
      months[_i]=round(ranuni(1)*100,0.1);
    end;
    output pp1;
  end;
  do year=2010 to 2015;
    do _i=1 to dim(months);
      months[_i]=round(ranuni(1)*100,0.1);
    end;
    output pp2;
  end;
  drop _:;
  stop;
run;

/* better: combine the data and and store in long normalized structure */
data combined_long;
  /* read the source data */
  length inds $41 source_table $32;
  set pp: indsname=inds;
  source_table=scan(inds,-1);  

  /* re-organize the data in a long structure */
  array months (12) jan feb mar apr may jun jul aug sep oct nov dec;
  format date yymon7. value f4.1;
  do _i=1 to dim(months);
    date=input(cats(vname(months[_i]),put(year,f4.0)),monyy7.);
    value=months[_i];
    output;
  end;
  keep source_table date value;
run;

/* make the boss happy */
proc sort data=combined_long out=sorted_combined_long;
  by source_table date;
run;
proc transpose data=sorted_combined_long out=happyBoss(drop=_:) ;
  id date;
  format date monyy7.;
  by source_table;
  var value;
run;


/*** if it's enough to just combine the source tables in the given structure ***/
data combined_wide;
  length inds $41 source_table $32;
  set pp: indsname=inds;
  source_table=scan(inds,-1);
run;

 

View solution in original post

11 REPLIES 11
Reeza
Super User

What do you want the output to look like?

Why do you want a macro?

 

Post sample data as text into the forum, preferably a data step, but not an image. 

Here's some instructions on how to create sample data.

https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat...

Ned123
Calcite | Level 5

Hi Reeza, I just edited my post, sorry about that, I had mistakenly pressed post before I was done...thanks for checking in. 🙂

Reeza
Super User

Why do you want to do this? I can't imagine a good reason to do this...

 

Is this one time thing or do you need some method that can be dynamic and not need to know how many years?

 

The 'dynamic way' would be to transpose to a long format and then transpose back to a wide format, but with the year included inthe label as well. If you use  variable number, ie Y201701 as a variable name, they'll sort automatically by default and you can use the Year Month description as the variable LABEL. I'm not sure it's less code in the end, but it's easier to understand.

 

 

Ned123
Calcite | Level 5

I want to do this because we plan on combining multiple PPI tables (such as the attached) into one dataset/worksheet so we can have just one worksheet with all the different PPI tables so we can search that worksheet instead of having to find the PPI table we want from 1000+ different files.

 

I'm not sure I'm following, I'm still an amateaur and had help creating the Macro I sent, could you please elaborate on your advice?

 

 

Ned123
Calcite | Level 5

What do you mean by dynamic? Does that mean we would have to manually type in the year for every worksheet in order to transpose? is there no macro I can create to do it to each worksheet without having to make any changes to variable names myself?

Reeza
Super User

Post what you want the results to look like. If I understand what you want, this doesn't seem like a good way to store data. You can review the idea of 'tidy data' if you want and keeping data in your column names is also not a good idea. I can make suggestions on how to do things with images. If you want code, please take the time to post sample data. I don't have data to test and I'm not typing out your data from a picture. 

 

 

Ned123
Calcite | Level 5

Reeza - The first attachment (PPI Table) is the sample data, the second attachment (PPI Table results) is what a the dataset would look like after the macro has been applied to multiple raw excel PPI tables, and they have been combined together into one dataset.

I tried attaching the sample data the first time I posted and it wasn't sticking, its working now.

 

I have relayed to my boss your concerns/suggestions and the suggestions in tidy data but he still wants it organized this way.

Patrick
Opal | Level 21

@Ned123

Below some code demonstrating how you could go about this one. It's eventually not the final solution for you but I hope it will give you some ideas.

From a data organization point of view: It would be best to maintain the data in a long normalized structure as this will keep the number of variables low and stable (=you can write code where you don't have to deal with ever changing variable names).

Using a long structure you can always transpose your variables into a denormalized structure as required for reporting or time series procedures. See sample code below.

/* create sample data */
data pp1 pp2;
  length year jan feb mar apr may jun jul aug sep oct nov dec 8;
  format jan feb mar apr may jun jul aug sep oct nov dec f4.1;
  array months (12) jan feb mar apr may jun jul aug sep oct nov dec;
  do year=2010 to 2015;
    do _i=1 to dim(months);
      months[_i]=round(ranuni(1)*100,0.1);
    end;
    output pp1;
  end;
  do year=2010 to 2015;
    do _i=1 to dim(months);
      months[_i]=round(ranuni(1)*100,0.1);
    end;
    output pp2;
  end;
  drop _:;
  stop;
run;

/* better: combine the data and and store in long normalized structure */
data combined_long;
  /* read the source data */
  length inds $41 source_table $32;
  set pp: indsname=inds;
  source_table=scan(inds,-1);  

  /* re-organize the data in a long structure */
  array months (12) jan feb mar apr may jun jul aug sep oct nov dec;
  format date yymon7. value f4.1;
  do _i=1 to dim(months);
    date=input(cats(vname(months[_i]),put(year,f4.0)),monyy7.);
    value=months[_i];
    output;
  end;
  keep source_table date value;
run;

/* make the boss happy */
proc sort data=combined_long out=sorted_combined_long;
  by source_table date;
run;
proc transpose data=sorted_combined_long out=happyBoss(drop=_:) ;
  id date;
  format date monyy7.;
  by source_table;
  var value;
run;


/*** if it's enough to just combine the source tables in the given structure ***/
data combined_wide;
  length inds $41 source_table $32;
  set pp: indsname=inds;
  source_table=scan(inds,-1);
run;

 

Reeza
Super User

@Ned123 I'd probably create both versions, but make sure in the long version you set up the Excel filters, so it's quick to jump to numbers. If you do specific calculations regularly, ie YTD or YoY comparisons then add them into the long file as well. Even if you give your boss the first one, my guess is that the file will come in useful if you need to use it. 

ballardw
Super User

@Fed wrote:

I want to do this because we plan on combining multiple PPI tables (such as the attached) into one dataset/worksheet so we can have just one worksheet with all the different PPI tables so we can search that worksheet instead of having to find the PPI table we want from 1000+ different files.

 

I'm not sure I'm following, I'm still an amateaur and had help creating the Macro I sent, could you please elaborate on your advice?

 


Your use of the work "worksheet" concerns me. Are you planning on exporting this to Excel and then doing your searches? Excel is a very poor choice for searching IMHO.

Below are two different data sets with the same information. Note that you can copy this into your SAS editor and run the code. So you can create data sets. This is what we are talking about for posting data.

Anyway, the second data set is what it appears you are intent on building. The first is one recommendation for an alternative.

data example1;
   informat product $4. date mmddyy10. index best6.;
   input product date index;
datalines;
AAA   01/15/2015 123.4
AAA   02/17/2015 222
AAA   03/11/2015 456
AAA   04/15/2015 323.4
AAA   05/17/2015 422
AAA   06/11/2015 556
AAA   07/15/2015 623.4
AAA   08/17/2015 722
AAA   09/11/2015 856
AAA   10/15/2015 023.4
AAA   11/17/2015 122
AAA   12/11/2015 156
AAA   01/15/2016 193.4
AAA   02/17/2016 292
AAA   03/11/2016 496
AAA   04/15/2016 393.4
AAA   05/17/2016 492
AAA   06/11/2016 596
AAA   07/15/2016 693.4
AAA   08/17/2016 792
AAA   09/11/2016 896
AAA   10/15/2016 093.4
AAA   11/17/2016 192
AAA   12/11/2016 196
;
run;

data example2;
   informat product $4. JAN2015 FEB2015 MAR2015 APR2015 MAY2015 JUN2015 
            JUL2015 AUG2015 SEP2015 OCT2015 NOV2015 DEC2015 JAN2016 FEB2016 
            MAR2016 APR2016 MAY2016 JUN2016 JUL2016 AUG2016 SEP2016 OCT2016 
            NOV2016 DEC2016 best6.;
   input product JAN2015 FEB2015 MAR2015 APR2015 MAY2015 JUN2015 JUL2015 
         AUG2015 SEP2015 OCT2015 NOV2015 DEC2015 JAN2016 FEB2016 MAR2016 
         APR2016 MAY2016 JUN2016 JUL2016 AUG2016 SEP2016 OCT2016 NOV2016 
         DEC2016;
datalines;
AAA 123.4 222 456 323.4 422 556 623.4 722 856 23.4 122 156 193.4 292 496 393.4 492 596 693.4 792 896 93.4 192 196
;
run;


Now when I want to search for records by date I have many ways of manipulating the date values or considering the values of the index.

 

proc sql;
   select * 
   from example1
   where month(date) in (2,3,4);/* all records for Feb, Mar and Apr regardless of year*/
   select *
   from example1
   where date between '01JUN2015'd and '30MAY2016'd;  /* date range*/
   select *
   from example1 
   where index < 400; /* range of the value*/
quit;

Same thing with your data for the first two:

 

proc sql;
   select product,FEB2015, MAR2015, APR2015, ,FEB2016, MAR2016, APR2016 
   from example2
   ;/* all records for Feb, Mar and Apr regardless of year BUT the list of variables has to change with every year in the data set*/
   select product, JUN2015, JUL2015, AUG2015, SEP2015, OCT2015, 
          NOV2015, DEC2015, JAN2016, FEB2016, MAR2016, APR2016,
          MAY2016
   from example2
   ;  /* date range the sheer number of variables to type can become enormous*/
   select  *     /* what goes here if only want some of the values in the end?
                    and again when you need to do this next year the variable list changes*/
   from example2 /* I'm not going to do this completely but it might start by*/
   where JAN2015 < 400 or
         FEB2015 < 400 or
         MAR2015 < 400 or ....

  ; 
quit;

Also the Example1 data set is in the form that the timeseries procedures in SAS/ETS would like them.

 

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 11 replies
  • 2369 views
  • 2 likes
  • 4 in conversation