Dataset with variable number of columns

Accepted Solution Solved
Reply
Contributor
Posts: 40
Accepted Solution

Dataset with variable number of columns

I have a simple dataset.  The variable name for each column is the day the job runs (Vol_Date), and the column itself contains the number of widgets on that day e.g. :

                         Vol_20140719     Vol_20140720     Vol_20140721     Difference

Widget_1               100                    99                    98                           -1

Widget_2               150                    140                   138                         -2

Widget_3               200                    150                   140                        -10

Widget_4               100                    80                    78                           -2

Tomorrow, the date will be 20140722, and the resultant dataset after the job has run for the day must be similar to the above i.e. it will have the number of widgets in stock for 20140719 to 20140722, but this time the difference column will be the volumes for the 20140722 less the volumes for the 20140721. Please help ?


Accepted Solutions
Solution
‎07-21-2014 10:11 AM
Super User
Posts: 5,081

Re: Dataset with variable number of columns

RW brings up a valid point ... it might be easier to restructure your data.  However, you can keep the current structure.  Under that scenario, the variable names change on a daily basis, hence the program changes, hence macro language will be involved.  Here is one way to approach the problem, beginning with finding the names of the variables to use in the calculations.

proc contents noprint data=have out=_contents_ (keep=name);

run;

data _contents_;

   set _contents_;

   up = upcase(name);

   if up =: 'VOL_';

run;

proc sort data=_contents_;

   by descending up;

run;

data _null_;

   set _contents_;

   if _n_=1 then call symputx('latest_date', name);

   else if _n_=2 then call symputx('prior_date', name);

   else stop;

run;

That gives you the two variable names to use in calculations, so a later program could use:

data want;

   set have;

   difference = &latest_date - &prior_date;

run;

Good luck.

View solution in original post


All Replies
Super User
Super User
Posts: 7,401

Re: Dataset with variable number of columns

Hi,

TBH I don't recommend that structure you are using there.  If you think about running it for a while then you could end up with hundreds of variables.  My suggestion would be to re-arrange the dataset to look like this:

Date          Widget1     Widget2     ... Widget4

20140719     100          150               100

20140720        99          140               80

With the above structure you can easily get the difference as _n_-1 value - _n_ value will give you difference.

Basic SAS code to do this (though SQL would be better):

data have;
  attrib Date informat=yymmdd8. format=yymmdd8. Widget1 Widget2 Widget3 Widget4 format=best.;
  infile datalines dlm=',';
  input date widget1-widget4;
datalines;
20140719,100,150,,200,100
20140720,99,140,150,80
20140721,98,138,140,78
;
run;

proc sort data=have;
  by descending date;
run;
data want;
  set have;
  retain lstw1-lstw4;
  if _n_=1 then do;
    lstw1=widget1;
    lstw2=widget2;
    lstw3=widget3;
    lstw4=widget4;
    output;
  end;
  else if _n_=2 then do;
    output;
    date='01JAN1970'd;
    widget1=widget1-lstw1;
    widget2=widget2-lstw2;
    widget3=widget3-lstw3;
    widget4=widget4-lstw4;
    output;
  end;
  else output;
run;

Solution
‎07-21-2014 10:11 AM
Super User
Posts: 5,081

Re: Dataset with variable number of columns

RW brings up a valid point ... it might be easier to restructure your data.  However, you can keep the current structure.  Under that scenario, the variable names change on a daily basis, hence the program changes, hence macro language will be involved.  Here is one way to approach the problem, beginning with finding the names of the variables to use in the calculations.

proc contents noprint data=have out=_contents_ (keep=name);

run;

data _contents_;

   set _contents_;

   up = upcase(name);

   if up =: 'VOL_';

run;

proc sort data=_contents_;

   by descending up;

run;

data _null_;

   set _contents_;

   if _n_=1 then call symputx('latest_date', name);

   else if _n_=2 then call symputx('prior_date', name);

   else stop;

run;

That gives you the two variable names to use in calculations, so a later program could use:

data want;

   set have;

   difference = &latest_date - &prior_date;

run;

Good luck.

Contributor
Posts: 40

Re: Dataset with variable number of columns

Thanks - I tried this proposal this morning and it works perfectly - many thanks !

Trusted Advisor
Posts: 1,204

Re: Dataset with variable number of columns

proc sql;
  select name, count(name) into :list separated by ' ',
         :cnt from dictionary.columns
        where libname='WORK' and memname='HAVE' and name like 'Vol%';
quit;

data want;
set have;
array vol{*} &list;
difference=vol(&cnt)-vol(&cnt-1);
run;

Respected Advisor
Posts: 3,777

Re: Dataset with variable number of columns

How about a nice array?

data vol;
   input widget $ Vol_20140719     Vol_20140720     Vol_20140721;
   array vol
  • vol:;
  •    dif = vol[dim(vol)]-vol[dim(vol)-1];
       cards;
    Widget_1               100                    99                    98        
    Widget_2               150                    140                   138       
    Widget_3               200                    150                   140       
    Widget_4               100                    80                    78        
    ;;;;
       run;
    ☑ This topic is SOLVED.

    Need further help from the community? Please ask a new question.

    Discussion stats
    • 5 replies
    • 230 views
    • 9 likes
    • 5 in conversation