Hi Experts,
Been trying to transpose (but by using arrays) my dataset below;
acct | date | seq | paid | cost |
123 | Jan-10 | 0 | 0 | 0 |
123 | Feb-10 | 0 | 0 | 0 |
123 | Mar-10 | 1 | 0 | 0 |
123 | Apr-10 | 1 | 100 | 10 |
123 | May-10 | 1 | 150 | 20 |
to the dataset below;
acct | date | paid_Jan2010 | paid_Feb2010 | paid_Mar2010 | paid_Apr2010 | paid_May2010 | cost_Jan2010 | cost_Feb2010 | cost_Mar2010 | cost_Apr2010 | cost_May2010 |
123 | Mar-10 | 0 | 0 | 0 | 100 | 150 | 0 | 0 | 0 | 10 | 20 |
whereby only the row where the seq becomes 1 the first time is taken in the resulting dataset but with all the paid and cost values in it as well.
Is this achievable? Understand that using proc transpose might not work and have been playing around by using arrays but have been unsuccessful so far.
Any advise or help is much appreciated!
Thank you! 🙂
I'm not going to question whether your desired result is or isn't a good idea, as we are all often asked to produce output in forms that we might not consider analytically ideal.
The answer to your question of "can it be done using arrays", the answer is yes. My solution uses an undocumented proc sql function, monotonic, thus SAS wouldn't support it always working as expected. In short, while it works, I wouldn't use the following code in a production environment. It would be safer to assign the format numbers in a data step .. but I leave that to you. I simply wrote some code to show that, yes, it can definitely be done:
data have; infile cards dlm=','; input acct date monyy7. seq paid cost; format date monyy7.; cards; 123,Jan-10,0,0,0 123,Feb-10,0,0,0 123,Mar-10,1,0,0 123,Apr-10,1,100,10 123,May-10,1,150,20 223,Jan-10,0,0,0 223,Feb-10,0,0,0 223,Mar-10,0,20,5 223,Apr-10,1,30,10 223,May-10,1,40,15 ; proc sql noprint; select catt('cost_',put(date,monyy7.)), catt('paid_',put(date,monyy7.)), catt(date,'=',monotonic()) into :costs separated by ' ', :paids separated by ' ', :for4mat separated by ' ' from (select distinct date from have) order by date ; quit; proc format; value dates &for4mat. ; run; data want (drop=_: seq paid cost); format acct 8.; format date monyy7.; set have (rename=(date=_date)); by acct seq; array costs(*) &costs.; array paids(*) &paids.; retain date costs paids; if first.acct then do; call missing(date); call missing(of costs(*)); call missing(of paids(*)); end; if seq eq 1 then date=coalesce(date,_date);; costs(put(_date,dates.))=cost; paids(put(_date,dates.))=paid; if last.acct then output; run;
Art, CEO, AnalystFinder.com
Advice: don't do it. The resulting data is not useful for reporting purposes, since it will be extremely wide when different ACCT values have different date ranges. And it isn't particularly useful for analysis purposes either. There are plenty of techniques for working with your data as is. If you post more about where you are headed, and what you would like to ultimately accomplish with this data, there are sure to be other approaches available.
Check the MERGE skill proposed by Me,Arthur.T and Matt.
http://support.sas.com/resources/papers/proceedings15/2785-2015.pdf
data have;
infile cards truncover expandtabs;
input acct date : monyy7. seq paid cost;
format date monyy7.;
cards;
123 Jan-10 0 0 0
123 Feb-10 0 0 0
123 Mar-10 1 0 0
123 Apr-10 1 100 10
123 May-10 1 150 20
223 Jan-10 0 0 0
223 Feb-10 0 0 0
223 Mar-10 1 0 0
223 Apr-10 1 100 10
223 May-10 1 150 20
;
run;
proc sql;
create table temp as
select distinct date from have;
quit;
data _null_;
set temp end=last;
if _n_=1 then call execute('data want;merge ');
call execute(catt('have(rename=(paid=paid_',put(date,monyy7.),
' cost=cost_',put(date,monyy7.),') where=(date=',date,'))'));
if last then call execute(';by acct; drop date seq;run;');
run;
I second @Astounding answer. This format will be more cumbersome to deal with in the long run.
I'm not going to question whether your desired result is or isn't a good idea, as we are all often asked to produce output in forms that we might not consider analytically ideal.
The answer to your question of "can it be done using arrays", the answer is yes. My solution uses an undocumented proc sql function, monotonic, thus SAS wouldn't support it always working as expected. In short, while it works, I wouldn't use the following code in a production environment. It would be safer to assign the format numbers in a data step .. but I leave that to you. I simply wrote some code to show that, yes, it can definitely be done:
data have; infile cards dlm=','; input acct date monyy7. seq paid cost; format date monyy7.; cards; 123,Jan-10,0,0,0 123,Feb-10,0,0,0 123,Mar-10,1,0,0 123,Apr-10,1,100,10 123,May-10,1,150,20 223,Jan-10,0,0,0 223,Feb-10,0,0,0 223,Mar-10,0,20,5 223,Apr-10,1,30,10 223,May-10,1,40,15 ; proc sql noprint; select catt('cost_',put(date,monyy7.)), catt('paid_',put(date,monyy7.)), catt(date,'=',monotonic()) into :costs separated by ' ', :paids separated by ' ', :for4mat separated by ' ' from (select distinct date from have) order by date ; quit; proc format; value dates &for4mat. ; run; data want (drop=_: seq paid cost); format acct 8.; format date monyy7.; set have (rename=(date=_date)); by acct seq; array costs(*) &costs.; array paids(*) &paids.; retain date costs paids; if first.acct then do; call missing(date); call missing(of costs(*)); call missing(of paids(*)); end; if seq eq 1 then date=coalesce(date,_date);; costs(put(_date,dates.))=cost; paids(put(_date,dates.))=paid; if last.acct then output; run;
Art, CEO, AnalystFinder.com
Do you know the list of months in advance? If not it is much harder.
So first make two arrays to represent the two groups of variables that are used to store the COST and PAID values.
Then just use a new variable to store the first date where SEQ=1.
So assuming that all accounts have all months you could do this.
data have ;
input acct $ date :anydtdte. seq paid cost ;
format date monyy7. ;
cards;
123 Jan-10 0 0 0
123 Feb-10 0 0 0
123 Mar-10 1 0 0
123 Apr-10 1 100 10
123 May-10 1 150 20
;
proc print;
run;
data want ;
do i=1 by 1 until(last.acct);
set have ;
by acct ;
if seq=1 then first_date=coalesce(first_date,date);
array p PAID_JAN2010 PAID_FEB2010 PAID_MAR2010 PAID_APR2010 PAID_MAY2010 ;
array c COST_JAN2010 COST_FEB2010 COST_MAR2010 COST_APR2010 COST_MAY2010 ;
p(i) = paid;
c(i) = cost;
end;
format first_date monyy7.;
keep acct first_date paid_: cost_:;
run;
proc print;
run;
@Tom: We both took similar appoaches (I modified mine to include the way you handled date as your method was better). The difference between the approaches is that mine doesn't require one to know array size (or values) in advance, and not all dates have to exist for all records. If they all do exist then, yes, your approach is more parsimonious. However, if either of those conditions don't hold, then I'd prefer my approach.
Regardless, both provide methods that should help folks learn how do to such things.
Art, CEO, AnalystFinder.com
Dear all,
Thank you very much for all the responses. It has been very helpful!
Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.
Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.