It took me a while to get this data together and now it seems the clients display preference is going to be a huge undertaking. I'm not sure if transpose will put this together easily or if there is another way to get this but here it goes, below is a small portion of a dataset, power_id is a unique identifier and it's sorted by power_id record_date.
Here is what I have:
POWER_ID | RECORD_DATE | pid_default_reason | cure_count | ACCT_DISCLOSURE_STATUS_FLG | PD_CD | DEL_DAYS | ACCT_CURRENT_CHARGE_OFF_AMT | pass_date |
0000196162 | 06/30/2013 | 3 | P | ** | 0 | 0.00 | ||
0000196162 | 07/31/2013 | 4 | P | ** | 0 | 0.00 | ||
0000196162 | 08/31/2013 | 5 | P | ** | 0 | 0.00 | ||
0000196162 | 09/30/2013 | 6 | P | ** | 0 | 0.00 | 09/30/2013 |
I've attached what I want because it didn't look very nice pasted. I had to use a csv because the excel format was too large. You'll notice there are different columns before and after September 2013. Any help will be appreciated.
Something like the following comes close:
data have;
infile cards dlm='09'x;
informat power_id $10.;
informat record_date mmddyy10.;
format record_date mmddyy10.;
informat pid_default_reason $1.;
informat cure_count 1.;
informat ACCT_DISCLOSURE_STATUS_FLG $1.;
informat PD_CD $2.;
informat DEL_DAYS 8.;
informat ACCT_CURRENT_CHARGE_OFF_AMT 8.;
informat pass_date mmddyy10.;
format pass_date mmddyy10.;
input POWER_ID RECORD_DATE default_reason cure_count
ACCT_DISCLOSURE_STATUS_FLG PD_CD DEL_DAYS
ACCT_CURRENT_CHARGE_OFF_AMT pass_date;
cards;
0000196162 06/30/2013 . 3 P ** 0 0.00
0000196162 07/31/2013 . 4 P ** 0 0.00
0000196162 08/31/2013 . 5 P ** 0 0.00
0000196162 09/30/2013 . 6 P ** 0 0.00 09/30/2013
;
data have;
retain ACCT_DISCLOSURE_STATUS_FLG PD_CD DEL_DAYS
ACCT_CURRENT_CHARGE_OFF_AMT
default_reason cure_count;
set have;
run;
data have (drop=_:);
format pass_date mmddyy10.;
do until (last.power_id);
set have (rename=(pass_date=_pass_date));
by power_id;
if first.power_id then call missing(pass_date);
if not missing(_pass_date) then pass_date=_pass_date;
end;
do until (last.power_id);
set have (rename=(pass_date=_pass_date));
by power_id;
output;
end;
run;
proc transpose data=have out=tall ;
by power_id record_date;
var ACCT_DISCLOSURE_STATUS_FLG--cure_count;
copy pass_date;
run;
proc transpose data=tall out=want (where=(not missing(_name_)));
by power_id;
id _name_ record_date;
var col1;
copy pass_date;
run;
data want;
set want (drop=_:);
run;
data need;
set want (keep=pd_cd: rename=(pd_cd06_30_2013=pd_cd_06_30_2013
pd_cd07_31_2013=pd_cd_07_31_2013
pd_cd08_31_2013=pd_cd_08_31_2013
pd_cd09_30_2013=pd_cd_09_30_2013 ));
run;
data want;
set want;
set need;
run;
Hi,
I would question the reasoning behind transposing the data. In your example you have four or five dates, what happens when you have 50, that would be 50 * 20 or so variables. You are then talking about working with thousands of variables. Whats the data to be used for? If its for reporting, does anyone seriously scroll the window across in Excel to see all those 1000's of columns, I somehow doubt it. If its for further analysis then the normalised structure would presumably be easier to work with. There are several ways of reporting the data in chunks, which may be preferable, for instance having a sheet for each quarter with the data normalised. Perhaps just blocking it out.
If however you have looked at the above and it is still required, then you could transpose each variable using proc transpose, but its a bit long winded. You could use arrays:
SAS Learning Module: Reshaping data long to wide using the data step
Which would shrink your code somewhat.
So its technically possible, but do you really want to go there?
The dates will only go from June to December of the same year, at the longest. I agree that this is not the best way to display the data but that is what is requested. I would like to use arrays but am not very familiar. I am updating the info with proc transpose but not getting it quite right, that's why I posted here.
Well, for the arrays, here is a small example. abcd all get transposed up, {1} references the month 06, {2} references 07. (am finished for the week now, so hope this helps.)
data have;
infile datalines dlm=",";
input POWER_ID $ RECORD_DATE $ A B $ C D;
datalines;
0000196162,06/30/2013,3,P **,0,0.00
0000196162,07/31/2013,4,P **,0,0.00
;
run;
data want (drop=a b c d);
set have;
by power_id;
retain _a1-_a2 _b1-_b2 _c1-_c2 _d1-_d2;
array _a{2} 8.;
array _b{2} $8.;
array _c{2} 8.;
array _d{2} 8.;
select(substr(record_date,1,2));
when("06") ind=1;
when("07") ind=2;
end;
_a{ind}=a;
_b{ind}=b;
_c{ind}=c;
_d{ind}=d;
if last.power_id then output;
run;
how about with the one ID variable per line and everything leading with date out to the right? I'm trying that with multiple transposes but still not quite getting there.
Something like the following comes close:
data have;
infile cards dlm='09'x;
informat power_id $10.;
informat record_date mmddyy10.;
format record_date mmddyy10.;
informat pid_default_reason $1.;
informat cure_count 1.;
informat ACCT_DISCLOSURE_STATUS_FLG $1.;
informat PD_CD $2.;
informat DEL_DAYS 8.;
informat ACCT_CURRENT_CHARGE_OFF_AMT 8.;
informat pass_date mmddyy10.;
format pass_date mmddyy10.;
input POWER_ID RECORD_DATE default_reason cure_count
ACCT_DISCLOSURE_STATUS_FLG PD_CD DEL_DAYS
ACCT_CURRENT_CHARGE_OFF_AMT pass_date;
cards;
0000196162 06/30/2013 . 3 P ** 0 0.00
0000196162 07/31/2013 . 4 P ** 0 0.00
0000196162 08/31/2013 . 5 P ** 0 0.00
0000196162 09/30/2013 . 6 P ** 0 0.00 09/30/2013
;
data have;
retain ACCT_DISCLOSURE_STATUS_FLG PD_CD DEL_DAYS
ACCT_CURRENT_CHARGE_OFF_AMT
default_reason cure_count;
set have;
run;
data have (drop=_:);
format pass_date mmddyy10.;
do until (last.power_id);
set have (rename=(pass_date=_pass_date));
by power_id;
if first.power_id then call missing(pass_date);
if not missing(_pass_date) then pass_date=_pass_date;
end;
do until (last.power_id);
set have (rename=(pass_date=_pass_date));
by power_id;
output;
end;
run;
proc transpose data=have out=tall ;
by power_id record_date;
var ACCT_DISCLOSURE_STATUS_FLG--cure_count;
copy pass_date;
run;
proc transpose data=tall out=want (where=(not missing(_name_)));
by power_id;
id _name_ record_date;
var col1;
copy pass_date;
run;
data want;
set want (drop=_:);
run;
data need;
set want (keep=pd_cd: rename=(pd_cd06_30_2013=pd_cd_06_30_2013
pd_cd07_31_2013=pd_cd_07_31_2013
pd_cd08_31_2013=pd_cd_08_31_2013
pd_cd09_30_2013=pd_cd_09_30_2013 ));
run;
data want;
set want;
set need;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.