Hi,
I need assistance in transforming Table 1 below into Table 2 The last 2 characters of the column headers need to match. I have color coordinated them below. The last two characters of the program Id heading (A1 or A2) need to match the last two characters of the rate and the date columns. I am only providing a sample of my data as the dataset would be too large to attach.
Thank you!!
Have: Table 1
Program_ID.A1 | Program_ID.A2 | CCF_MIN.A1 | CCF_MAX.A1 | CCF_DATE.A1 | CCF_MIN.A2 | CCF_MAX.A2 | CCF_DATE.A2 |
12345 | 78907 | $1 | $2 | 10/23/2017 | $3 | $4 | 12/15/2019 |
23456 | 21278 | $1 | $2 | 10/23/2017 | $3 | $4 | 12/15/2019 |
34456 | 56785 | $1 | $2 | 10/23/2017 | $3 | $4 | 12/15/2019 |
Want: Table 2
Program_ID | CCF_MIN | CCF_MAX | CCF_Date |
12345 | $1 | $2 | 10/23/2017 |
23456 | $1 | $2 | 10/23/2017 |
34456 | $1 | $2 | 10/23/2017 |
78907 | $3 | $4 | 12/15/2019 |
21278 | $3 | $4 | 12/15/2019 |
56785 | $3 | $4 | 12/15/2019 |
Hi @lgonza22 et al, Sorry about my late entry to the party. Anyways, I get the feeling, once your wide data, gets very very and very wide with the potential of numerous groups i.e. your last 2 characters, you perhaps would need to resort to ways using indexed binary search(Hash/Indexes) rather than indexed storage(Arrays). The reason is the declaration can become unwieldy if you would choose to declare multiple arrays. To my mind, the trick here is, it's more of a problem that requires a search, store and retrieve approach, i.e. should the data structure be complex with combination of chars and numeric variables.
That being said, lazy me with only few sips of Starbucks coffee waiting for caffeine to kick in, am going to assume that all your variables are numeric to take a quick stab. I would request you to let us know the datatype of your variables if that assumption doesn't hold true, coz in that case the initial description of indexed binary search algorithm would be ideal. I would await for that clarification from you if you are keen with us and the thread. On the other hand, if the assumption does hold true i.e. all are numeric, the solution is traditional, easy and boring. 🙂 as follows-
data have;
input Program_IDA1 Program_IDA2 (CCF_MINA1 CCF_MAXA1)(:comma10.) CCF_DATEA1 :mmddyy10. (CCF_MINA2 CCF_MAXA2) (:comma10.) CCF_DATEA2 :mmddyy10.;
format CCF_DATE: mmddyy10. CCF_MIN: CCF_MAX: dollar10.;
cards;
12345 78907 $1 $2 10/23/2017 $3 $4 12/15/2019
23456 21278 $1 $2 10/23/2017 $3 $4 12/15/2019
34456 56785 $1 $2 10/23/2017 $3 $4 12/15/2019
;
data temp;
set have;
array n _numeric_;
do over n;
key=substr(vname(n),lengthn(vname(n))-1);
vn=substr(vname(n),1,lengthn(vname(n))-2);
_n=_n_;
value=n;
output;
end;
keep key vn _n value;
run;
proc sort data=temp;
by key _n;
run;
proc transpose data=temp out=want(drop=_: key);
by key _n;
var value;
id vn;
run;
/*Format the variables for display*/
proc datasets library=work nolist;
modify want;
format CCF_DATE mmddyy10. CCF_MIN CCF_MAX dollar20.;
quit;
proc print noobs;run;
Result-
Program_ID | CCF_MIN | CCF_MAX | CCF_DATE |
---|---|---|---|
12345 | $1 | $2 | 10/23/2017 |
23456 | $1 | $2 | 10/23/2017 |
34456 | $1 | $2 | 10/23/2017 |
78907 | $3 | $4 | 12/15/2019 |
21278 | $3 | $4 | 12/15/2019 |
56785 | $3 | $4 | 12/15/2019 |
data want;
set have;
array pgm(*) program_idA1 program_idA2; *list all here;
array _ccf_min(*) ccf_mina1-ccf_mina2; *list all here;
*Repeat for max, date;
do i=1 to dim(pgm);
ProgramID = pgm(i);
CCF_MIN = _ccf_min(i);
*remainder of items;
*explicit output to write it data set;
output;
end;
*list variables you want to drop here;
*drop program_IDa1-programIDa2 ;
run;
Here's a tutorial on using Arrays in SAS
https://stats.idre.ucla.edu/sas/seminars/sas-arrays/
Transposing data tutorials:
Wide to Long:
https://stats.idre.ucla.edu/sas/modules/how-to-reshape-data-wide-to-long-using-proc-transpose/
https://stats.idre.ucla.edu/sas/modules/reshaping-data-wide-to-long-using-a-data-step/
Here is a reference that illustrates how to refer to variables and datasets in a short cut list. These methods can be used to list the variables in the array statement.
https://blogs.sas.com/content/iml/2018/05/29/6-easy-ways-to-specify-a-list-of-variables-in-sas.html
In addition to @Reeza's excellent answer, you might be interested in this short tutorial about reshaping your data using DO loop and arrays.
will do thank you!
Hi @lgonza22 et al, Sorry about my late entry to the party. Anyways, I get the feeling, once your wide data, gets very very and very wide with the potential of numerous groups i.e. your last 2 characters, you perhaps would need to resort to ways using indexed binary search(Hash/Indexes) rather than indexed storage(Arrays). The reason is the declaration can become unwieldy if you would choose to declare multiple arrays. To my mind, the trick here is, it's more of a problem that requires a search, store and retrieve approach, i.e. should the data structure be complex with combination of chars and numeric variables.
That being said, lazy me with only few sips of Starbucks coffee waiting for caffeine to kick in, am going to assume that all your variables are numeric to take a quick stab. I would request you to let us know the datatype of your variables if that assumption doesn't hold true, coz in that case the initial description of indexed binary search algorithm would be ideal. I would await for that clarification from you if you are keen with us and the thread. On the other hand, if the assumption does hold true i.e. all are numeric, the solution is traditional, easy and boring. 🙂 as follows-
data have;
input Program_IDA1 Program_IDA2 (CCF_MINA1 CCF_MAXA1)(:comma10.) CCF_DATEA1 :mmddyy10. (CCF_MINA2 CCF_MAXA2) (:comma10.) CCF_DATEA2 :mmddyy10.;
format CCF_DATE: mmddyy10. CCF_MIN: CCF_MAX: dollar10.;
cards;
12345 78907 $1 $2 10/23/2017 $3 $4 12/15/2019
23456 21278 $1 $2 10/23/2017 $3 $4 12/15/2019
34456 56785 $1 $2 10/23/2017 $3 $4 12/15/2019
;
data temp;
set have;
array n _numeric_;
do over n;
key=substr(vname(n),lengthn(vname(n))-1);
vn=substr(vname(n),1,lengthn(vname(n))-2);
_n=_n_;
value=n;
output;
end;
keep key vn _n value;
run;
proc sort data=temp;
by key _n;
run;
proc transpose data=temp out=want(drop=_: key);
by key _n;
var value;
id vn;
run;
/*Format the variables for display*/
proc datasets library=work nolist;
modify want;
format CCF_DATE mmddyy10. CCF_MIN CCF_MAX dollar20.;
quit;
proc print noobs;run;
Result-
Program_ID | CCF_MIN | CCF_MAX | CCF_DATE |
---|---|---|---|
12345 | $1 | $2 | 10/23/2017 |
23456 | $1 | $2 | 10/23/2017 |
34456 | $1 | $2 | 10/23/2017 |
78907 | $3 | $4 | 12/15/2019 |
21278 | $3 | $4 | 12/15/2019 |
56785 | $3 | $4 | 12/15/2019 |
Thank you @novinosrin & @Reeza I really appreciate it! have a great day!
Another option, if you don't have enough solutions already is this macro:
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.