I'm using arrays to transpose my dataset from long to wide because there are several variables that need to remain together in the ultimate output. I don't always know the maximum number of observations per ID, so I first use SQL to find that max to set the Array dimensions as described in this article.
So far so good: I'm getting one row per ID as I want.
The problem is, the variables are being output grouped by the order they're created (naturally):
Acc_1 | Acc_2, etc. ... | Date_1 | Date_2, etc., ... |
---|---|---|---|
There are 12 total numbered variables and the current dataset came to a max of 8, so output of the current dataset is going to be 96 columns wide.
The problem is that the client would prefer to see the output grouped by the sets (could think of it as interlaced), so from the example above the preferred order would be:
Acc_1 | Date_1 (rest of _1 variables) | Acc_2 | Date_2, etc. |
---|---|---|---|
Any suggestions on how this can be done either in the data step that gathers the information into the arrays or in a subsequent step? I've thought about a macro that would write out a PROC SQL with the select statement being written with the variables in the preferred order however many times found previously to set the array dimensions, but that just seems overly complicated to how it feels it should work.
Thanks
One is by usage of the retain statement prior to the set statement for arranging the order of the variables
data want;
retain var1 var2 ..........;
set have;
run;
Thanks,
Jag
Thanks for the suggestion, Jagadishkatam. I'm unsure how that will help, though, given each month this process runs has the potential to change the number of variables. This version might work this month, but if next month there are additional series, the program would need to be changed. Or am I missing a key point?
A macro variable is the key, by using the sashelp.vcolumn table to determine the order along with the retain statement.
This lead seems intriguing, though I'm not sure how to proceed in testing it.
Assuming your table is called HAVE and stored in the WORK library
You have two variables that start with ACC_i and DATE_i and you want them ordererd as ACC_1, DATE_1, ACC_2, DATE_2
If you have other variables that start with ACC or DATE that aren't part of the variable list you'll need to account for them some way.
Here's some skeleton (untested) code:
proc sql;
create table test as
select name, varnum, input(scan(name, 2, "_"), 2.) as var_order
from sashelp.vcolumn
where upper(libname)="WORK" and upper(memname)="HAVE"
and (upper(name) like 'ACC%'
or upper(name) like 'DATE%')
order by var_order, name ;
quit;
proc sql;
select trim(name) into :retain_list separated by " "
from test;
quit;
data want;
retain &retain_list;
set have;
run;
Ah, OK. Thanks, Reeza, that looks very promising. I want to thank you for your time. I don't think I'll have a chance to develop and test it today, but I'll let you know next week how things go.
At the top of the program you place this magic statement with
FORMAT var1 var2 var3 ..... var100;
Note there are NO length specifications for the Variables.
This keeps the above order of the variables in the OUTPUT DATA SET.
You may use INFORMAT instead too.
SQL can order those variables as you will via dictionary table . Post some sample data would be better.
Xia Keshan
SQL can order those variables as you will via dictionary table . Post some sample data would be better.
Xia Keshan
Hi Xia. Here is the code being used:
data RIS_BASE_MAMMO_ARRAY;
FORMAT MRN $15.
Ar_Accession1 - Ar_Accession&M_Accessions. $20.
Ar_ExamCode1 - Ar_ExamCode&M_Accessions. $25.
Ar_ReasonCode1 - Ar_ReasonCode&M_Accessions. $1.
Ar_ExamClass1 - Ar_ExamClass&M_Accessions. $15.
Ar_BIRADS1 - Ar_BIRADS&M_Accessions. $5.
Ar_SignedByRAD1 - Ar_SignedByRAD&M_Accessions. $75.
Ar_ScheduleDate1 - Ar_ScheduleDate&M_Accessions. DATETIME19.
Ar_CompletedDate1 - Ar_CompletedDate&M_Accessions. DATETIME19.
Ar_SignedDate1 - Ar_SignedDate&M_Accessions. DATETIME19.
Ar_BTSd1 - Ar_BTSd&M_Accessions. 8.
Ar_CTSm1 - Ar_CTSm&M_Accessions. 8.
Ar_CTSd1 - Ar_CTSd&M_Accessions. 8.
;
keep MRN
Ar_Accession1 - Ar_Accession&M_Accessions.
Ar_ExamCode1 - Ar_ExamCode&M_Accessions.
Ar_ReasonCode1 - Ar_ReasonCode&M_Accessions.
Ar_ExamClass1 - Ar_ExamClass&M_Accessions.
Ar_BIRADS1 - Ar_BIRADS&M_Accessions.
Ar_SignedByRAD1 - Ar_SignedByRAD&M_Accessions.
Ar_ScheduleDate1 - Ar_ScheduleDate&M_Accessions.
Ar_CompletedDate1 - Ar_CompletedDate&M_Accessions.
Ar_SignedDate1 - Ar_SignedDate&M_Accessions.
Ar_BTSd1 - Ar_BTSd&M_Accessions.
Ar_CTSm1 - Ar_CTSm&M_Accessions.
Ar_CTSd1 - Ar_CTSd&M_Accessions.
;
Array AR_MRN_Char{6, &M_Accessions.}
Ar_Accession1 - Ar_Accession&M_Accessions.
Ar_ExamCode1 - Ar_ExamCode&M_Accessions.
Ar_ReasonCode1 - Ar_ReasonCode&M_Accessions.
Ar_ExamClass1 - Ar_ExamClass&M_Accessions.
Ar_SignedByRAD1 - Ar_SignedByRAD&M_Accessions.
Ar_BIRADS1 - Ar_BIRADS&M_Accessions.
;
Array AR_MRN_Num{6, &M_Accessions.}
Ar_ScheduleDate1 - Ar_ScheduleDate&M_Accessions.
Ar_CompletedDate1 - Ar_CompletedDate&M_Accessions.
Ar_SignedDate1 - Ar_SignedDate&M_Accessions.
Ar_BTSd1 - Ar_BTSd&M_Accessions.
Ar_CTSm1 - Ar_CTSm&M_Accessions.
Ar_CTSd1 - Ar_CTSd&M_Accessions.
;
do obs = 1 by 1 until (last.MRN);
set RIS_BASE_MAMMO_DER_1;
by MRN;
Do j = 1 to 6;
Select (j);
when (1) do;
AR_MRN_Char{j, obs} = Accession;
end;
when (2) do;
AR_MRN_Char{j, obs} = ExamCode;
end;
when (3) do;
AR_MRN_Char{j, obs} = Reason;
end;
when (4) do;
AR_MRN_Char{j, obs} = MammoClass;
end;
when (5) do;
AR_MRN_Char{j, obs} = SignedBy;
end;
when (6) do;
AR_MRN_Char{j, obs} = AssessCode;
end;
otherwise;
end;
end;
Do j = 1 to 6;
Select (j);
when (1) do;
AR_MRN_Num{j, obs} = SchedDTTM;
end;
when (2) do;
AR_MRN_Num{j, obs} = CompletedDTTM;
end;
when (3) do;
AR_MRN_Num{j, obs} = SignedDTTM;
end;
when (4) do;
AR_MRN_Num{j, obs} = BTSd;
end;
when (5) do;
if not missing(SignedDTTM) then do;
AR_MRN_Num{j, obs} = round(((SignedDTTM-CompletedDTTM)/3600)); /*Completed to Signed in minutes for ACR comparison*/;
end;
end;
when (6) do;
if not missing(SignedDTTM) then do;
AR_MRN_Num{j, obs} = round(((SignedDTTM-CompletedDTTM)/86400)); /*Completed to Signed in days for charting */;
end;
end;
otherwise;
end;
end;
end;
run;
I think the obs Data Step variable could be used to somehow, but what that how would be is the question. The preferred layout would be
MRN | Accession1 | ExamCode1 | ReasonCode1 | ExamClass1 | BIRADS1 | SignedByRAD1 | ScheduledDate1 | CompletedDate1 | SignedDate1 | BTSd1 | CTMSm1 | CTSd1 | Accession2 (and all the _2 variable in the same order), etc. |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Your suggestion of using SQL with the Dictionary table is intriguing.
Thanks for your contribution.
You know the names that you are going to create so just make then in the order you want. Then use them in a LENGTH statement before the ARRAY statements.
OK. You didn't offer me a sample data to test, so I make it on my own . Hope help you a little bit.
data have; array x{*} Ar_Accession1 - Ar_Accession12 ; array y{*} Ar_ExamCode1 - Ar_ExamCode12; array z{*} $ Ar_ReasonCode1 - Ar_ReasonCode12; do n=1 to 100; do i=1 to 12; x{i}=1;y{i}=2;z{i}='x'; end; output; end; drop n i; run; proc sql; select name into : list separated by ' ' from dictionary.columns where libname='WORK' and memname='HAVE' order by input(compress(name,,'kd'),best8.),name; quit; data want; retain &list ; set have; run;
Xia Keshan
Hello,
There are few post related to this topic. Here is one example:
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.