Hello all,
I have a 2 part question.
The first part is that I want my testdata to be split into different datasets based on the order field. I can do this statically by writing something like this:
data testdata;
infile datalines dsd truncover;
input
ordernum:$3.
order:1.
code:$3.
;
datalines;
111, 1, qwe
111, 2, gfd
111, 3, sdf
222, 1, erw
222, 2, fgh
333, 1, saa
333, 2, ath
333, 3, gdw
444, 1, etg
444, 2, hef
444, 3, sfy
444, 4, hrl
555, 1, mmj
555, 2, bhg
555, 3, dgy
555, 4, ggd
555, 5, dss
;;;
data ordernum1 ordernum2 ordernum3 ordernum4 ordernum5;
set testdata;
if Order = 1 then output ordernum1;
if Order = 2 then output ordernum2;
if Order = 3 then output ordernum3;
if Order = 4 then output ordernum4;
if Order = 5 then output ordernum5;
run;
However, with this approach I can't be assured that there will always only be a total of 5 orders; there could be any number.
I'm having trouble getting a macro working and was looking for some additional eyes. This may not even be the best way to do this but I appreciate any input.
proc sql noprint;
select max(Order) into : order_num
from testdata;
quit;
%put Max Count = &order_num;
options mlogic mprint;
%macro split;
data %do i = 1 %to &order_num;
ordernum&i %end;
;
set test_Data;
select(Order);
%do i = 1 %to &order_num;
when Order = &i then output ordernum&i;
%end;
otherwise;
end;
run;
%mend split;
%split;
And lastly, I have a master table of ordernums I want to join these separated datasets back to. How do I approach writing the join logic dynamically? For example, I could do this manually if I knwo the number of tables the testdata will be split into by writing something like this:
proc sql;
create table New_Table as
select
a.*,
b.code as code1,
c.code as code2,
d.code as code3,
e.code as code4,
f.code as code5
from master a
left join ordernum1 b on a.ordernum = b.ordernum
left join ordernum2 c on a.ordernum = c.ordernum
left join ordernum3 d on a.ordernum = d.ordernum
left join ordernum4 e on a.ordernum = e.ordernum
left join ordernum5 f on a.ordernum = f.ordernum
;
quit;
However, if I don't know how many datasets will be created how can I include this join logic into the macro?
As always, appreciate any assistance. Thanks!
I'd worry about whether the ordernum was always sequential, so I'd do the following for the splitting apart. The remerging is a bit more macro work than I have time for.
data testdata;
infile datalines dsd truncover;
input
ordernum:$3.
order:1.
code:$3.
;
datalines;
111, 1, qwe
111, 2, gfd
111, 3, sdf
222, 1, erw
222, 2, fgh
333, 1, saa
333, 2, ath
333, 3, gdw
444, 1, etg
444, 2, hef
444, 3, sfy
444, 4, hrl
555, 1, mmj
555, 2, bhg
555, 3, dgy
555, 4, ggd
555, 5, dss
;;;
%macro orderSplit;
proc sql;
SELECT DISTINCT orderNum INTO :ord1-:ord999
FROM testData;
quit;
%let cnt = &sqlObs;
%do i = 1 %to &cnt;
data order&i;
set testData;
where orderNum = "&&ord&i";
run;
%end;
%mend orderSplit;
%orderSplit;
There is a little massaging of the data prior to this processing. The ordernum may not always be sequential. The order field is manufactored. I made up the testdata dataset as a small example. I create the order field using this code:
data testdata;
set testdata;
by ordernum notsorted;
if first.ordernum then Order = 0;
Order + 1;
run;
I think I understand the logic behind how you'd split apart the dataset.
proc sql;
SELECT DISTINCT orderNum INTO :ord1-:ord999
FROM testData;
quit;
However my original dataset has hundreds of thousands of records and I dont think that approach would be the best one which is why I was opting for seperating it by the order field I create.
May I ask what you are doing that requires splitting the data apart especially since you are then merging back? It may be something that can be handled with BY group processing.
Does your master ordernumr table have one record per ordernum or multiple? A small example set of the ordernumber and some of the "split" data and what the result after the combination should look like may be helpful.
The master table has a distinct ordernum per record. There are many "testdata" datasets with multiple records per ordernum. My goal is to join them to the master table to maintain that distinct ordernum per record and essentially flatten out the data by creating additional columns for each ordernum.
As an example, creating table Test by joining the master table with the testdata table from above results in a data set with 17 records istead of 5 which is not what I'm aiming to do.
data master;
infile datalines dsd truncover;
input
ordernum:$3.
;
datalines;
111
222
333
444
555
;
proc sql;
create table test as
select
a.*,
b.code
from master a
left join testdata b on a.ordernum = b.ordernum;
quit;
@Ody wrote:
The master table has a distinct ordernum per record. There are many "testdata" datasets with multiple records per ordernum. My goal is to join them to the master table to maintain that distinct ordernum per record and essentially flatten out the data by creating additional columns for each ordernum.
As an example, creating table Test by joining the master table with the testdata table from above results in a data set with 17 records istead of 5 which is not what I'm aiming to do.
The part I highlight above in bold text is very often a less than ideal approach. The main exception is when it is the very last step of a report for human consumption. The process you are describing creates a data structure that can be extremely hard to work with as the numbers, and hence names, of variables changes each time this process is followed. You might want to google "normalize data" for details.
May I answer your part2 first, as you may not need part1 to get part2.
This is your provided data file (Thank you for the ready-to-go input data):
data testdata;
infile datalines dsd truncover;
input
ordernum:$3.
order:1.
code:$3.
;
datalines;
111, 1, qwe
111, 2, gfd
111, 3, sdf
222, 1, erw
222, 2, fgh
333, 1, saa
333, 2, ath
333, 3, gdw
444, 1, etg
444, 2, hef
444, 3, sfy
444, 4, hrl
555, 1, mmj
555, 2, bhg
555, 3, dgy
555, 4, ggd
555, 5, dss
;;;
Now this to answer the your part2:
/*transpose to get all needed elements*/
PROC SORT
DATA=WORK.TESTDATA(KEEP=code order ordernum)
OUT=WORK.Sorted
;
BY ordernum;
RUN;
PROC TRANSPOSE DATA=WORK.Sorted
OUT=WORK.TRNS(drop=_name_)
PREFIX=code
;
BY ordernum;
ID order;
VAR code;
RUN;
QUIT;
/*merge ONE table*/
proc sql;
create table New_Table as
select
*
from master a
left join WORK.TRNS b on a.ordernum = b.ordernum
;
quit;
If for some reason splitting is still needed, for part1, here is one dynamic non-macro splitting approach:
/*Hash split*/
PROC SORT
DATA=WORK.TESTDATA(KEEP=code order ordernum)
OUT=WORK.Hash_sorted
;
BY order;
RUN;
data _null_;
if _n_=1 then
do;
/*if 0 then set hash_sorted;*/
declare hash h(dataset:'hash_sorted', multidata:'y');
h.definekey('order');
h.definedata(all:
'y');
h.definedone();
end;
rc=h.clear();
do until (last.order);
set hash_sorted;
by order;
rc=h.add();
end;
rc=h.output(dataset:cats('ordernum',order));
run;
Yes you can read all the data once via sql just to get the names of the datasets to be created. And then read a second time to output those data sets.
However, it appears that testdata is sorted by ordernum. If master (one record per ordernum, right?) is also sorted (or just indexed) by ordernum, then you can do it all in a single pass, by combining a merge statement with a hash table to dynamically output appropriately named data sets.
The hash table H below holds all the merged data for a given ordernum (see the h.add method). The program, upon encountering the last record for the ordernum, outputs the data to a correspondingly named data set (h.output method), and then clears the hash table to receive data for the next ordernum (h.clear):
/* Make an empty dataset with all the vars needed for the hash table*/
data dummy;
if 0 then set testdata master;
n=_n_;
stop;
run;
data _null_;
merge testdata master;
by ordernum;
n=_n_;
if _n_=1 then do;
declare hash h (dataset:'dummy',ordered:'a');
h.definekey('n');
h.definedata(all:'Y');
h.definedone();
end;
rc=h.add();
if last.ordernum then do;
h.output(dataset:cats('ordernum',ordernum,'(drop=n)'));
h.clear();
end;
run;
Notes:
After working at my problem a little more I figured out my issue was the syntax in the select part of the data step. For anyone interested here is the updated code:
proc sql noprint;
select max(Order) into : order_num
from testdata;
quit;
%put Max Count = &order_num;
options mlogic mprint;
%macro split;
data %do i = 1 %to &order_num;
ordernum&i %end;
;
set testdata;
select(Order);
%do i = 1 %to &order_num;
when ("&i") output ordernum&i;
%end;
otherwise;
end;
run;
%mend split;
%split;
Now, I need to figure out how to join these data sets back to my master table dynamically. If anyone has any suggestions or examples I'd be curious to see. Thanks.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.