I have a unique requirement. I have 4 SAS datasets in a SAS library (p4, A, B, C). I have to read one observation from A, do some processing, and store it in a temp dataset; read one observation from B, do some processing, and store it in a temp dataset; read one observation from C, do some processing, and store it in a temp dataset; then write all the temp datasets out in JSON format. I am using multiple DATA steps and PROC JSON, and it is taking an unusually long wall-clock time. Below is the sample code. Is there a better way to do this that avoids the program load delay and voluntary wait?
LIBNAME JSON 'JSON';

%DO I = &fstobs %TO &COUNT;

   data a1;
      set JSON.policy4 (firstobs=&I obs=&I);
      call symput('policy_no', policyNumber);
   run;

   data aa;
      set JSON.A;
      if policy > &policy_no then stop;
      else if policy = &policy_no then do;
         N + 1;
         drop N;
         id = cats('&policy_id', _N_);
         output;
      end;
   run;

   data bb;
      set JSON.B;
      if policy > &policy_no then stop;
      else if policy = &policy_no then do;
         N + 1;
         drop N;
         id = cats('&policy_id', _N_);
         output;
      end;
   run;

   data cc;
      set JSON.C3;
      if policyNumber > &policy_no then stop;
      else if policyNumber = &policy_no then do;
         N + 1;
         drop N;
         id = cats('&policy_id', _N_);
         output;
      end;
   run;
%macro output_json(table1, table2, table3, table4);

   proc json out=dat3
      nopretty nosastags fmtdatetime fmtnumeric;
      write open object;
      write values 'kkkkkk';
      export &table1;
      write values 'jjjjjj';
      write open array;
      export &table2;
      write close;
      write values 'xxxxx ';
      write open array;
      export &table3;
      write close;
      write values 'yyyyyy';
      write open array;
      export &table4;
      write close;
      write close;   /* close the object opened at the top */
   run;

%mend output_json;
SAS LOG (first policy / first observation). For 1,000 observations it is taking ~10 minutes, and we have to do this for more than 1 billion:
NOTE: There were 9 observations read from the data set PHOENIX.POLCOMP3.
NOTE: The data set WORK.PCOMP1 has 8 observations and 25 variables.
NOTE: The DATA statement used the following resources:
CPU time - 00:00:00.00
Elapsed time - 00:00:00.03
EXCP count - 94
Task memory - 5291K (156K data, 5135K program)
Total memory - 29582K (7840K data, 21742K program)
Timestamp - 10/11/2020 10:51:06 PM
NOTE: The address space has used a maximum of 924K below the line and 37904K above the line.
NOTE: There were 9 observations read from the data set PHOENIX.AGENT3.
NOTE: The data set WORK.AGENT has 8 observations and 36 variables.
NOTE: The DATA statement used the following resources:
CPU time - 00:00:00.00
Elapsed time - 00:00:00.02
EXCP count - 95
Task memory - 5296K (161K data, 5135K program)
Total memory - 29582K (7840K data, 21742K program)
Timestamp - 10/11/2020 10:51:06 PM
NOTE: The address space has used a maximum of 924K below the line and 37904K above the line.
NOTE: There were 30 observations read from the data set PHOENIX.CPYMENT3.
NOTE: The data set WORK.CPAYMNT has 29 observations and 19 variables.
NOTE: The DATA statement used the following resources:
CPU time - 00:00:00.00
Elapsed time - 00:00:00.03
EXCP count - 94
Task memory - 5291K (156K data, 5135K program)
Total memory - 29582K (7840K data, 21742K program)
Timestamp - 10/11/2020 10:51:06 PM
NOTE: The address space has used a maximum of 924K below the line and 37904K above the line.
NOTE: The PROCEDURE JSON used the following resources:
CPU time - 00:00:00.00
Elapsed time - 00:00:00.01
EXCP count - 21
Task memory - 3800K (91K data, 3709K program)
Total memory - 29648K (7840K data, 21808K program)
Timestamp - 10/11/2020 10:51:06 PM
NOTE: The address space has used a maximum of 924K below the line and 37904K above the line.
See this:
data dataset1;
input POLICY $ Unique_ID $;
datalines;
A 10
B 20
C 30
D 40
E 50
;
data dataset2;
input POLICY $ Unique_ID $ OTHER_DETAILS $;
datalines;
A -blank- XXXXXX
A -blank- XXXXXX
A -blank- XXXXXX
B -blank- XXXXXX
;
data dataset3;
input POLICY $ Unique_ID $ OTHER_DETAILS3 $;
datalines;
A -blank- XXXXXX
A -blank- XXXXXX
B -blank- XXXXXX
;
data ds2_expanded;
set dataset2;
by policy;
if _n_ = 1
then do;
declare hash ds1 (dataset:"dataset1");
ds1.definekey("policy");
ds1.definedata("unique_id");
ds1.definedone();
end;
if first.policy
then count = 1;
else count + 1;
if ds1.find() = 0 then unique_id = catx('_',unique_id,count);
drop count;
run;
data ds3_expanded;
set dataset3;
by policy;
if _n_ = 1
then do;
declare hash ds1 (dataset:"dataset1");
ds1.definekey("policy");
ds1.definedata("unique_id");
ds1.definedone();
end;
if first.policy
then count = 1;
else count + 1;
if ds1.find() = 0 then unique_id = catx('_',unique_id,count);
drop count;
run;
data final;
set
ds2_expanded
ds3_expanded (rename=(other_details3=other_details))
;
by policy;
run;
The final dataset has all the content; you then need to build the JSON "lines" by concatenating again with BY policy.
The advantage is that you do not need to process single policies separately.
If memory constraints on the mainframe make the use of the hash object impossible, these steps can be done with a DATA step MERGE (a minimal sketch follows below); the advantage of the hash is that your dataset1 does not need to be sorted.
The other datasets need to be sorted by policy to make BY-group processing possible.
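A minimal sketch of that MERGE alternative, assuming dataset1 and dataset2 are both sorted by POLICY (the dataset and variable names are the ones from the example above; ds2_expanded_merge is just an illustrative name):

proc sort data=dataset1; by policy; run;
proc sort data=dataset2; by policy; run;

data ds2_expanded_merge;
   /* one-to-many merge: dataset1 contributes UNIQUE_ID on the first observation
      of each policy, and that value is retained for the rest of the BY group */
   merge dataset2 (in=in2)
         dataset1 (keep=policy unique_id);
   by policy;
   if in2;                      /* keep only policies that occur in dataset2 */
   if first.policy then count = 1;
   else count + 1;
   unique_id = catx('_', unique_id, count);
   drop count;
run;

The same step, pointed at dataset3, replaces ds3_expanded.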
The log you posted took less than a second to run through; the individual steps finish in a few hundredths of a second. Your issue must lie somewhere else.
If all this is to make counts for policy numbers, that can be done in SQL for all policy numbers in one step per dataset.
You can then join the results and subset with a WHERE, if needed.
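For illustration, a hedged sketch against the example datasets (dataset1, dataset2, dataset3) shown above, assuming "counts" means the number of observations per policy; the names policy_counts, n_ds2 and n_ds3 are made up for the sketch:

proc sql;
   create table policy_counts as
   select a.policy,
          coalesce(b.n_ds2, 0) as n_ds2,
          coalesce(c.n_ds3, 0) as n_ds3
   from dataset1 as a
        left join
        (select policy, count(*) as n_ds2 from dataset2 group by policy) as b
        on a.policy = b.policy
        left join
        (select policy, count(*) as n_ds3 from dataset3 group by policy) as c
        on a.policy = c.policy;
quit;

A WHERE clause on the outer query (or a later subset of policy_counts) restricts the result if needed.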
By "concatenate" do you mean that you stack data from different datasets, interleaving by policy_no, and create an indicator from where the observation is? If yes, this can be done in a single data step for all policy_no.
The code as such looks much too complicated, and I am not surprised that the setup of so many individual steps takes so long.
Next, I would not do such work directly from JSON libraries. Instead, load each needed dataset as-is from the JSON library into WORK first, and then proceed from there. Similarly, write the accumulated output once, at the end of all processing.
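For example, a short sketch that copies the members read by the posted code (POLICY4, A, B, C3) from the JSON library into WORK once, up front; all subsequent DATA steps would then read the WORK copies:

proc copy in=json out=work;
   select policy4 a b c3;
run;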
If you provide examples of the incoming datasets and the wanted result (for several policy_no), I can help you with improved code.