Sasnewuser222
Calcite | Level 5

I have a unique requirement. I have 4 SAS datasets in a SAS library (p4, A, B, C). I have to read one observation from A, do some processing, and store the result in a temp dataset; read one observation from B, process it, and store the result in a temp dataset; read one observation from C, process it, and store the result in a temp dataset; and then write all the temp datasets out in JSON format. I am using multiple DATA steps and PROC JSON, and it is taking an unusually long wall-clock time. Below is the sample code. Is there a better way to do this that avoids the program load delay and the voluntary wait?

LIBNAME JSON 'JSON';

/* The %DO loop below sits inside a driver macro in the full program
   (%DO is only valid within a macro definition). */
%DO I = &fstobs %to &COUNT;

   /* read the Ith policy and put its number into a macro variable;
      SYMPUTX avoids leading blanks from the numeric-to-character conversion */
   data a1;
      set JSON.policy4(FIRSTOBS=&I OBS=&I);
      call symputx('policy_no', policyNumber);
   run;

   data aa;
      set JSON.A;
      if policy > &policy_no then stop;
      else if policy = &policy_no then do;
         id = cats("&policy_id", _N_);   /* double quotes so the macro variable resolves */
         output;
      end;
   run;

   data bb;
      set JSON.B;
      if policy > &policy_no then stop;
      else if policy = &policy_no then do;
         id = cats("&policy_id", _N_);
         output;
      end;
   run;

   data cc;
      set JSON.C3;
      if policyNumber > &policy_no then stop;
      else if policyNumber = &policy_no then do;
         id = cats("&policy_id", _N_);
         output;
      end;
   run;

   /* ... %output_json(...) is called here and %END closes the loop in the full program ... */

%macro output_json(table1, table2, table3, table4);
   PROC JSON OUT=dat3 NOPRETTY NOSASTAGS FMTDATETIME FMTNUMERIC;
      write open object;
      write values 'kkkkkk';
      export &table1;
      write values 'jjjjjj';
      write open array;
      export &table2;
      write close;
      write values 'xxxxx';
      write open array;
      export &table3;
      write close;
      write values 'yyyyyy';
      write open array;
      export &table4;
      write close;
      write close;   /* close the outer object */
   run;
%mend output_json;

SAS LOG

Log for the first policy / first observation (for 1,000 observations it takes ~10 minutes; we have to do this for more than 1 billion).

NOTE: There were 9 observations read from the data set PHOENIX.POLCOMP3.
NOTE: The data set WORK.PCOMP1 has 8 observations and 25 variables.
NOTE: The DATA statement used the following resources:
CPU time - 00:00:00.00
Elapsed time - 00:00:00.03
EXCP count - 94
Task memory - 5291K (156K data, 5135K program)
Total memory - 29582K (7840K data, 21742K program)
Timestamp - 10/11/2020 10:51:06 PM
NOTE: The address space has used a maximum of 924K below the line and 37904K above the line.

NOTE: There were 9 observations read from the data set PHOENIX.AGENT3.
NOTE: The data set WORK.AGENT has 8 observations and 36 variables.
NOTE: The DATA statement used the following resources:
CPU time - 00:00:00.00
Elapsed time - 00:00:00.02
EXCP count - 95
Task memory - 5296K (161K data, 5135K program)
Total memory - 29582K (7840K data, 21742K program)
Timestamp - 10/11/2020 10:51:06 PM
NOTE: The address space has used a maximum of 924K below the line and 37904K above the line.

NOTE: There were 30 observations read from the data set PHOENIX.CPYMENT3.
NOTE: The data set WORK.CPAYMNT has 29 observations and 19 variables.
NOTE: The DATA statement used the following resources:
CPU time - 00:00:00.00
Elapsed time - 00:00:00.03
EXCP count - 94
Task memory - 5291K (156K data, 5135K program)
Total memory - 29582K (7840K data, 21742K program)
Timestamp - 10/11/2020 10:51:06 PM
NOTE: The address space has used a maximum of 924K below the line and 37904K above the line.

NOTE: The PROCEDURE JSON used the following resources:

  CPU     time -         00:00:00.00                                        
  Elapsed time -         00:00:00.01                                        
  EXCP count   - 21                                                         
  Task  memory - 3800K (91K data, 3709K program)                            
  Total memory - 29648K (7840K data, 21808K program)                        
  Timestamp    - 10/11/2020 10:51:06 PM                                     

NOTE: The address space has used a maximum of 924K below the line and 37904K above the line.


8 REPLIES
Sasnewuser222
Calcite | Level 5
Thanks, Sir. I am running this code on the mainframe, though the code above is just a sample. When I profiled it with the mainframe profiler CA-MATUNER, I saw 50% of the time spent in voluntary wait, and I cannot identify where that wait occurs. (For 1,000 observations it takes ~3 minutes of wall-clock time just for these SAS datasets; I have 10 SAS datasets in total, for which it takes ~15 minutes of wall-clock time.)
Kurt_Bremser
Super User

If all this is to make counts for policy numbers, that can be done in SQL for all policy numbers in one step per dataset.

You can then join the results and subset with a WHERE, if needed.
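
For illustration, a minimal sketch of that SQL approach (the dataset and column names are assumed from the code earlier in this thread):

proc sql;
/* one pass per dataset: count observations per policy */
create table counts_a as
select policy, count(*) as n_a
from json.a
group by policy;

create table counts_b as
select policy, count(*) as n_b
from json.b
group by policy;

/* join the counts back to the policy master; subset with WHERE if needed */
create table policy_counts as
select p.policyNumber, a.n_a, b.n_b
from json.policy4 p
left join counts_a a on a.policy = p.policyNumber
left join counts_b b on b.policy = p.policyNumber;
quit;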

Sasnewuser222
Calcite | Level 5
The &COUNT is to limit the loop. Basically, I am trying to concatenate observations from different SAS datasets based on policy, but I am adding a sequence number before concatenation. Hence I read every SAS dataset in the loop, write the results to temp datasets, and finally use PROC JSON to write them as a single document. These multiple DATA steps are taking a lot of time.
Kurt_Bremser
Super User

By "concatenate" do you mean that you stack data from different datasets, interleaving by policy_no, and create an indicator from where the observation is? If yes, this can be done in a single data step for all policy_no.

 

The code as such looks much too complicated, and I am not surprised that the setup of so many individual steps takes so long.

Next, I would not do such work directly from JSON libraries. Instead load each needed dataset as is from JSON to WORK first, and then proceed from there. Similarly, write the accumulated output once at the end of all processing.
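
For illustration, a minimal sketch of that "load to WORK first" step, assuming the member names from the code above:

/* pull the needed members into WORK once, then run all further steps against WORK */
proc copy in=json out=work;
select policy4 a b c3;
run;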

 

If you provide examples for the incoming datasets and the wanted result from that (for several policy_no), I can help you with improved code.

Sasnewuser222
Calcite | Level 5
Thanks, Sir. Below are the sample input and the desired output I want to achieve with the above code example.
Input
SAS DATASET1
POLICY Unique ID
A 10
B 20
C 30
D 40
E 50

SAS DATASET2
POLICY Unique ID OTHER DETAILS
A -blank- XXXXXX
A -blank- XXXXXX
A -blank- XXXXXX
B -blank- XXXXXX

SAS DATASET3
POLICY Unique ID OTHER DETAILS3
A -blank- XXXXXX
A -blank- XXXXXX
B -blank- XXXXXX
Desired Output

Stage 1: (First I have to match on policy and add the Unique ID plus a matching sequence number to SAS dataset2 and SAS dataset3.)
The output will be:

SAS DATASET1
POLICY Unique ID
A 10
B 20
C 30
D 40
E 50

SAS DATASET2
POLICY Unique ID OTHER DETAILS
A 10_1 XXXXXX
A 10_2 XXXXXX
A 10_3 XXXXXX
B 20_1 XXXXXX

SAS DATASET3
POLICY Unique ID OTHER DETAILS3
A 10_1 XXXXXX
A 10_2 XXXXXX
B 20_1 XXXXXX


Stage 2:

Create the JSON document/container with PROC JSON, e.g.:

1) POLICY A 10[A 10_1 XXXXXX,A 10_2 XXXXXX,A 10_3 XXXXXX][A 10_1 XXXXXX,A 10_2 XXXXXX]
2) POLICY B 20[B 20_1 XXXXXX, B 20_1 XXXXXX]

Kurt_Bremser
Super User

See this:

data dataset1;
input POLICY $ Unique_ID $;
datalines;
A 10
B 20
C 30
D 40
E 50
;

data dataset2;
input POLICY $ Unique_ID $ OTHER_DETAILS $;
datalines;
A -blank- XXXXXX
A -blank- XXXXXX
A -blank- XXXXXX
B -blank- XXXXXX
;

data dataset3;
input POLICY $ Unique_ID $ OTHER_DETAILS3 $;
datalines;
A -blank- XXXXXX
A -blank- XXXXXX
B -blank- XXXXXX
;

data ds2_expanded;
set dataset2;
by policy;
/* load dataset1 (policy -> unique_id) into a hash object once */
if _n_ = 1
then do;
  declare hash ds1 (dataset:"dataset1");
  ds1.definekey("policy");
  ds1.definedata("unique_id");
  ds1.definedone();
end;
/* running sequence number within each policy */
if first.policy
then count = 1;
else count + 1;
/* look up the policy's unique_id and append the sequence number */
if ds1.find() = 0 then unique_id = catx('_',unique_id,count);
drop count;
run;

data ds3_expanded;
set dataset3;
by policy;
if _n_ = 1
then do;
  declare hash ds1 (dataset:"dataset1");
  ds1.definekey("policy");
  ds1.definedata("unique_id");
  ds1.definedone();
end;
if first.policy
then count = 1;
else count + 1;
if ds1.find() = 0 then unique_id = catx('_',unique_id,count);
drop count;
run;

data final;
set
  ds2_expanded
  ds3_expanded (rename=(other_details3=other_details))
;
by policy;
run;

The final dataset has all the content; you then need to build the JSON "lines" by concatenating again with BY policy.
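
A minimal sketch of that concatenation step (variable names are taken from the example data above; the line length is an assumption):

data json_lines;
set final;
by policy;
length line $ 32767;
retain line;
/* start a fresh line for each policy, then append every detail row */
if first.policy then line = '';
line = catx(',', line, catx(' ', unique_id, other_details));
if last.policy then output;
keep policy line;
run;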

 

The advantage is that you do not need to process single policies separately.

If memory constraints on the mainframe make the use of the hash object impossible, these steps can be done with a data step MERGE; the advantage of the hash is that your dataset1 does not need to be sorted.

The other datasets need to be sorted by policy to make BY group processing possible.
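
For illustration, a minimal sketch of that MERGE alternative for dataset2 (the base_id rename is mine, to keep the looked-up value separate from the detail rows' own unique_id; both inputs must be sorted by policy):

data ds2_expanded;
merge dataset2 (in=in2)
      dataset1 (keep=policy unique_id rename=(unique_id=base_id));
by policy;
if in2;   /* keep only rows that exist in dataset2 */
if first.policy then count = 1;
else count + 1;
unique_id = catx('_', base_id, count);
drop count base_id;
run;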

Sasnewuser222
Calcite | Level 5
Thanks much, Kurt_Bremser. I will test this and reply. Thanks again for your time.
