BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Sasnewuser222
Calcite | Level 5

I have a unique requriment. I have 4 SAS datasets in sas library (p4,A,B,C). I have to read 1 observation from A,do some processing , store in temp dataset. Read 1 observation from B,do some process and store in temp dataset. Read 1 observation from C and do some processing and store in temp dataset. Then write all the temp dataset in JSON format. I am using multiple data steps and PROC JSON. It is taking lot of 'wall clock ' unusually. below is the sample code. Is there any way better to do this to avoid program load delay and voluntary wait.

           LIBNAME JSON 'JSON'; 
  %DO I = &fstobs %to &COUNT;
  data a1;                                   
  set JSON.policy4(FIRSTOBS=&I OBS=&I);
  call symput('policy_no',policyNumber);  
  run;         
 data aa;                                            
  set JSON.A;                                
  if policy > &policy_no then  DO; STOP; END;       
  else if policy = &policy_no then do; 
  N + 1;drop N;                      
  id = cats('&policy_id',_N_);
  output; end; 
  run;  
  data bb;                                             
  set JSON.b;                                  
  if policy > &policy_no then  DO; STOP; END;       
  else if policy = &policy_no then do;
  N + 1;drop N;                      
  id = cats('&policy_id',_N_);
  output; end; 
 run; 
 data cc;                                                 
 set json.c3;                                      
   if policyNumber > &policy_no then  DO; STOP; END;       
   else if policyNumber = &policy_no then do;
  N + 1;drop N;                      
 id = cats('&policy_id',_N_);
 output; end; 
 run; 
%macro output_json(table1,table2,table3,table4);     
 PROC JSON OUT= dat3                                   
       NOPRETTY NOSASTAGS  FMTDATETIME FMTNUMERIC; 
write open object;                                 
   write values 'kkkkkk';                          
   EXPORT &table1;                                 
      write values 'jjjjjj';              
      write open array;                            
      EXPORT &table2;                              
      write close;                                 
      write values 'xxxxx          ';              
      write open array;                            
      EXPORT &table3;                              
      write close;                                 
      write values 'yyyyyy';     
      write open array;                            
      EXPORT &table4;                              
      write close;                                 
                                          

SAS LOG

FIRST POLICY or FIRST OBSERVATION ( for 1000 obesrvations it is taking ~10 minutes). We have to do this for more than 1Bn.

NOTE: There were 9 observations read from the data set PHOENIX.POLCOMP3.
NOTE: The data set WORK.PCOMP1 has 8 observations and 25 variables.
NOTE: The DATA statement used the following resources:
CPU time - 00:00:00.00
Elapsed time - 00:00:00.03
EXCP count - 94
Task memory - 5291K (156K data, 5135K program)
Total memory - 29582K (7840K data, 21742K program)
Timestamp - 10/11/2020 10:51:06 PM
NOTE: The address space has used a maximum of 924K below the line and 37904K abo

NOTE: There were 9 observations read from the data set PHOENIX.AGENT3.
NOTE: The data set WORK.AGENT has 8 observations and 36 variables.
NOTE: The DATA statement used the following resources:
CPU time - 00:00:00.00
Elapsed time - 00:00:00.02
EXCP count - 95
Task memory - 5296K (161K data, 5135K program)
Total memory - 29582K (7840K data, 21742K program)
Timestamp - 10/11/2020 10:51:06 PM
NOTE: The address space has used a maximum of 924K below the line and 37904K abo

NOTE: There were 30 observations read from the data set PHOENIX.CPYMENT3.
NOTE: The data set WORK.CPAYMNT has 29 observations and 19 variables.
NOTE: The DATA statement used the following resources:
CPU time - 00:00:00.00
Elapsed time - 00:00:00.03
EXCP count - 94
Task memory - 5291K (156K data, 5135K program)
Total memory - 29582K (7840K data, 21742K program)
Timestamp - 10/11/2020 10:51:06 PM
NOTE: The address space has used a maximum of 924K below the line and 37904K abo

NOTE: The PROCEDURE JSON used the following resources:
6 The SAS System

  CPU     time -         00:00:00.00                                        
  Elapsed time -         00:00:00.01                                        
  EXCP count   - 21                                                         
  Task  memory - 3800K (91K data, 3709K program)                            
  Total memory - 29648K (7840K data, 21808K program)                        
  Timestamp    - 10/11/2020 10:51:06 PM                                     

NOTE: The address space has used a maximum of 924K below the line and 37904K abo

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

See this:

data dataset1;
input POLICY $ Unique_ID $;
datalines;
A 10
B 20
C 30
D 40
E 50
;

data dataset2;
input POLICY $ Unique_ID $ OTHER_DETAILS $;
datalines;
A -blank- XXXXXX
A -blank- XXXXXX
A -blank- XXXXXX
B -blank- XXXXXX
;

data dataset3;
input POLICY $ Unique_ID $ OTHER_DETAILS3 $;
datalines;
A -blank- XXXXXX
A -blank- XXXXXX
B -blank- XXXXXX
;

data ds2_expanded;
set dataset2;
by policy;
if _n_ = 1
then do;
  declare hash ds1 (dataset:"dataset1");
  ds1.definekey("policy");
  ds1.definedata("unique_id");
  ds1.definedone();
end;
if first.policy
then count = 1;
else count + 1;
if ds1.find() = 0 then unique_id = catx('_',unique_id,count);
drop count;
run;

data ds3_expanded;
set dataset3;
by policy;
if _n_ = 1
then do;
  declare hash ds1 (dataset:"dataset1");
  ds1.definekey("policy");
  ds1.definedata("unique_id");
  ds1.definedone();
end;
if first.policy
then count = 1;
else count + 1;
if ds1.find() = 0 then unique_id = catx('_',unique_id,count);
drop count;
run;

data final;
set
  ds2_expanded
  ds3_expanded (rename=(other_details3=other_details))
;
by policy;
run;

The final dataset has all the content, you need to build the JSON "lines" by concatenating again with BY policy.

 

The advantage is that you do not need to process single policies separately.

If memory constraints on the mainframe make the use of the hash object impossible, these steps can be done with a data step MERGE; the advantage of the hash is that your dataset1 does not need to be sorted.

The other datasets need to be sorted by policy to make BY group processing possible.

View solution in original post

8 REPLIES 8
Sasnewuser222
Calcite | Level 5
Thanks Sir. I am running this code in MF, though the code is just a sample . When I profiled in mainframe profiler CA-MATUNER, I am seeing 50% of time spent in Voluntary wait. I am not able to identify where the wait is spent. ( For 1000 observations it is taking ~3 Mins of wall clock time. Just for SAS datasets. I have total 10 SAS datasets. In which it is taking ~15 MIns of wall clock time )
Kurt_Bremser
Super User

If all this is to make counts for policy numbers, that can be done in SQL for all policy numbers in one step per dataset.

You can then join the results and subset with a WHERE, if needed.

Sasnewuser222
Calcite | Level 5
The &COUNT is to limit the loop. Basically I trying to concatenate observations from different SAS datasets based on policy, but I am adding some sequence number before concatenation. hence I am reading very SAS dataset in the loop, write them in temp datasets, and finally using PROC JSON writing as a single document. this multiple DATA step is taking lot of time
Kurt_Bremser
Super User

By "concatenate" do you mean that you stack data from different datasets, interleaving by policy_no, and create an indicator from where the observation is? If yes, this can be done in a single data step for all policy_no.

 

The code as such looks much too complicated, and I am not surprised that the setup of so many individual steps takes so long.

Next, I would not do such work directly from JSON libraries. Instead load each needed dataset as is from JSON to WORK first, and then proceed from there. Similarly, write the accumulated output once at the end of all processing.

 

If you provide examples for the incoming datasets and the wanted result from that (for several policy_no), I can help you with improved code.

Sasnewuser222
Calcite | Level 5
Thanks, Sir, Below is the sample input and desired output I want to achieve by the above code example.
Input
SAS DATSET1
POLCIY Uniue ID
A 10
B 20
C 30
D 40
E 50

SAS DATASET2
POLCICY Unique ID OTHER DETAILS
A -blank- XXXXXX
A -blank- XXXXXX
A -blank- XXXXXX
B -blank- XXXXXX

SAS DATASET3
POLCICY Unique ID OTHER DETAILS3
A -blank- XXXXXX
A -blank- XXXXXX
B -blank- XXXXXX
Desired Output

Stage 1 😞 First i have to match policy and add UniueID and matching sequence to the SAS dataset2 and SAS dataset3)
The output will be.

POLCIY Uniue ID
A 10
B 20
C 30
D 40
E 50

SAS DATASET2
POLCICY Unique ID OTHER DETAILS
A 10_1 XXXXXX
A 10_2 XXXXXX
A 10_3 XXXXXX
B 20_1 XXXXXX

SAS DATASET3
POLCICY Unique ID OTHER DETAILS3
A 10_1 XXXXXX
A 10_2 XXXXXX
B 20_1 XXXXXX


Stage 2:

Create JSON document/container

PROC JSON
e.g

1) POLICY A 10[A 10_1 XXXXXX,A 10_2 XXXXXX,A 10_3 XXXXXX][A 10_1 XXXXXX,A 10_2 XXXXXX]
2) POLICY B 20[B 20_1 XXXXXX, B 20_1 XXXXXX]

Kurt_Bremser
Super User

See this:

data dataset1;
input POLICY $ Unique_ID $;
datalines;
A 10
B 20
C 30
D 40
E 50
;

data dataset2;
input POLICY $ Unique_ID $ OTHER_DETAILS $;
datalines;
A -blank- XXXXXX
A -blank- XXXXXX
A -blank- XXXXXX
B -blank- XXXXXX
;

data dataset3;
input POLICY $ Unique_ID $ OTHER_DETAILS3 $;
datalines;
A -blank- XXXXXX
A -blank- XXXXXX
B -blank- XXXXXX
;

data ds2_expanded;
set dataset2;
by policy;
if _n_ = 1
then do;
  declare hash ds1 (dataset:"dataset1");
  ds1.definekey("policy");
  ds1.definedata("unique_id");
  ds1.definedone();
end;
if first.policy
then count = 1;
else count + 1;
if ds1.find() = 0 then unique_id = catx('_',unique_id,count);
drop count;
run;

data ds3_expanded;
set dataset3;
by policy;
if _n_ = 1
then do;
  declare hash ds1 (dataset:"dataset1");
  ds1.definekey("policy");
  ds1.definedata("unique_id");
  ds1.definedone();
end;
if first.policy
then count = 1;
else count + 1;
if ds1.find() = 0 then unique_id = catx('_',unique_id,count);
drop count;
run;

data final;
set
  ds2_expanded
  ds3_expanded (rename=(other_details3=other_details))
;
by policy;
run;

The final dataset has all the content, you need to build the JSON "lines" by concatenating again with BY policy.

 

The advantage is that you do not need to process single policies separately.

If memory constraints on the mainframe make the use of the hash object impossible, these steps can be done with a data step MERGE; the advantage of the hash is that your dataset1 does not need to be sorted.

The other datasets need to be sorted by policy to make BY group processing possible.

Sasnewuser222
Calcite | Level 5
Thanks much KurtBremser. I will test this and reply. Thanks much for your time

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 8 replies
  • 2115 views
  • 0 likes
  • 2 in conversation