BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
wy110
Fluorite | Level 6

I have three large datasets and they have some common person ID and different variables. I want to output these three datasets to one JSON file without merging them first. 

Here are their structures like:

(08032017 UPDATED NOTE: In the real case, I have 10,000 unique person ID for three datasets, and about 1000 different variables for each dataset, one person ID may have different numbers of claims) 

 

Data1:

Person_ID   claim_ID  var1 var2 var3

1                 ab           12    23     34

1                 cc            34    54     32

2                 ee            56    32    55

Data2:

Person_ID   claim_ID  var1 var2 var3

1                 gh           11    22     23

2                 cd           34    67     43

3                 es            29    32    90

Data3:

Person_ID   claim_ID  var1 var2 var3

1                 yu           11    22     23

3                 uy           34    67     43

4                 er            29    32    90

 

I would like to print out all three datasets by the same person_id. Like following:

 

“person ID”: 1,

[{“claim_id”:”ab”,

 “var1”:”12”,

…},

{“claim_id”:”cc”,

 “var1”:”34”,

…},

{“claim_id”:”gh”,

 “var1”:”11”,

…},

{“claim_id”:”yu”,

 “var1”:”11”,

…}],

“person ID”:2,

[{“claim_id”:”ee”,

 “var1”:”56”,

…},

{“claim_id”:”cd”,

 “var1”:”34”,

…}],

 

I was thinking to pull out all distinct Person_id and corresponding claim_id to a dataset,

data4:

person_ ID  Claim_ID_1 Claim_ID_2  Claim_ID_3  Claim_ID_4

1                    ab                 cc               gh                            yu

2                    ee                                                           

3                    es                 uy                                      

4                    er                             

Then use macro, 

 

%macro data1(_claim);
set data2(where=(claim_id=&_claim));
put '"person id":"' person_id '","';
…
%mend;
 
data _null_;
file print PS = 32767;
set data4 end=lastrec;
 
array xclaim $ claim_ID_1 - claim_ID_4;  
do i= 1 to 4; 
if xclaim[i] ne ' ' then do ; 
CALL SYMPUT ('id', xclaim[i]);
%data1(&id);
%data2(&id);
%data3(&id);
end;
run;

 

It doesn't work. Has anyone a better idea? Thanks in advanced!

1 ACCEPTED SOLUTION

Accepted Solutions
BillM_SAS
SAS Employee

I took the data you supplied and used DATA step views to get the data for each person's ID. To generate valid JSON, I used the JSON procedure (available in SAS 9.4). The output JSON snippet you posted did not appear to be valid. I was able to get the output pretty close to what you wanted. I hope this helps.

 

data work.ds1;
input person_id claim_id $ var1 var2 var3;
datalines;
1 ab 12 23 34
1 cc 34 45 32
2 ee 56 32 55
run;

data work.ds2;
input person_id claim_id $ var1 var2 var3;
datalines;
1 gh 11 22 23 
2 cd 34 67 43
3 es 29 32 90
;
run;

data work.ds3;
input person_id claim_id $ var1 var2 var3;
datalines;
1 yu 11 22 23
2 uy 34 67 43
3 er 29 32 90
;
run;

%macro setPersonData(personID);
data work.pd&personID / view=work.pd&personID;
set work.ds1 work.ds2 work.ds3;
where person_id = &personID;
run;
%mend setPersonData;

%setPersonData(1);
%setPersonData(2);
%setPersonData(3);

%macro addPersonData(person_id);
write value &person_id;
write open array;
export work.pd&person_id(drop=person_id);
write close;
%mend addPersonData;

%macro createJson(personCount);
proc json out="multiOutput.json" pretty nosastags;
write open object;
write values "person ID";
write open array;
%do personID=1 %to &personCount;
   %addPersonData(&personID);
%end;
write close;
write close;
run;
%mend createJson;

%createJson(3);
{
  "person ID": [
    1,
    [
      {
        "claim_id": "ab",
        "var1": 12,
        "var2": 23,
        "var3": 34
      },
      {
        "claim_id": "cc",
        "var1": 34,
        "var2": 45,
        "var3": 32
      },
      {
        "claim_id": "gh",
        "var1": 11,
        "var2": 22,
        "var3": 23
      },
      {
        "claim_id": "yu",
        "var1": 11,
        "var2": 22,
        "var3": 23
      }
    ],
    2,
    [
      {
        "claim_id": "ee",
        "var1": 56,
        "var2": 32,
        "var3": 55
      },
      {
        "claim_id": "cd",
        "var1": 34,
        "var2": 67,
        "var3": 43
      },
      {
        "claim_id": "uy",
        "var1": 34,
        "var2": 67,
        "var3": 43
      }
    ],
    3,
    [
      {
        "claim_id": "es",
        "var1": 29,
        "var2": 32,
        "var3": 90
      },
      {
        "claim_id": "er",
        "var1": 29,
        "var2": 32,
        "var3": 90
      }
    ]
  ]
}

 

View solution in original post

7 REPLIES 7
ballardw
Super User

How large is "large" for your data sets?

And what is the logic about not combining the data first? Your requirement seems to call for combining them at output in a very much more complex manner. From you desired output it looks like:

 

data want;

    set data1 data2 data3;

run;

proc sort data=want;

    by person_id claim_id;

run;

 

Then processing by Person_id should not be very difficult.

wy110
Fluorite | Level 6

Thanks for your quick response!

The sizes are like 1GB, 6GB, 7GB. In fact, I have more than three datasets to print out, some of them have more than 3000 columns which I think SAS could not handle. That's the issue why I could not combine them first.

RW9
Diamond | Level 26 RW9
Diamond | Level 26

A dataset with 3000 is not useful in any sense, even 50 or so is stretching it.  Consider re-modelling your data, it will help with putting the data together, manipulating, and reporting the data out, for example:

Person_ID   claim_ID  var1 var2 var3

1                 ab           12    23     34

1                 cc            34    54     32

2                 ee            56    32    55

To

Person_ID   claim_ID  var_no result

1                 ab             1          12    

1                 ab             2          23    

1                 ab             3          34

 

.

..

 

As for your question, you should be able to do something like:

1) Open file first time, write data - but don't finish the end }

2) Write the subsequent datasets to the same file with the "mod" option on the file statement:

http://support.sas.com/kb/24/569.html

3) On the final dataset write and then write th closing brace

Something like;

data _null_;
  set a;
  file "abc.txt";
  put...;
run;

/* Repeat this for each other dataset */
data _null_;
  set b;
  file "abc.txt" mod;
  put...;
run;

/* for last one */
data...

 

BillM_SAS
SAS Employee

I took the data you supplied and used DATA step views to get the data for each person's ID. To generate valid JSON, I used the JSON procedure (available in SAS 9.4). The output JSON snippet you posted did not appear to be valid. I was able to get the output pretty close to what you wanted. I hope this helps.

 

data work.ds1;
input person_id claim_id $ var1 var2 var3;
datalines;
1 ab 12 23 34
1 cc 34 45 32
2 ee 56 32 55
run;

data work.ds2;
input person_id claim_id $ var1 var2 var3;
datalines;
1 gh 11 22 23 
2 cd 34 67 43
3 es 29 32 90
;
run;

data work.ds3;
input person_id claim_id $ var1 var2 var3;
datalines;
1 yu 11 22 23
2 uy 34 67 43
3 er 29 32 90
;
run;

%macro setPersonData(personID);
data work.pd&personID / view=work.pd&personID;
set work.ds1 work.ds2 work.ds3;
where person_id = &personID;
run;
%mend setPersonData;

%setPersonData(1);
%setPersonData(2);
%setPersonData(3);

%macro addPersonData(person_id);
write value &person_id;
write open array;
export work.pd&person_id(drop=person_id);
write close;
%mend addPersonData;

%macro createJson(personCount);
proc json out="multiOutput.json" pretty nosastags;
write open object;
write values "person ID";
write open array;
%do personID=1 %to &personCount;
   %addPersonData(&personID);
%end;
write close;
write close;
run;
%mend createJson;

%createJson(3);
{
  "person ID": [
    1,
    [
      {
        "claim_id": "ab",
        "var1": 12,
        "var2": 23,
        "var3": 34
      },
      {
        "claim_id": "cc",
        "var1": 34,
        "var2": 45,
        "var3": 32
      },
      {
        "claim_id": "gh",
        "var1": 11,
        "var2": 22,
        "var3": 23
      },
      {
        "claim_id": "yu",
        "var1": 11,
        "var2": 22,
        "var3": 23
      }
    ],
    2,
    [
      {
        "claim_id": "ee",
        "var1": 56,
        "var2": 32,
        "var3": 55
      },
      {
        "claim_id": "cd",
        "var1": 34,
        "var2": 67,
        "var3": 43
      },
      {
        "claim_id": "uy",
        "var1": 34,
        "var2": 67,
        "var3": 43
      }
    ],
    3,
    [
      {
        "claim_id": "es",
        "var1": 29,
        "var2": 32,
        "var3": 90
      },
      {
        "claim_id": "er",
        "var1": 29,
        "var2": 32,
        "var3": 90
      }
    ]
  ]
}

 

wy110
Fluorite | Level 6

Thanks Bill!! That really helps! 🙂

However, I have more than 10,000 unique person id. I think it's impossible for me to create all the individual datasets....

 

BillM_SAS
SAS Employee

If your concern is about the logistics of creating all the person specific data, there are programmatic ways to accurately simplify the task. If your concern is about the space considerations of creating 10,000 plus data sets, note that I use data set views. The views only store the information needed to query the data. It does not store the data. The actual data is not assembled until the data is requested (in this case, by the JSON procedure created by the createJson macro). 

wy110
Fluorite | Level 6

Thank you for your help! I think your solution is the best way to output them!

It has been a while I am still working on it. However, I have issues when I output all the person data sets with do loop... I can't use proc json, I use data step instead.  I create a new question here: https://communities.sas.com/t5/Base-SAS-Programming/how-to-print-multiple-data-sets-with-do-loop/td-...

 

If you have a chance to look at it, I really appreciate it! 

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 1984 views
  • 0 likes
  • 4 in conversation