DATA Step, Macro, Functions and more

Output Multiple datasets in Datastep with common ID

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 6
Accepted Solution

Output Multiple datasets in Datastep with common ID

[ Edited ]

I have three large datasets and they have some common person ID and different variables. I want to output these three datasets to one JSON file without merging them first. 

Here are their structures like:

(08032017 UPDATED NOTE: In the real case, I have 10,000 unique person ID for three datasets, and about 1000 different variables for each dataset, one person ID may have different numbers of claims) 

 

Data1:

Person_ID   claim_ID  var1 var2 var3

1                 ab           12    23     34

1                 cc            34    54     32

2                 ee            56    32    55

Data2:

Person_ID   claim_ID  var1 var2 var3

1                 gh           11    22     23

2                 cd           34    67     43

3                 es            29    32    90

Data3:

Person_ID   claim_ID  var1 var2 var3

1                 yu           11    22     23

3                 uy           34    67     43

4                 er            29    32    90

 

I would like to print out all three datasets by the same person_id. Like following:

 

“person ID”: 1,

[{“claim_id”:”ab”,

 “var1”:”12”,

…},

{“claim_id”:”cc”,

 “var1”:”34”,

…},

{“claim_id”:”gh”,

 “var1”:”11”,

…},

{“claim_id”:”yu”,

 “var1”:”11”,

…}],

“person ID”:2,

[{“claim_id”:”ee”,

 “var1”:”56”,

…},

{“claim_id”:”cd”,

 “var1”:”34”,

…}],

 

I was thinking to pull out all distinct Person_id and corresponding claim_id to a dataset,

data4:

person_ ID  Claim_ID_1 Claim_ID_2  Claim_ID_3  Claim_ID_4

1                    ab                 cc               gh                            yu

2                    ee                                                           

3                    es                 uy                                      

4                    er                             

Then use macro, 

 

%macro data1(_claim);
set data2(where=(claim_id=&_claim));
put '"person id":"' person_id '","';
…
%mend;
 
data _null_;
file print PS = 32767;
set data4 end=lastrec;
 
array xclaim $ claim_ID_1 - claim_ID_4;  
do i= 1 to 4; 
if xclaim[i] ne ' ' then do ; 
CALL SYMPUT ('id', xclaim[i]);
%data1(&id);
%data2(&id);
%data3(&id);
end;
run;

 

It doesn't work. Has anyone a better idea? Thanks in advanced!


Accepted Solutions
Solution
Monday
SAS Employee
Posts: 31

Re: Output Multiple datasets in Datastep with common ID

I took the data you supplied and used DATA step views to get the data for each person's ID. To generate valid JSON, I used the JSON procedure (available in SAS 9.4). The output JSON snippet you posted did not appear to be valid. I was able to get the output pretty close to what you wanted. I hope this helps.

 

data work.ds1;
input person_id claim_id $ var1 var2 var3;
datalines;
1 ab 12 23 34
1 cc 34 45 32
2 ee 56 32 55
run;

data work.ds2;
input person_id claim_id $ var1 var2 var3;
datalines;
1 gh 11 22 23 
2 cd 34 67 43
3 es 29 32 90
;
run;

data work.ds3;
input person_id claim_id $ var1 var2 var3;
datalines;
1 yu 11 22 23
2 uy 34 67 43
3 er 29 32 90
;
run;

%macro setPersonData(personID);
data work.pd&personID / view=work.pd&personID;
set work.ds1 work.ds2 work.ds3;
where person_id = &personID;
run;
%mend setPersonData;

%setPersonData(1);
%setPersonData(2);
%setPersonData(3);

%macro addPersonData(person_id);
write value &person_id;
write open array;
export work.pd&person_id(drop=person_id);
write close;
%mend addPersonData;

%macro createJson(personCount);
proc json out="multiOutput.json" pretty nosastags;
write open object;
write values "person ID";
write open array;
%do personID=1 %to &personCount;
   %addPersonData(&personID);
%end;
write close;
write close;
run;
%mend createJson;

%createJson(3);
{
  "person ID": [
    1,
    [
      {
        "claim_id": "ab",
        "var1": 12,
        "var2": 23,
        "var3": 34
      },
      {
        "claim_id": "cc",
        "var1": 34,
        "var2": 45,
        "var3": 32
      },
      {
        "claim_id": "gh",
        "var1": 11,
        "var2": 22,
        "var3": 23
      },
      {
        "claim_id": "yu",
        "var1": 11,
        "var2": 22,
        "var3": 23
      }
    ],
    2,
    [
      {
        "claim_id": "ee",
        "var1": 56,
        "var2": 32,
        "var3": 55
      },
      {
        "claim_id": "cd",
        "var1": 34,
        "var2": 67,
        "var3": 43
      },
      {
        "claim_id": "uy",
        "var1": 34,
        "var2": 67,
        "var3": 43
      }
    ],
    3,
    [
      {
        "claim_id": "es",
        "var1": 29,
        "var2": 32,
        "var3": 90
      },
      {
        "claim_id": "er",
        "var1": 29,
        "var2": 32,
        "var3": 90
      }
    ]
  ]
}

 

View solution in original post


All Replies
Super User
Posts: 10,483

Re: Output Multiple datasets in Datastep with common ID

How large is "large" for your data sets?

And what is the logic about not combining the data first? Your requirement seems to call for combining them at output in a very much more complex manner. From you desired output it looks like:

 

data want;

    set data1 data2 data3;

run;

proc sort data=want;

    by person_id claim_id;

run;

 

Then processing by Person_id should not be very difficult.

Occasional Contributor
Posts: 6

Re: Output Multiple datasets in Datastep with common ID

Thanks for your quick response!

The sizes are like 1GB, 6GB, 7GB. In fact, I have more than three datasets to print out, some of them have more than 3000 columns which I think SAS could not handle. That's the issue why I could not combine them first.

Super User
Super User
Posts: 7,392

Re: Output Multiple datasets in Datastep with common ID

A dataset with 3000 is not useful in any sense, even 50 or so is stretching it.  Consider re-modelling your data, it will help with putting the data together, manipulating, and reporting the data out, for example:

Person_ID   claim_ID  var1 var2 var3

1                 ab           12    23     34

1                 cc            34    54     32

2                 ee            56    32    55

To

Person_ID   claim_ID  var_no result

1                 ab             1          12    

1                 ab             2          23    

1                 ab             3          34

 

.

..

 

As for your question, you should be able to do something like:

1) Open file first time, write data - but don't finish the end }

2) Write the subsequent datasets to the same file with the "mod" option on the file statement:

http://support.sas.com/kb/24/569.html

3) On the final dataset write and then write th closing brace

Something like;

data _null_;
  set a;
  file "abc.txt";
  put...;
run;

/* Repeat this for each other dataset */
data _null_;
  set b;
  file "abc.txt" mod;
  put...;
run;

/* for last one */
data...

 

Solution
Monday
SAS Employee
Posts: 31

Re: Output Multiple datasets in Datastep with common ID

I took the data you supplied and used DATA step views to get the data for each person's ID. To generate valid JSON, I used the JSON procedure (available in SAS 9.4). The output JSON snippet you posted did not appear to be valid. I was able to get the output pretty close to what you wanted. I hope this helps.

 

data work.ds1;
input person_id claim_id $ var1 var2 var3;
datalines;
1 ab 12 23 34
1 cc 34 45 32
2 ee 56 32 55
run;

data work.ds2;
input person_id claim_id $ var1 var2 var3;
datalines;
1 gh 11 22 23 
2 cd 34 67 43
3 es 29 32 90
;
run;

data work.ds3;
input person_id claim_id $ var1 var2 var3;
datalines;
1 yu 11 22 23
2 uy 34 67 43
3 er 29 32 90
;
run;

%macro setPersonData(personID);
data work.pd&personID / view=work.pd&personID;
set work.ds1 work.ds2 work.ds3;
where person_id = &personID;
run;
%mend setPersonData;

%setPersonData(1);
%setPersonData(2);
%setPersonData(3);

%macro addPersonData(person_id);
write value &person_id;
write open array;
export work.pd&person_id(drop=person_id);
write close;
%mend addPersonData;

%macro createJson(personCount);
proc json out="multiOutput.json" pretty nosastags;
write open object;
write values "person ID";
write open array;
%do personID=1 %to &personCount;
   %addPersonData(&personID);
%end;
write close;
write close;
run;
%mend createJson;

%createJson(3);
{
  "person ID": [
    1,
    [
      {
        "claim_id": "ab",
        "var1": 12,
        "var2": 23,
        "var3": 34
      },
      {
        "claim_id": "cc",
        "var1": 34,
        "var2": 45,
        "var3": 32
      },
      {
        "claim_id": "gh",
        "var1": 11,
        "var2": 22,
        "var3": 23
      },
      {
        "claim_id": "yu",
        "var1": 11,
        "var2": 22,
        "var3": 23
      }
    ],
    2,
    [
      {
        "claim_id": "ee",
        "var1": 56,
        "var2": 32,
        "var3": 55
      },
      {
        "claim_id": "cd",
        "var1": 34,
        "var2": 67,
        "var3": 43
      },
      {
        "claim_id": "uy",
        "var1": 34,
        "var2": 67,
        "var3": 43
      }
    ],
    3,
    [
      {
        "claim_id": "es",
        "var1": 29,
        "var2": 32,
        "var3": 90
      },
      {
        "claim_id": "er",
        "var1": 29,
        "var2": 32,
        "var3": 90
      }
    ]
  ]
}

 

Occasional Contributor
Posts: 6

Re: Output Multiple datasets in Datastep with common ID

Thanks Bill!! That really helps! Smiley Happy

However, I have more than 10,000 unique person id. I think it's impossible for me to create all the individual datasets....

 

SAS Employee
Posts: 31

Re: Output Multiple datasets in Datastep with common ID

If your concern is about the logistics of creating all the person specific data, there are programmatic ways to accurately simplify the task. If your concern is about the space considerations of creating 10,000 plus data sets, note that I use data set views. The views only store the information needed to query the data. It does not store the data. The actual data is not assembled until the data is requested (in this case, by the JSON procedure created by the createJson macro). 

Occasional Contributor
Posts: 6

Re: Output Multiple datasets in Datastep with common ID

Thank you for your help! I think your solution is the best way to output them!

It has been a while I am still working on it. However, I have issues when I output all the person data sets with do loop... I can't use proc json, I use data step instead.  I create a new question here: https://communities.sas.com/t5/Base-SAS-Programming/how-to-print-multiple-data-sets-with-do-loop/td-...

 

If you have a chance to look at it, I really appreciate it! 

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 7 replies
  • 97 views
  • 0 likes
  • 4 in conversation