BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
lmtamina
Obsidian | Level 7

Good afternoon,

 

I am trying to split a table with 800 million records and 48 variables into 4 tables so we can reduce the size and send compressed via FTP.  I am currently doing the split individually and creating the 4 files by the codes below. Is there an easy way to create the 4 files by just reading from the MASTERFILE once . Or,is there a more efficient code to do this as we need to break up several big tables . I am a basic user of SAS and appreciate any assistance.

Thank you as always!

 

% let header_1 = variable1......variable12

% let header_2 = variable13.....variable24

% let header_3=variable25......variable36

% let header_4 = variable37....variable48

 

data step FILE1;

set MASTERFILE(keep=&header_1);

run;

data step FILE2;

set MASTERFILE(keep=&header_2);

run;

data step FILE3;

set MASTERFILE(keep=&header_3);

run;

data step FILE4;

set MASTERFILE(keep=&header_4);

run;

 

 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
novinosrin
Tourmaline | Level 20

Hi @lmtamina  Why not keep it simple by-

 

/*This reads only once*/

data FILE1 (keep=&header_1)
FILE2(keep=&header_2)
FILE3(keep=&header_3)
FILE4(keep=&header_4);
set MASTERFILE;
run;

View solution in original post

16 REPLIES 16
Reeza
Super User
It's not as common to split it out by variables, it's usually more common to split it by the records, ie 50 million in each data set type of idea. There's a small chance of things not merging back correctly whereas when split by records, appends are less likely to cause errors.

Is it a requirement to split it by variables instead?

https://gist.github.com/statgeek/abc3c6ce1dbeedb84fe7f11da0603cda

Blog post with several other methods here - but still row splitting, not column wise.
https://blogs.sas.com/content/sasdummy/2015/01/26/how-to-split-one-data-set-into-many/


lmtamina
Obsidian | Level 7
We are splitting by column as I understand so we do not have to run the queries to produce the master table as it takes a long time to run. We are adding a unique variable to all the 4 files so when our vendor receive them they can tie the records together using this variable. Does that make sense?
ballardw
Super User

@lmtamina wrote:
We are splitting by column as I understand so we do not have to run the queries to produce the master table as it takes a long time to run. We are adding a unique variable to all the 4 files so when our vendor receive them they can tie the records together using this variable. Does that make sense?

Sounds scary to me actually. I also am having a hard time understanding how this is supposed to save time.

Is your "unique variable"  a unique row identifier from the source for each?

 

 

lmtamina
Obsidian | Level 7
Yes, it is a unique identifier to tie the split tables together when our vendor receive the files
Reeza
Super User

It is not an issue of the 6 items. You really need to post the actual code and log.

I would check your macro variables and see if there's something there causing the error/issue or if it works with just the 6th data set.

data demo1 (keep = make model origin) 
demo2 (keep = mpg_highway mpg_city) 
demo3 (keep = Cylinders Type Weight Wheelbase Drivetrain EngineSize HorsePower) 
demo4 (keep = Invoice MSRP) 
demo5 (keep = Length Weight) demo6 (keep = Cylinders DriveTrain);
set sashelp.cars;
run;
Reeza
Super User
And all of this is another reason to split files by rows rather than columns...
novinosrin
Tourmaline | Level 20

@lmtamina  Can you "guarantee"  LOTSSSSS of Memory(RAM). Sorry not meaning to shout with UPCASE, but I am really asking. 

 

Also, what are the datatypes. Are those a mix of char/num. Is there a pattern in the arrangement?

novinosrin
Tourmaline | Level 20

Hi @lmtamina  Why not keep it simple by-

 

/*This reads only once*/

data FILE1 (keep=&header_1)
FILE2(keep=&header_2)
FILE3(keep=&header_3)
FILE4(keep=&header_4);
set MASTERFILE;
run;
lmtamina
Obsidian | Level 7
I tried the solution below but getting the following error. Could you kindly help.

ERROR 22-322: Syntax error, expecting one of the following: a name, a quoted string, /, ;, _DATA_, _LAST_, _NULL_.

data FILE1 (keep=&header_1)
FILE2(keep=&header_2)
FILE3(keep=&header_3)
FILE4(keep=&header_4);
set MASTERFILE;
run;

but I am
Reeza
Super User
Show the full log please.
lmtamina
Obsidian | Level 7
Below is the log. I am tryin gto run this one against another table we are splitting and for this one we have to split 6 files as the we have more variables. It does not error out if no of dataset is 5, greater than 5 I get this error. It appears there is a maximum number of table is the DATA step:


NOTE: Writing TAGSETS.SASREPORT13(EGSR) Body file: EGSR
24
25 GOPTIONS ACCESSIBLE;
26 data FILE1 (keep=&file_1)
27 FILE2(keep=&file_2)
28 FILE3(keep=&file_3)
29 FILE4(keep=&file_4)
30 FILE5(keep=&file_5)
31 FILE6(keep=&file_6)
_
22
200
ERROR 22-322: Syntax error, expecting one of the following: a name, a quoted string, /, ;, _DATA_, _LAST_, _NULL_.

ERROR 200-322: The symbol is not recognized and will be ignored.

32 ;
33 set datacagb.mxa1861_Clarify_med_claims_2018;
34 run;

NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.FILE1 may be incomplete. When this step was stopped there were 0 observations and 13 variables.
WARNING: Data set WORK.FILE1 was not replaced because this step was stopped.
WARNING: The data set WORK.FILE2 may be incomplete. When this step was stopped there were 0 observations and 9 variables.
WARNING: Data set WORK.FILE2 was not replaced because this step was stopped.
WARNING: The data set WORK.FILE3 may be incomplete. When this step was stopped there were 0 observations and 6 variables.
WARNING: Data set WORK.FILE3 was not replaced because this step was stopped.
WARNING: The data set WORK.FILE4 may be incomplete. When this step was stopped there were 0 observations and 6 variables.
WARNING: Data set WORK.FILE4 was not replaced because this step was stopped.
WARNING: The data set WORK.FILE5 may be incomplete. When this step was stopped there were 0 observations and 6 variables.
WARNING: Data set WORK.FILE5 was not replaced because this step was stopped.
WARNING: The data set WORK.FILE6 may be incomplete. When this step was stopped there were 0 observations and 9 variables.
WARNING: Data set WORK.FILE6 was not replaced because this step was stopped.
NOTE: DATA statement used (Total process time):
real time 0.23 seconds
2 The SAS System 15:30 Thursday, October 15, 2020

user cpu time 0.01 seconds
system cpu time 0.02 seconds
memory 4823.65k
OS Memory 33496.00k
Timestamp 10/16/2020 06:02:46 PM
Step Count 53 Switch Count 94
Page Faults 0
Page Reclaims 345
Page Swaps 0
Voluntary Context Switches 426
Involuntary Context Switches 199
Block Input Operations 0
Block Output Operations 0
lmtamina
Obsidian | Level 7
I apologize. I posted the log before I read your post.I only read Reeza's post and missed yours. Anyway, saw your comment too to include the macro definition and that's when I found out there is an error in the last macro i defined, i had an extra ")" . So, now this solution is working! Thank you all for your assistance and expertise!

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 16 replies
  • 2365 views
  • 6 likes
  • 6 in conversation