proc sort data=prolease&perishort. nodupkey;
by operacion database;
run;
proc sort data=ext_new nodupkey dupout=check;
by operacion database;
run;
data prolease&perishort.;
modify prolease&perishort. ext_new;
by operacion database;
run;
NOTE: Writing TAGSETS.SASREPORT13(EGSR) Body file: EGSR
22
23 GOPTIONS ACCESSIBLE;
SYMBOLGEN: Macro variable PERISHORT resolves to 1812
24
25 data prolease&perishort.;
26
SYMBOLGEN: Macro variable PERISHORT resolves to 1812
26 ! modify prolease&perishort. ext_new;
27 by operacion database;
28 run;
NOTE: The DATA step has been abnormally terminated.
NOTE: There were 1 observations read from the data set WORK.PROLEASE1812.
NOTE: The data set WORK.PROLEASE1812 has been updated. There were 2492 observations rewritten, 0 observations added and 0
observations deleted.
NOTE: There were 2493 observations read from the data set WORK.EXT_NEW.
NOTE: DATA statement used (Total process time):
real time 1:08.18
cpu time 1:07.47
Is not working as expected. I expect it to produce the same result as MERGE statement without overwriting with missing values.
MERGE statement does work:
23 GOPTIONS ACCESSIBLE;
SYMBOLGEN: Macro variable PERISHORT resolves to 1812
24 data prolease&perishort.;
25
SYMBOLGEN: Macro variable PERISHORT resolves to 1812
25 ! merge prolease&perishort. ext_new;
26 by operacion database;
27 run;
NOTE: There were 347852 observations read from the data set WORK.PROLEASE1812.
NOTE: There were 11066 observations read from the data set WORK.EXT_NEW.
NOTE: The data set WORK.PROLEASE1812 has 347853 observations and 177 variables.
NOTE: DATA statement used (Total process time):
real time 2.91 seconds
cpu time 2.57 seconds
MODIFY without transaction dataset works:
23 GOPTIONS ACCESSIBLE;
SYMBOLGEN: Macro variable PERISHORT resolves to 1812
24 data prolease&perishort.;
25
SYMBOLGEN: Macro variable PERISHORT resolves to 1812
25 ! modify prolease&perishort.;
26 extrest='test';
27 run;
NOTE: There were 347853 observations read from the data set WORK.PROLEASE1812.
NOTE: The data set WORK.PROLEASE1812 has been updated. There were 347853 observations rewritten, 0 observations added and 0
observations deleted.
NOTE: DATA statement used (Total process time):
real time 1.06 seconds
cpu time 1.06 seconds
Please be more specific in your requirements.. Was all you did to change the word 'merge' to 'modify'? Can you provide some example data that shows what WORK.PROLEASE1812 looks like?
after running:
data test;
set prolease1812(keep=operacion operorigen database);
run;
proc sort data=test nodup; by _ALL_;
proc sort data=test nodupkey; by operacion database;
data test;
modify test ext_new;
by operacion database;
run;
(SAS block code in this web doesn't seem to preserve my indenting).
my tables look like this:
proc contents data=test;
proc contents data=ext_new;
The CONTENTS Procedure
Data Set Name | WORK.TEST | Observations | 347852 |
Member Type | DATA | Variables | 3 |
Engine | V9 | Indexes | 0 |
Created | 01/11/2019 09:13:44 | Observation Length | 44 |
Last Modified | 01/11/2019 09:16:21 | Deleted Observations | 0 |
Protection |
| Compressed | NO |
Data Set Type |
| Sorted | YES |
Label |
|
|
|
Data Representation | SOLARIS_X86_64, LINUX_X86_64, ALPHA_TRU64, LINUX_IA64 |
|
|
Encoding | latin1 Western (ISO) |
|
|
Engine/Host Dependent Information | |
Data Set Page Size | 65536 |
Number of Data Set Pages | 235 |
First Data Page | 1 |
Max Obs per Page | 1484 |
Obs in First Data Page | 1435 |
Number of Data Set Repairs | 0 |
Filename | *** |
Release Created | 9.0401M4 |
Host Created | Linux |
Inode Number | 18612318 |
Access Permission | rw-rw-r-- |
Owner Name | franriv |
File Size | 15MB |
File Size (bytes) | 15466496 |
Alphabetic List of Variables and Attributes | |||
# | Variable | Type | Len |
2 | DataBase | Char | 4 |
3 | OperOrigen | Char | 20 |
1 | Operacion | Char | 20 |
Sort Information | |
Sortedby | Operacion DataBase |
Validated | YES |
Character Set | ASCII |
Sort Option | NODUPKEY |
The CONTENTS Procedure
Data Set Name | WORK.EXT_NEW | Observations | 11066 |
Member Type | DATA | Variables | 3 |
Engine | V9 | Indexes | 0 |
Created | 01/11/2019 09:09:13 | Observation Length | 44 |
Last Modified | 01/11/2019 09:09:13 | Deleted Observations | 0 |
Protection |
| Compressed | NO |
Data Set Type |
| Sorted | YES |
Label |
|
|
|
Data Representation | SOLARIS_X86_64, LINUX_X86_64, ALPHA_TRU64, LINUX_IA64 |
|
|
Encoding | latin1 Western (ISO) |
|
|
Engine/Host Dependent Information | |
Data Set Page Size | 65536 |
Number of Data Set Pages | 8 |
First Data Page | 1 |
Max Obs per Page | 1484 |
Obs in First Data Page | 1434 |
Number of Data Set Repairs | 0 |
Filename | *** |
Release Created | 9.0401M4 |
Host Created | Linux |
Inode Number | 18614142 |
Access Permission | rw-rw-r-- |
Owner Name | franriv |
File Size | 576KB |
File Size (bytes) | 589824 |
Alphabetic List of Variables and Attributes | ||||||
# | Variable | Type | Len | Format | Informat | Label |
3 | database | Char | 4 |
|
|
|
1 | operacion | Char | 20 | $11. | $11. | OperDest |
2 | operorigen | Char | 20 | $11. | $11. | OperOri |
Sort Information | |
Sortedby | operacion database |
Validated | YES |
Character Set | ASCII |
Sort Option | NODUPKEY |
This is just me testing to learn how to use the MODIFY statement; I am expecting the same output as a merge statement (provided operorigen is not null for any entry in the ext_new table), but faster execution.
merge statement does work
NOTE: Writing TAGSETS.SASREPORT13(EGSR) Body file: EGSR
22
23 GOPTIONS ACCESSIBLE;
24 data test2;
25 merge test ext_new;
26 by operacion database;
27 run;
NOTE: There were 347852 observations read from the data set WORK.TEST.
NOTE: There were 11066 observations read from the data set WORK.EXT_NEW.
NOTE: The data set WORK.TEST2 has 347853 observations and 3 variables.
NOTE: DATA statement used (Total process time):
real time 0.27 seconds
cpu time 0.26 seconds
but modify statement won't do it;
NOTE: Writing TAGSETS.SASREPORT13(EGSR) Body file: EGSR
22
23 GOPTIONS ACCESSIBLE;
24
25 data test;
26 modify test ext_new;
27 by operacion database;
28 run;
NOTE: The DATA step has been abnormally terminated.
NOTE: There were 1 observations read from the data set WORK.TEST.
NOTE: The data set WORK.TEST has been updated. There were 6563 observations rewritten, 0 observations added and 0 observations
deleted.
NOTE: There were 6564 observations read from the data set WORK.EXT_NEW.
NOTE: DATA statement used (Total process time):
real time 2:32.11
cpu time 2:30.86
> SAS block code in this web doesn't seem to preserve my indenting
That's the main reason to use spaces and not tabs for indentation: the code layout is safe.
There's a setting in EG for this purpose too.
1. Results
Here is an example:
data T1;
A=1; B=1; output;
A=2; B=2; output;
A=3; B=3; output;
run;
data T2;
A=1; B=0; output;
A=2; B=.; output;
run;
data T3;
merge T1 T2;
by A;
run;
data T1;
modify T1 T2;
by A;
run;
and the result:
A | B |
---|---|
1 | 0 |
2 | . |
3 | 3 |
A | B |
---|---|
1 | 0 |
2 | 2 |
3 | 3 |
What is surprising?
2. Speed
MODIFY is much slower than MERGE.
The reason is that instead of bulk-building a new table, each record is painstakingly updated one at a time.
For example:
data T1 T2;
do A=1 to 1e4;
B=A;
output;
end;
run;
data T3;
merge T1 T2;
by A;
run;
data T1;
modify T1 T2;
by A;
run;
34 data T3;
35 merge T1 T2;
36 by A;
37 run;
NOTE: There were 10000 observations read from the data set WORK.T1.
NOTE: There were 10000 observations read from the data set WORK.T2.
NOTE: The data set WORK.T3 has 10000 observations and 2 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
user cpu time 0.00 seconds
system cpu time 0.01 seconds
memory 1098.56k
OS Memory 17760.00k
Timestamp 01/15/2019 11:44:54 AM
Step Count 32 Switch Count 42
39 data T1;
40 modify T1 T2;
41 by A;
42 run;
NOTE: There were 1 observations read from the data set WORK.T1.
NOTE: The data set WORK.T1 has been updated. There were 10000 observations rewritten, 0 observations added and 0 observations
deleted.
NOTE: There were 10000 observations read from the data set WORK.T2.
NOTE: DATA statement used (Total process time):
real time 20.06 seconds
user cpu time 14.54 seconds
system cpu time 2.84 seconds
memory 1009.00k
OS Memory 17500.00k
Timestamp 01/15/2019 11:45:14 AM
Step Count 33 Switch Count 87
Here is an interesting link if the matter of table-join performance interests you:
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.