08-14-2015 11:52 AM
I am working in a DYL280 to SAS conversion project. I am facing few issues to complete this task. Please help me to overcome this. Thanks much in advance.
First lookup to select NAME and loc:
Need to do a lookup between file1 and file2, whereas name was the key field. IF found between two files then update file1 by moving TLOC(in file2) TO ZIPCODE(in file1).
If not found then list 'NAME NOT FOUND' NAME in sas log(Note: value in the name field for that unmatched record should be written in the saslog as well) and reject that record .
And drop all unmatched records and update file1 with only matched records.
BINSEARCH FILE2 NAME
IFFOUND EQ 'Y' NEXT
ELSE LIST 'NAME NOT FOUND' NAME
GOTO NEXTRECORD ENDIF
MOVE TLOC TO ZIPCODE
Second lookup between file1 and file3, where as DEPT was the key field. IF found between two files AND TAREA1 NOT EQUAL TO 'AYA' then update file1 by moving TAREA2(in file2) TO TAREA(in file1) and AREADESC(in file2) TO LDESC(in file1).
If not found and TAREA1 EQ 'AYA' then list 'DEPT NOT FOUND IN FILE3' DEPT in sas log.And drop all unmatched records and update file1 with only matched records.
****REJECT ANY DEPT CCS, GET CC DECS****
BINSEARCH FILE3 DEPT
IFFOUND EQ 'Y' NEXT
ELSE LIST 'DEPT NOT FOUND IN FILE3' DEPT
GOTO NEXTRECORD ENDIF
IF TAREA1 EQ 'AYA' GOTO READMAST ENDIF
MOVE TAREA2 TO TAREA
MOVE AREADESC TO LDESC
SAS code we tried for first logic: But unable to drop unmatched records and update more than one variable at a time. And unable to list down the nonmatched field value in SAS log.
update file1 as A
set zipcode = (select tloc from file2 as B
where b.Tname = A.Name)
Input file layout and sample records:
@01 Name $12.
@14 zipcode 05.
@20 Area1 $03.
@23 dept 04.
@28 ldper 02.
@30 ldesc 30.;
JohnAbraham 80000 AY 1900 09
Madisson 75200 AY 1825 01
Murphy 95210 AY 1952 06
Donaldmathew 65214 NY 6854 18
Eric 85102 NA 7889 24
@01 TName $12.
@13 Tloc 05.
@18 tarea1 $03.
@21 tdept1 04.;
Mathews55589 LA 1980 10
Jonny 75200 LA 1825 01
David 87510 HT 1952 06
Edrick 65214 LO 6854 18
Monaay 74502 NA 7889 24
@01 Tloc1 05.
@07 tarea2 $03.
@10 tdept2 04.
@14 areadesc 30.;
55589 LA 1980 10 SOUTHEREN CALIFORNIA
75200 LA 1825 01
87510 HT 1952 06 EASTERAN BAY
65214 LO 6854 18 NEWYORK
74502 NA 7889 24 WASHINGTON.;
08-14-2015 04:30 PM
Here is one idea rather than the proc sql subquery update method you have above:
create table file_out as
case when b.tloc ^= . then b.tloc else a.zipcode end as zipcode,
case when c.tedpt2 ^= . and b.tarea1 ^= 'AYA' then c.tarea2 else a.area1 end as area1,
case when c.tdept2 ^= . and b.tarea1 ^= 'AYA' then c.areadesc else a.ldesc end as ldesc,
case when b.tloc = . then 1 else 0 end as miss_ind_21,
case when c.tdept2 = . then 1 else 0 end as miss_ind_32
from file1 a left join file2 b on a.name = b.tname
left join file3 c on a.dept = c.tdept2;
data final_file(drop=op miss_ind_21 miss_ind_32);
length op $70;
if miss_ind_21 = 1 then do;
op = 'NAME NOT FOUND: '||name;
if miss_ind_32 = 1 then do;
op = 'DEPT NOT FOUND IN FILE3 '||dept;
08-14-2015 04:58 PM
This is a possible solution for your first lookup, the second one is similar and you are encouraged to give a shot yourself.
proc sort data=file1;
proc sort data=file2;
merge file1(in=f1) file2(keep=name Tloc rename=(Tloc=zipcode) in=f2);
if not f2 then put 'NAME NOT FOUND: ' name;
if f1 and f2;
08-15-2015 09:46 AM
I forgot to tell this. Records should not be sorted, since that file will be used by other projects as well. That's why I didn't tried merge concept.
08-15-2015 08:16 PM
If the original order needs to be kept, the SQL is the LAST thing you want to use. SQL only has two types of output: 1. Sorted (defined by 'order by'). 2.God knows what. I will lay out some idea, it may need some reading/learning, but it can be easily implemented. Take your first lookup for instance:
1. Use file1 to start a data step
2. upload needed components from file2 to a Hash table, use 'name' as key.
3. When lookup failed, print error log, else output to a new file1.
You certainly can upload file3 to another Hash table and finish both lookup in one pass.
The hash method will require your computer RAM is large enough to host either file2 or file3 or both if one pass. Although lookup can be done between many to many, it may not give your desired results, so file2/3 usually need to be unique on 'name'.
08-15-2015 10:02 AM
Thank you so much. I gave a try with your logic, but it didn't went well.Getting some unmatched records as well in the output.
Requirement is we need to do a lookup between two files using a key field, if found then move file2 value to file 1.(Here two fields values in file1 should be updated with file2 field if key matches) Else move 'DEPT NOT FOUND IN FILE3 '||dept in sas log.
Main thing is we should not sort the files, so as you mentioned we can use PROC SQL only for this task. But unfortunately its not working fine. Please guide me.
08-16-2015 01:49 AM
It sounds like you have a master file (FILE1) and two transactions files (FILE2 and FILE3).
You should merge FILE1 with FILE2 using NAME to see which records in FILE2 are found in FILE1. You can then use the matched records to update and the unmatched to report.
data update1 ;
merge file2 (in=in1) file2 (keep=name in=in2);
by name ;
if in1 ;
if not in2 then put name= :$quote. 'not found in FILE1.' ;
proc sql noprint ;
update FILE1 set ZIPCODE = (select TLOC from update1 where FILE1.name = update1.name)
where name in (select name from update1)
08-16-2015 11:46 AM
I do not get the concepts of DYL-280. It looks to the question on file-record processing.
The @ usage in the samples is looking for me to be on mainframe fixed records processing. I that correct?
The remark as the files are used for other projects is pointing in that direction.is
In that case use data-step views to describe the records so you can process them afterwards. This will avoid as much of the IO as possible.
A direct search with proc SQL on a datastep view can behave rather well.
As the SAS approach is normally updating complete tables I want to know about that binary search with DYL-280 and updating one record.
What are all used techniques used in for us understandable words?
08-16-2015 12:22 PM
Bin search means in DYL280 means as follows, Need to do lookup between two flat files. If key is found then need to move field2 value to field1 value in File1, else need towrite hardcoded value into SAS log and process next records.
Key field is NAME;
IF NAME field was found between two files then update AREA field in file1 with value in AREA2 field, else Write :Value not found " into SASLOG. and goto next record.
Note: Records in file2 should not be appened to file1 after lookup processing.
Please let me know, if I didn't make it clear.
08-17-2015 12:52 AM
Ok reading and processing flat files not databases or sas-datasets. That is an important pre-req.
Processing a flat file (searching) is possible when you define a view to that SAS(R) 9.4 Language Reference: Concepts, Fifth Edition
Updating flat-files in place (record oriented) is possible. 24782 - Update an external file in place
Dropping records however is not possible with flat file processing. You can read it all and create a new one. PDSE or using a GDG - batch processing.
What environment are you?
Is that a/ Mainframe and which b/ sas release you are using c/ using classical bound data set or a HFS approach?
What is the sizing of those flat files (no records/vars) , should it run interactively or only in batch
When rewriting the flat files and performance is not a problem you can read/convert them to SAS sort as you like with an ordering indicator (making _n_ permanent) reorder back (sort) before writing out. That is the most easy understandable coding.