DATA Step, Macro, Functions and more

Help read-in messy data

Reply
Occasional Contributor
Posts: 5

Help read-in messy data

Hi all,

I'm given a horrible task of QC and reading in 48 monthly csv files which looks like below, and they are all in one csv sheet I also attached a copy of the , I don't know how to read this in as SAS data set and I really need your help.

What I am after is 2 SAS data set, one for each teams belongs to segment ABC i.e. TEAM_123_SEGMENT_ABC.SAS7BDAT & TEAM_125SEGMENT_ABC.SAS7BDAT, and I also need to check if the 4 totals in the NO OF POL FOR TEAM xxx trailer adds up for each COL1, COL2, COL3 & AMT OUTSTANDING as well as checking the trailer for NO OF POL FOR SEGMENT xxx add up for team 123 & team 124 that belongs to Segment ABC.

The data is quite messy please contact me so I can send the .csv file.

Many Thanks Message was edited by: psquares
Occasional Contributor
Posts: 5

Re: Help read-in messy data

After tackling this problem for the last 4 hours I'm getting close to what I want which is 2 SAS data sets one for Team 123 of Segment ABC and one SAS data set for team 125 of Segment ABC by making the first field to be character but it needs to be numeric.

But I also want to check the sum of COL1, COL2, COL3 & AMT_OUTSTANDING against the trailers if possible in my code below and I'm much grateful for any suggestions to make this thing works.
COL1 COL2 COL3 AMT_OUT
NO OF POL FOR TEAM 124 --------------- --------------- ---------------

8288.22 8288.22 13249.152 4630.2032

To re-create the csv file please copy the text to notepad and rename the extension to .csv to test out the code.

%macro attributes ; %local maxlength ; %let maxlength = $200. ; %do i = 1 %to 9 ; length field&i &maxlength. ; %end ; /* */

attrib ID_NO length=$20.; attrib CNT length=$10.; attrib FIRM length=$50.; attrib CITY length=$20.; attrib ZIP length=$4.; attrib DATE1 length=$10. ; attrib TYPE length=$5.; attrib POL_NO length=$10.; attrib POL_CODE length=$5.; attrib DATE_ADDED length=$10. ; attrib COL1 length=8. ; attrib COL2 length=8. ; attrib COL3 length=8. ; attrib AMT_OUTSTANDING length=8.;

%mend ;

%macro readFile(outds=, source=) ;

%let file = J:\HAVE.CSV ;

data &outds (drop=fieldSmiley Happy; infile "&file." MISSOVER DSD DLM="," firstobs=1 LRECL=1024 end=eof ;

%attributes ;

input field1-field14 ;

ID_NO = field1; CNT = field2; FIRM = field3; CITY = field4; ZIP = field5; DATE1 = field6; TYPE = field7; POL_NO = field8; POL_CODE = field9; DATE_ADDED = field10; COL1 = input(field11, best23.2); COL2 = input(field12, best23.2); COL3 = input(field13, best23.2); AMT_OUTSTANDING = input(field14, best23.2);

if compbl(substr(ID_NO,1,3)) in ('PRO','COM','NEW','FOR','FUN','SOL','ABN','NO','ID ','---','N.B','* *',' ') then delete ;

run ; %mend ;

%macro process (source=); %readFile (source=&source, outds=&source._read) ; %mend ;

%process (source=HAVE) ;
Respected Advisor
Posts: 4,173

Re: Help read-in messy data

Hi

Not sure if I totally got how your data structure looks like.

From what I understand I would propose you create 2 datasets: 1 contains the team totals, one contains the team details.

Key for joining (merging) these 2 datasets is the team#.

Once you have the data organised this way it shouldn't be too hard to create whatever report you need (just use by group processing).

I believe you shouldn't use macro language at all for a problem which can be solved without.

I hope the following code will give you at least some ideas of how you can read such "messy" data:


data
TeamTotal TeamDetail;
infile datalines dsd dlm=',' truncover;
input @;

retain team;

if find(_infile_,'NO OF POL FOR TEAM','i')>0 then
do;
team=scan(scan(_infile_,1,','),-1,' ');
input _dummy_:$1. col1 col2 col3 AMT_OUT;
output TeamTotal;
end;
else
do;
input col1 col2 col3 AMT_OUT;
output TeamDetail;
end;
keep team col1 col2 col3 AMT_OUT;
datalines;
NO OF POL FOR TEAM 124,6,8,10,11
1,2,3,4
5,6,7,8
;
run;

HTH
Patrick
Ask a Question
Discussion stats
  • 2 replies
  • 126 views
  • 0 likes
  • 2 in conversation