I'm struggling to figure out how to put these datelines in a data step.
The first variable is the first set of numbers, which is the ID, they can be variable.
The last variable is the last two letters, representing country.
The middle variable is all of the stuff between the ID and Country Code, namely the Business Name
47185 KN Outdoor Trading Inc US
4742 Luna sastreria S.A. ES
479 Norsok A/S DK
I would normally try to manually align the code to fix problem, however, my data is quite large.
Is there a way to make the last 2 letters of the string in the datalines align to the right to make it easier?
Or is there a better solution to this problem?
data want;
input;
call scan(_infile_,1,p1,l1,' ');
call scan(_infile_,-1,p2,l2,' ');
id=scan(_infile_,1,' ');
business=substr(_infile_,p1+l1,p2-p1-l1);
country=scan(_infile_,-1,' ');
drop p1 l1 p2 l2;
cards;
47185 KN Outdoor Trading Inc US
4742 Luna sastreria S.A. ES
479 Norsok A/S DK
;
proc print;run;
Hello @MikeYick and welcome to the SAS Support Communities!
Actually, it's only the the missing double blank between business name (containing blanks) and country that lets modified list input fail. So, one way to read the data is to let SAS insert the missing blank:
data want;
input @;
_infile_=prxchange('s/(.+) (\w+)/$1 $2/',1,_infile_);
input ID :$12. business &$40. country :$2.;
cards;
47185 KN Outdoor Trading Inc US
4742 Luna sastreria S.A. ES
479 Norsok A/S DK
;
You may want to modify the suggested lengths (12, 40, 2) as needed (e.g. start with generous lengths and then determine the maximum actual length of the values of each variable).
So you know
First, get a count of "words" with countw(_infile_)
scan(_infile_,1) is your first item
scan(_infile_,countw(_infile_)) is your last item
loop from 2 to countw() - 1 to get all the "middle words" and concatenate them with catx.
data want;
input;
call scan(_infile_,1,p1,l1,' ');
call scan(_infile_,-1,p2,l2,' ');
id=scan(_infile_,1,' ');
business=substr(_infile_,p1+l1,p2-p1-l1);
country=scan(_infile_,-1,' ');
drop p1 l1 p2 l2;
cards;
47185 KN Outdoor Trading Inc US
4742 Luna sastreria S.A. ES
479 Norsok A/S DK
;
proc print;run;
This is where the use of a negative word position (the -1 in the scan function below) is helpful, because the -1 means the last word in a string of words:
data want;
input id business $60.;
length country $2.;
country=scan(business,-1);
business=substr(business,1,length(business)-length(country));
put business=;
cards;
47185 KN Outdoor Trading Inc US
4742 Luna sastreria S.A. ES
479 Norsok A/S DK
;
This code initially reads both business and country into the business variable.
Then it gets the last word of business for the country variable (scan with -1).
After that just take a substring of business minus the last X characters (where X is length(country)).
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.