Hi SAS folks-
I’ve been working with messy Excel files again, and need your help with a numeric format. In this case, I’m making latitude and longitude in degrees, minutes, and seconds from a text field.
My data looks something like this:
excel.have:
GPS_DMS
N28.36.678 W081.35.514
N2836678 W08135514
proc sql noprint ;
Create table want as select
input(scan(compress(GPS_DMS,'.'), 1, , "dko"),8.1) as lat6
, input(scan(compress(GPS_DMS,'.'), -1, , "dko"),8.1) as long6
from excel.have ;
quit ;
want:
lat6 long6
283667.8 813551.4
283667.8 813551.4
So far, all of the entries have 7 digits, and the numeric format of 8.1 works perfectly.
But… if I had different numbers of digits in some lines of the original data (and I'm sure that will happen eventually), there will always be one digit in the decimal regardless of the length, which would be incorrect. I’m sure there is a perfectly simple solution, but I read over the documentation for numeric formats, and I’m not able to find it.
Could someone please point me in the right direction? We're running SAS 9.4M3 via remote login to a Windows virtual machine.
Thanks so much for your help!
Wendy T.
Simple example:
data have ;
length text_var $12 ;
input text_var $ ;
datalines ;
1234567890
123456789
12345678
1234567
123456
;
run ;
data example ; set have ;
wrong_decimal=input(text_var,8.1) ;
run ;
want:
123456.7890
123456.789
123456.78
123456.7
123456
You could just insert a decimal point in the text string after the first six digits followed by whichever digits remain, if any:
data have ;
length text_var $12 ;
input text_var $ ;
datalines ;
1234567890
123456789
12345678
1234567
123456
;
run ;
data example ;
set have ;
wrong_decimal=input(text_var,8.1) ;
right_decimal = input(prxchange('s/(\d{6})(\d{0,})/\1.\2/', 1, text_var), best.);
format right_decimal 12.4;
run ;
You could just insert a decimal point in the text string after the first six digits followed by whichever digits remain, if any:
data have ;
length text_var $12 ;
input text_var $ ;
datalines ;
1234567890
123456789
12345678
1234567
123456
;
run ;
data example ;
set have ;
wrong_decimal=input(text_var,8.1) ;
right_decimal = input(prxchange('s/(\d{6})(\d{0,})/\1.\2/', 1, text_var), best.);
format right_decimal 12.4;
run ;
data t;
input Lat;
cards;
123456789
12345678
1234567
123456
;
run;
data t2;
set t;
LatCharacter=catx('.',substr(compress(put(lat,$10.)),1,6),substr(compress(put(lat,$10.)),7));
LatNumeric=LatCharacter*1;
run;
Don't know if you need this one way or another so creating two variables in the t2 dataset with same values.
You could always just have SAS do the math:
data example ; set have ; right_decimal=input(text_var,12.); if length(text_var) gt 6 then right_decimal=right_decimal/ (10**(length(text_var)-6)); run ;
Art, CEO, AnalystFinder.com
Is your intent to create a decimal version of degrees latitude and longitude? If so this is the wrong path.
Maybe your are really looking at something more like:
data example; GPS_DMS="N28.36.678 W081.35.514"; DegN = input(scan(GPS_DMS,1,'NW. '),best.); MinN = input(scan(GPS_DMS,2,'NW. '),best.); SecN = input(scan(GPS_DMS,3,'NW. '),best.); DegW = input(scan(GPS_DMS,4,'NW. '),best.); MinW = input(scan(GPS_DMS,5,'NW. '),best.); SecW = input(scan(GPS_DMS,6,'NW. '),best.); Lat = DegN + MinN/60 + SecN/3600; Long = DegW + MinW/60 + SecW/3600; run;
The following code uses the not so intuitive behaviour of w.d SAS informats to set a decimal point if there is no point in the input string and the format has an explicit .w defined.
data have;
length text_var $12;
input text_var $;
datalines;
1234567890
123456789
12345678
1234567
123456
123.456.789
N28.36.678
;
run;
data want;
set have;
length _inter_str _length_str $12;
/* remove all non-digit characters */
_inter_str=compress(text_var,,'kd');
/* determine length of string with digits only */
_length_str=lengthn(_inter_str);
/* create informat for string with digits only */
_informat=cats(_length_str,'.',max(0,_length_str-6));
/* read string with digits only into numeric SAS variable */
want_number_1=inputn(_inter_str,_informat);
/* same as spaghetti code */
want_number_2=inputn(compress(text_var,,'kd'),cats(lengthn(_inter_str),'.',max(0,lengthn(_inter_str)-6)));
run;
Thank you so much to everyone for your solutions!
Yes, I am looking to go from ugly lat/long in degrees minutes seconds to decimal degrees using proc SQL.
I am going to choose Collin as my 'right' answer, as his solution is insensitive to the number of digits over six, and also to the presence/absence of the period separators.
I did find that when I used -1 to scan from the end of the string back for longitude that I needed to multiply the result by 10 for the result to have the right number of decimals. I'm sure that would not be necessary if I understood the prxchange function.
Wendy T.
data have ;
length ZONE_GPS_DMS $25 ;
input ZONE_GPS_DMS 1-25 ;
datalines ;
283667 0813551
N28.36.67 W081.35.51
N283667 W0813551
q28.36678 x081.35514
N28.36.678 W081.35.514
N2836678 W08135514
N28366789 W081355149
N28.36.678 W081.35.5149
N2836678 W081355149
;
run ;
proc sql noprint ;
create table example
as select
ZONE_GPS_DMS,
(input(scan(ZONE_GPS_DMS,1,'NW. '),best.)) + (input(scan(ZONE_GPS_DMS,2,'NW. '),best.))/60 + (input(scan(ZONE_GPS_DMS,3,'NW. '),best.))/3600 as lat_dd_ballard
, (input(scan(ZONE_GPS_DMS,4,'NW. '),best.)) + (input(scan(ZONE_GPS_DMS,5,'NW. '),best.))/60 + (input(scan(ZONE_GPS_DMS,6,'NW. '),best.))/3600 as long_dd_ballard
, input(prxchange('s/(\d{6})(\d{0,})/\1.\2/', 1, scan(compress(ZONE_GPS_DMS,'.'), 1, , "dko")), best.) as collin_lat ,
(int(calculated collin_lat/10000)) + ((int(calculated collin_lat/100) - ((int(calculated collin_lat/10000)) * 100))/60) +
((calculated collin_lat - ((int(calculated collin_lat/10000)) * 10000)-((int(calculated collin_lat/100) -
((int(calculated collin_lat/10000)) *100))*100))/3600)
as LAT_DD_collin label='Latitude (dd)'
, input(prxchange('s/(\d{6})(\d{0,})/\1.\2/', 1, scan(compress(ZONE_GPS_DMS,'.'), -1, , "dko")), best.)*10 as collin_long ,
(int(calculated collin_long/10000)) + ((int(calculated collin_long/100) - ((int(calculated collin_long/10000)) * 100))/60) +
((calculated collin_long - ((int(calculated collin_long/10000)) * 10000)-((int(calculated collin_long/100) -
((int(calculated collin_long/10000)) *100))*100))/3600)
as LONG_DD_collin label='Longitude (dd)'
from have ;
quit ;
Save $250 on SAS Innovate and get a free advance copy of the new SAS For Dummies book! Use the code "SASforDummies" to register. Don't miss out, May 6-9, in Orlando, Florida.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.