Hi SAS Community,
I am trying to merge two datasets by region. In general, some region values can end with decimal places and some don't. The problem is that the region variable values are slightly different between the datasets. In the first dataset, I get the original region values. In the second dataset, if the value does not end with decimal places then two zeroes ("00") are added. I believe the latter was done to make sure all values have the same lengths (to be honest, the reason why is beyond me). Moreover, in the second dataset, the decimal itself is removed from the value. I need region to be identically coded so I can merge the datasets.
My goal is to either add two zeroes to the end of the region values in dataset_1, where values do not end with decimals. Or, remove the two trailing zeroes from the region values in dataset_2, where values end in two zeroes. I also want to either add or remove the decimal point. Region is a numeric variable and has a max character length of 10 (e.g. 5356789.02). Can anyone help me do this quickly? I have over 300+ region codes to deal with 😞 I tabulate my data have and data want below:
DATA HAVE (region variable)
dataset_1 dataset_2
100010 10001000
100020.01 10002001
100020.02 10002002
100033 10003300
256006 25600600
3451678.01 345167801
DATA WANT (region variable)
dataset_1 or dataset_2
10001000 100010
10002001 100020.01
10002002 100020.02
10003300 100033
25600600 256006
345167801 3451678.01
(I basically want the region values to be identical)
Thank you, and please let me know if more information is required.
Presumably the original DATASET_1 value never matches DATASET_2. It always either has a decimal to be remove, or needs to have two zeroes concatenated to it. If so then:
data want;
set have;
if indexc(dataset_1,'.')>0 then new_dataset_1=compress(dataset_1,'.');
else new_dataset_1=cats(dataset_1,'00');
run;
The INDEXC function returns the position of the first instance of the second argumen ('.') found in the first argument (DATASET_1). If there is no '.', a zero is returned.
The COMPRESS function returns the value of the first argument all instances of the second argument ('.') removed.
and the CATS (concatenated with leading/trailing blanks stripped) just appends '00' to DATASET_1, with no intervening blanks.
Like this?
proc format ;
picture FMT other='0099999999' (mult=100);
run;
data TEST;
input X $10.;
Y=put(input(X,best.),fmt. -l);
cards;
100010
100020.01
100020.02
100033
256006
3451678.01
run;
proc print noobs;
run;
X | Y |
---|---|
100010 | 10001000 |
100020.01 | 10002001 |
100020.02 | 10002002 |
100033 | 10003300 |
256006 | 25600600 |
3451678.01 | 345167801 |
Hi @TL93,
@TL93 wrote: Region is a numeric variable and has a max character length of 10 (e.g. 5356789.02).
If it's really a numeric variable, a multiplication by 100 might work.
data dataset_1;
input region;
cards;
100010
100020.01
100020.02
100033
256006
3451678.01
;
data want;
set dataset_1;
region=round(region*100, 1e-6);
run;
Here's why the ROUND function (with an appropriate rounding unit, if any) should not be omitted:
445 data _null_; 446 if 1.09*100 ne 109 then put 'Surprise!'; 447 run; Surprise! NOTE: DATA statement used (Total process time): real time 0.00 seconds cpu time 0.00 seconds
Instead of matching one to the other, I'd rather convert both to the same pattern. You're saying that your region variable is "numeric", but it looks like mean that its printed content looks like a number - either integer or decimal - while it is stored as a character data type of length 10 (confirmed by the fact that in your sample the values are left-justified). If so, you can do the same transformation on both sides:
data dataset_1 ;
input region $10. ;
cards ;
100010
100020.01
100020.02
100033
256006
3451678.01
;
run ;
data dataset_2 ;
input region $10. ;
cards ;
10001000
10002001
10002002
10003300
25600600
345167801
;
run ;
%let pattern = 0000000000 ;
data dataset_1_key (drop = _:) ;
set dataset_1 ;
key = "&pattern" ;
_r = compress (region, ".") ;
substr (key, 1, length (_r)) = _r ;
run ;
data dataset_2_key (drop = _:) ;
set dataset_2 ;
key = "&pattern" ;
_r = compress (region, ".") ;
substr (key, 1, length (_r)) = _r ;
run ;
Then you can match by KEY.
Kind regards
Paul D.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.