BookmarkSubscribeRSS Feed
TL93
Obsidian | Level 7

Hi SAS Community,

 

I am trying to merge two datasets by region. In general, some region values can end with decimal places and some don't. The problem is that the region variable values are slightly different between the datasets. In the first dataset, I get the original region values. In the second dataset, if the value does not end with decimal places then two zeroes ("00") are added. I believe the latter was done to make sure all values have the same lengths (to be honest, the reason why is beyond me). Moreover, in the second dataset, the decimal itself is removed from the value. I need region to be identically coded so I can merge the datasets.

 

My goal is to either add two zeroes to the end of the region values in dataset_1, where values do not end with decimals. Or, remove the two trailing zeroes from the region values in dataset_2, where values end in two zeroes. I also want to either add or remove the decimal point. Region is a numeric variable and has a max character length of 10 (e.g. 5356789.02). Can anyone help me do this quickly? I have over 300+ region codes to deal with 😞 I tabulate my data have and data want below:

 

DATA HAVE (region variable)

dataset_1                                  dataset_2

100010                                      10001000

100020.01                                 10002001

100020.02                                 10002002

100033                                      10003300

256006                                      25600600

3451678.01                               345167801

 

DATA WANT (region variable)

dataset_1               or                dataset_2

10001000                                  100010

10002001                                  100020.01

10002002                                  100020.02

10003300                                  100033

25600600                                  256006

345167801                                3451678.01

(I basically want the region values to be identical)

 

Thank you, and please let me know if more information is required.

4 REPLIES 4
mkeintz
PROC Star

Presumably the original DATASET_1 value never matches DATASET_2.  It always either has a decimal to be remove, or needs to have two zeroes concatenated to it.  If so then:

 

data want;
  set have;
  if indexc(dataset_1,'.')>0 then new_dataset_1=compress(dataset_1,'.');  
  else new_dataset_1=cats(dataset_1,'00');
run;

 

 

The INDEXC function returns the position of the first instance of the second argumen ('.') found in the first argument (DATASET_1).  If there is no '.', a zero is returned.

 

The COMPRESS function returns the value of the first argument all instances of the second argument ('.') removed.

 

and the CATS (concatenated with leading/trailing blanks stripped) just appends '00' to DATASET_1, with no intervening blanks.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
ChrisNZ
Tourmaline | Level 20

Like this?

proc format ;
  picture FMT other='0099999999' (mult=100);
run;
data TEST;
  input X $10.;
  Y=put(input(X,best.),fmt. -l);
cards;
100010               
100020.01            
100020.02            
100033               
256006               
3451678.01 
run;
proc print noobs;
run;

X Y
100010 10001000
100020.01 10002001
100020.02 10002002
100033 10003300
256006 25600600
3451678.01 345167801

 

 

FreelanceReinh
Jade | Level 19

Hi @TL93,


@TL93 wrote: Region is a numeric variable and has a max character length of 10 (e.g. 5356789.02).

If it's really a numeric variable, a multiplication by 100 might work.

data dataset_1;
input region;
cards;
100010               
100020.01            
100020.02            
100033               
256006               
3451678.01 
;

data want;
set dataset_1;
region=round(region*100, 1e-6);
run;

Here's why the ROUND function (with an appropriate rounding unit, if any) should not be omitted:

445  data _null_;
446  if 1.09*100 ne 109 then put 'Surprise!';
447  run;

Surprise!
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds
hashman
Ammonite | Level 13

@TL93:

Instead of matching one to the other, I'd rather convert both to the same pattern. You're saying that your region variable is "numeric", but it looks like mean that its printed content looks like a number - either integer or decimal - while it is stored as a character data type of length 10 (confirmed by the fact that in your sample the values are left-justified). If so, you can do the same transformation on both sides:    

data dataset_1 ;                      
  input region $10. ;                 
  cards ;                             
100010                                
100020.01                             
100020.02                             
100033                                
256006                                
3451678.01                            
;                                     
run ;                                 
data dataset_2 ;                      
  input region $10. ;                 
  cards ;                             
10001000                              
10002001                              
10002002                              
10003300                              
25600600                              
345167801                             
;                                     
run ;                                 
                                      
%let pattern = 0000000000 ;           
                                      
data dataset_1_key (drop = _:) ;      
  set dataset_1 ;                     
  key = "&pattern" ;                  
  _r = compress (region, ".") ;       
  substr (key, 1, length (_r))  = _r ;
run ;                                 
                                      
data dataset_2_key (drop = _:) ;      
  set dataset_2 ;                     
  key = "&pattern" ;                  
  _r = compress (region, ".") ;       
  substr (key, 1, length (_r))  = _r ;
run ;                                 

Then you can match by KEY.

 

Kind regards

Paul D.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 3126 views
  • 3 likes
  • 5 in conversation