BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
elbarto
Obsidian | Level 7

I have a dataset called low_2002 as follows:

 

DATA low_2002;
input county_id	ammonia nitrogen voc;
DATALINES;
1001	568.76	2329.47	6695.85
1003	0.24	0.00	50.41
1005	10.73	347.76	325.23
1007	0.00	59.06	77.60
1009	4.99	0.00	0.00
;
RUN;

and another dataset called TRI as follows:

 

 

DATA TRI;
input county_id	voc;
DATALINES;
1001	432.47
1003	0.00
1005	3.22
1007	0.00
1009	1.52
;
RUN;

What I would like to do is replace each county_id's VOC value in low_2002 with the corresponding value in TRI, sequentially, while keeping everything else exactly the same. This will produce 5 datasets. I want to name them low_2002_TRI_1, low_2002_TRI_2, ..., low_2002_TRI_5. For example, low_2002_TRI_1 will be: (county_id 1001 has the VOC value of 6695.85 replaced with 432.47, while everything else stays the same).

 

 

DATA low_2002_TRI_1;
input county_id	ammonia nitrogen voc;
DATALINES;
1001	568.76	2329.47	432.47
1003	0.24	0.00	50.41
1005	10.73	347.76	325.23
1007	0.00	59.06	77.60
1009	4.99	0.00	0.00
;
RUN;

low_2002_TRI_2 will be:

 

 

DATA low_2002_TRI_2;
input county_id	ammonia nitrogen voc;
DATALINES;
1001	568.76	2329.47	6695.85
1003	0.24	0.00	0.00
1005	10.73	347.76	325.23
1007	0.00	59.06	77.60
1009	4.99	0.00	0.00
;
RUN;

and all the way until low_2002_TRI_5:

 

DATA low_2002_TRI_5;
input county_id	ammonia nitrogen voc;
DATALINES;
1001	568.76	2329.47	6695.85
1003	0.24	0.00	50.41
1005	10.73	347.76	325.23
1007	0.00	59.06	77.60
1009	4.99	0.00	1.52
;
RUN;

Then, I would like to save each of the low_2002_TRI_1, low_2002_TRI_2, ..., low_2002_TRI_5 as csv files with no variable names and the county_id variable dropped, so that low_2002_TRI_1.csv looks like:

568.762329.47432.47
0.240.0050.41
10.73347.76325.23
0.0059.0677.60
4.990.000.00

 

and low_2002_TRI_2 looks like:

568.762329.476695.85
0.240.000.00
10.73347.76325.23
0.0059.0677.60
4.990.000.00

 

The above is just an example of 5 county_id's, but in reality, I will be applying this procedure to a fuller dataset with approximately 3000 county_id's.

 

 

 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

At first you might imagine this task would read one obs from LOW_2002 at a time.  It would then selectively output the unchanged obs to all but one of the output datasets.  Then make the change, using the data from TRI, and output the changed obs to the single appropriate dataset.  Go on and do the next record from LOW_2002.

 

But this requires a lot of hard coding of dataset names, among other complexities.

 

A better way is to load the original LOW_2002 into a hash object (i.e. a table in memory).  Then read one record at a time from TRI.  For each such record from TRI, modify the variable VOC in the hash object and then output the hash object to the corresponding dataset.  Then restore the original value to the hash object.

 

DATA low_2002;
input county_id	ammonia nitrogen voc;
DATALINES;
1001	568.76	2329.47	6695.85
1003	0.24	0.00	50.41
1005	10.73	347.76	325.23
1007	0.00	59.06	77.60
1009	4.99	0.00	0.00
RUN;
DATA TRI;
input county_id	voc;
DATALINES;
1001	432.47
1003	0.00
1005	3.22
1007	0.00
1009	1.52
RUN;
data _null_;
  if 0 then set low_2002;
  if _n_=1 then do;
    declare hash h(dataset:'low_2002',ordered:'a');
      h.definekey('county_id');
      h.definedata(all:'Y');
      h.definedone();
  end;
  set tri (rename=(voc=new_voc));
  h.find();
  tmp_voc=voc;
  voc=new_voc;
  h.replace();
  h.output(dataset:cats('low_2002_tri_',_n_));
  voc=tmp_voc;
  h.replace();
run;

Among the advantages of the above is that you don't have to know in advance how many datasets will be produced.  Instead just dynamically generate as many dataset names as needed by constructing the dataset names in the hash OUTPUT method.  It will produce as many datasets as there are observations in TRI.

 

BTW, this program assumes there is a one-to-one correspondence of COUNTY_ID in TRI to COUNTY_ID in LOW_2002.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

4 REPLIES 4
mkeintz
PROC Star

At first you might imagine this task would read one obs from LOW_2002 at a time.  It would then selectively output the unchanged obs to all but one of the output datasets.  Then make the change, using the data from TRI, and output the changed obs to the single appropriate dataset.  Go on and do the next record from LOW_2002.

 

But this requires a lot of hard coding of dataset names, among other complexities.

 

A better way is to load the original LOW_2002 into a hash object (i.e. a table in memory).  Then read one record at a time from TRI.  For each such record from TRI, modify the variable VOC in the hash object and then output the hash object to the corresponding dataset.  Then restore the original value to the hash object.

 

DATA low_2002;
input county_id	ammonia nitrogen voc;
DATALINES;
1001	568.76	2329.47	6695.85
1003	0.24	0.00	50.41
1005	10.73	347.76	325.23
1007	0.00	59.06	77.60
1009	4.99	0.00	0.00
RUN;
DATA TRI;
input county_id	voc;
DATALINES;
1001	432.47
1003	0.00
1005	3.22
1007	0.00
1009	1.52
RUN;
data _null_;
  if 0 then set low_2002;
  if _n_=1 then do;
    declare hash h(dataset:'low_2002',ordered:'a');
      h.definekey('county_id');
      h.definedata(all:'Y');
      h.definedone();
  end;
  set tri (rename=(voc=new_voc));
  h.find();
  tmp_voc=voc;
  voc=new_voc;
  h.replace();
  h.output(dataset:cats('low_2002_tri_',_n_));
  voc=tmp_voc;
  h.replace();
run;

Among the advantages of the above is that you don't have to know in advance how many datasets will be produced.  Instead just dynamically generate as many dataset names as needed by constructing the dataset names in the hash OUTPUT method.  It will produce as many datasets as there are observations in TRI.

 

BTW, this program assumes there is a one-to-one correspondence of COUNTY_ID in TRI to COUNTY_ID in LOW_2002.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
elbarto
Obsidian | Level 7

Thanks! That's really smart and efficient way of doing it. Do you have any ideas how to accomplish the last part? I would like to export each dataset to csv without the county_id variable. So for example, for Low_2002_tri_1, I would do:

 

data Low_2002_tri_1(drop=county_id);
set Low_2002_tri_1;
run;

proc export data=Low_2002_tri_1
 	outfile="E:\folder\Low_2002_tri_1.csv"
	dbms=csv
	replace;
	putnames=no;
run;

But, how can I loop this for all datasets?

elbarto
Obsidian | Level 7

Ah nevermind, I figured it out by writing a macro, thanks again!

ballardw
Super User

@elbarto wrote:

Thanks! That's really smart and efficient way of doing it. Do you have any ideas how to accomplish the last part? I would like to export each dataset to csv without the county_id variable. So for example, for Low_2002_tri_1, I would do:

 

data Low_2002_tri_1(drop=county_id);
set Low_2002_tri_1;
run;

proc export data=Low_2002_tri_1
 	outfile="E:\folder\Low_2002_tri_1.csv"
	dbms=csv
	replace;
	putnames=no;
run;

But, how can I loop this for all datasets?


Pretty much every where you use a data set you can use Data set Options to drop, keep or rename variables.

So

proc export data=Low_2002_tri_1 (drop=country_id)
 	outfile="E:\folder\Low_2002_tri_1.csv"
	dbms=csv
	replace;
	putnames=no;
run;

No need for that potentially dangerous data step to corrupt your data set.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 722 views
  • 1 like
  • 3 in conversation