Solved: Re: Replacing values and outputting data as separate csv file

elbarto · Posted 04-03-2022 07:00 PM

I have a dataset called low_2002 as follows:

DATA low_2002;
input county_id	ammonia nitrogen voc;
DATALINES;
1001	568.76	2329.47	6695.85
1003	0.24	0.00	50.41
1005	10.73	347.76	325.23
1007	0.00	59.06	77.60
1009	4.99	0.00	0.00
;
RUN;

and another dataset called TRI as follows:

DATA TRI;
input county_id	voc;
DATALINES;
1001	432.47
1003	0.00
1005	3.22
1007	0.00
1009	1.52
;
RUN;

What I would like to do is replace each county_id's VOC value in low_2002 with the corresponding value in TRI, sequentially, while keeping everything else exactly the same. This will produce 5 datasets. I want to name them low_2002_TRI_1, low_2002_TRI_2, ..., low_2002_TRI_5. For example, low_2002_TRI_1 will be: (county_id 1001 has the VOC value of 6695.85 replaced with 432.47, while everything else stays the same).

DATA low_2002_TRI_1;
input county_id	ammonia nitrogen voc;
DATALINES;
1001	568.76	2329.47	432.47
1003	0.24	0.00	50.41
1005	10.73	347.76	325.23
1007	0.00	59.06	77.60
1009	4.99	0.00	0.00
;
RUN;

low_2002_TRI_2 will be:

DATA low_2002_TRI_2;
input county_id	ammonia nitrogen voc;
DATALINES;
1001	568.76	2329.47	6695.85
1003	0.24	0.00	0.00
1005	10.73	347.76	325.23
1007	0.00	59.06	77.60
1009	4.99	0.00	0.00
;
RUN;

and all the way until low_2002_TRI_5:

DATA low_2002_TRI_5;
input county_id	ammonia nitrogen voc;
DATALINES;
1001	568.76	2329.47	6695.85
1003	0.24	0.00	50.41
1005	10.73	347.76	325.23
1007	0.00	59.06	77.60
1009	4.99	0.00	1.52
;
RUN;

Then, I would like to save each of the low_2002_TRI_1, low_2002_TRI_2, ..., low_2002_TRI_5 as csv files with no variable names and the county_id variable dropped, so that low_2002_TRI_1.csv looks like:

568.76	2329.47	432.47
0.24	0.00	50.41
10.73	347.76	325.23
0.00	59.06	77.60
4.99	0.00	0.00

and low_2002_TRI_2 looks like:

568.76	2329.47	6695.85
0.24	0.00	0.00
10.73	347.76	325.23
0.00	59.06	77.60
4.99	0.00	0.00

The above is just an example of 5 county_id's, but in reality, I will be applying this procedure to a fuller dataset with approximately 3000 county_id's.

mkeintz · Posted 04-03-2022 08:11 PM

At first you might imagine this task would read one obs from LOW_2002 at a time. It would then selectively output the unchanged obs to all but one of the output datasets. Then make the change, using the data from TRI, and output the changed obs to the single appropriate dataset. Go on and do the next record from LOW_2002.

But this requires a lot of hard coding of dataset names, among other complexities.

A better way is to load the original LOW_2002 into a hash object (i.e. a table in memory). Then read one record at a time from TRI. For each such record from TRI, modify the variable VOC in the hash object and then output the hash object to the corresponding dataset. Then restore the original value to the hash object.

DATA low_2002;
input county_id	ammonia nitrogen voc;
DATALINES;
1001	568.76	2329.47	6695.85
1003	0.24	0.00	50.41
1005	10.73	347.76	325.23
1007	0.00	59.06	77.60
1009	4.99	0.00	0.00
RUN;
DATA TRI;
input county_id	voc;
DATALINES;
1001	432.47
1003	0.00
1005	3.22
1007	0.00
1009	1.52
RUN;
data _null_;
  if 0 then set low_2002;
  if _n_=1 then do;
    declare hash h(dataset:'low_2002',ordered:'a');
      h.definekey('county_id');
      h.definedata(all:'Y');
      h.definedone();
  end;
  set tri (rename=(voc=new_voc));
  h.find();
  tmp_voc=voc;
  voc=new_voc;
  h.replace();
  h.output(dataset:cats('low_2002_tri_',_n_));
  voc=tmp_voc;
  h.replace();
run;

Among the advantages of the above is that you don't have to know in advance how many datasets will be produced. Instead just dynamically generate as many dataset names as needed by constructing the dataset names in the hash OUTPUT method. It will produce as many datasets as there are observations in TRI.

BTW, this program assumes there is a one-to-one correspondence of COUNTY_ID in TRI to COUNTY_ID in LOW_2002.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

mkeintz · Posted 04-03-2022 08:11 PM

At first you might imagine this task would read one obs from LOW_2002 at a time. It would then selectively output the unchanged obs to all but one of the output datasets. Then make the change, using the data from TRI, and output the changed obs to the single appropriate dataset. Go on and do the next record from LOW_2002.

But this requires a lot of hard coding of dataset names, among other complexities.

A better way is to load the original LOW_2002 into a hash object (i.e. a table in memory). Then read one record at a time from TRI. For each such record from TRI, modify the variable VOC in the hash object and then output the hash object to the corresponding dataset. Then restore the original value to the hash object.

DATA low_2002;
input county_id	ammonia nitrogen voc;
DATALINES;
1001	568.76	2329.47	6695.85
1003	0.24	0.00	50.41
1005	10.73	347.76	325.23
1007	0.00	59.06	77.60
1009	4.99	0.00	0.00
RUN;
DATA TRI;
input county_id	voc;
DATALINES;
1001	432.47
1003	0.00
1005	3.22
1007	0.00
1009	1.52
RUN;
data _null_;
  if 0 then set low_2002;
  if _n_=1 then do;
    declare hash h(dataset:'low_2002',ordered:'a');
      h.definekey('county_id');
      h.definedata(all:'Y');
      h.definedone();
  end;
  set tri (rename=(voc=new_voc));
  h.find();
  tmp_voc=voc;
  voc=new_voc;
  h.replace();
  h.output(dataset:cats('low_2002_tri_',_n_));
  voc=tmp_voc;
  h.replace();
run;

Among the advantages of the above is that you don't have to know in advance how many datasets will be produced. Instead just dynamically generate as many dataset names as needed by constructing the dataset names in the hash OUTPUT method. It will produce as many datasets as there are observations in TRI.

BTW, this program assumes there is a one-to-one correspondence of COUNTY_ID in TRI to COUNTY_ID in LOW_2002.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

elbarto · Posted 04-03-2022 08:37 PM

Thanks! That's really smart and efficient way of doing it. Do you have any ideas how to accomplish the last part? I would like to export each dataset to csv without the county_id variable. So for example, for Low_2002_tri_1, I would do:

data Low_2002_tri_1(drop=county_id);
set Low_2002_tri_1;
run;

proc export data=Low_2002_tri_1
 	outfile="E:\folder\Low_2002_tri_1.csv"
	dbms=csv
	replace;
	putnames=no;
run;

But, how can I loop this for all datasets?

elbarto · Posted 04-03-2022 10:43 PM

Ah nevermind, I figured it out by writing a macro, thanks again!

ballardw · Posted 04-04-2022 11:43 AM

@elbarto wrote:

Thanks! That's really smart and efficient way of doing it. Do you have any ideas how to accomplish the last part? I would like to export each dataset to csv without the county_id variable. So for example, for Low_2002_tri_1, I would do:
data Low_2002_tri_1(drop=county_id);
set Low_2002_tri_1;
run;

proc export data=Low_2002_tri_1
 	outfile="E:\folder\Low_2002_tri_1.csv"
	dbms=csv
	replace;
	putnames=no;
run;
But, how can I loop this for all datasets?

Pretty much every where you use a data set you can use Data set Options to drop, keep or rename variables.

So

proc export data=Low_2002_tri_1 (drop=country_id)
 	outfile="E:\folder\Low_2002_tri_1.csv"
	dbms=csv
	replace;
	putnames=no;
run;

No need for that potentially dangerous data step to corrupt your data set.

Replacing values and outputting data as separate csv file

Re: Replacing values and outputting data as separate csv file

Re: Replacing values and outputting data as separate csv file

Re: Replacing values and outputting data as separate csv file

Re: Replacing values and outputting data as separate csv file

Re: Replacing values and outputting data as separate csv file

Click image to register for webinar

Classroom Training Available!