Re: Select non-zero observations with each column using macro

Xusheng · Posted 01-29-2018 09:57 AM

Hi, I have a question about how to deal with each column using macro. Here is what the sample looks like:

date	time	interval	weekday	fj_dummy	jump	surec1	surec2	surus1	surus2	surus3	suruk1	suruk2
11/1/2004	09:00.0	2	2	0	0		2					1
11/1/2004	14:00.0	3	2	0	0	1					3
11/1/2004	19:00.0	4	2	0	0		1			2
11/1/2004	24:00.0	5	2	0	0			4				1
11/1/2004	29:00.0	6	2	0	0

For each time, I just want to deal with 'date','time','interval','weekday','fj_dummy','jump', and one of the rest of the columns. I only want to keep the observations that are non-missing and delete the missing observations. I realized I need to use macro since my whole sample is much bigger than the sample displayed here. I don't know within each 'ec','us','uk' category how to select the non-missing observations for each subcategory, which is 'surec1', 'surec2', etc.

If possible, can I export the files with macro for each subcategory and also keep 'date','time','interval','weekday','fj_dummy','jump' ? That is to say, in each file, I have date','time','interval','weekday','fj_dummy','jump' and one of the rests of columns.

This is one of the csv files called EUR, I also have two other csv files with the same structure as this. So I'm wondering in this case, do I need more than one macro?

Here is the code I wrote, but I was stuck here for a long time. Any help would be appreciated.

%let path=E:\one_drive\OneDrive\RA\Jumps;					/*Set path*/
libname RA "&path";
%Macro news(cur);
PROC IMPORT OUT= RA.&cur._jump
            DATAFILE= "&path\Data\&cur._jump_news_abs1.csv" 
            DBMS=CSV REPLACE;
     GETNAMES=yes;
     DATAROW=2; 
RUN;
DATA RA.&cur._jump;
	set RA.&cur._jump;
	IF fj_dummy=0 THEN DELETE;	
RUN;
%Mend news;
%news (EUR);
%news (GBP);
%news (JPY);
%Macro currency(cur);
data RA.&cur._jump;
set RA.&cur._jump;
%macro surprise(cur,n_f,n_e);
%do i=&n_f %to &n_e;
IF sur&cur.&i.=. THEN sur&cur.&i.=0;
%end;
%mend surprise;
%surprise(ec,1,16);
%surprise(us,1,43);
%surprise(fr,1,3);
%surprise(ge,1,13);
%surprise(it,1,8);
%surprise(jp,1,3);
%surprise(po,1,2);
%surprise(sp,1,4);
%surprise(uk,1,13);
run;

%Mend currency;
%currency(EUR);
%currency(GBP);
%currency(JPY);



%Macro currency(cur);
%Macro country(con);
DATA RA.&cur._jump_sur&con.;
	set RA.&cur._jump;
	Keep Date time interval weekday fj_dummy jump sur&con.:;	
RUN;
%Mend  country;
%country(ec);
%country(us);
%country(fr);
%country(ge);
%country(it);
%country(jp);
%country(po);
%country(sp);
%country(uk);
%Mend currency;
%currency(EUR);
%currency(GBP);
%currency(JPY);

PaigeMiller · Posted 01-29-2018 10:04 AM

I'm afraid your explanation of the desired result isn't clear.

and one of the rest of the column

Which one? you didn't tell us

I only want to keep the observations that are non-missing and delete the missing observations.

What is an observation that is non-missing in your example? I don't see one.

I don't know within each 'ec','us','uk' category

There are no such categories that I can see

Show us, for this example data, what is the desired output. Type it in, show us.

--
Paige Miller

Xusheng · Posted 01-29-2018 10:23 AM

Sure, 'one of the rest of column' means choose from 'surec1', 'surec2', 'surec3', 'surus1',' surus2', and so on.

where 'sur' means 'surprise', 'ec', 'us' are the code for different region or countries. 1, 2, 3 are the different categories within each country. I have 9 countries in total and within each country, I have different number of categories. The sample I attached was cut by the page so that's why the sample didn't help much in explanation.

non-missing observations are the numbers in each column. the missing one are the blank cell in each column. The missing observations only appear under the column 'surec1', 'surec2' and so on.

The desired output would be as following.

date	time	interval	weekday	fj_dummy	jump	surec1	surec2	surus1	surus2	surus3	suruk1	suruk2
11/1/2004	09:00.0	2	2	0	0		2					1
11/1/2004	14:00.0	3	2	0	0	1					3
11/1/2004	19:00.0	4	2	0	0		1			2
11/1/2004	24:00.0	5	2	0	0			4				1
11/1/2004	29:00.0	6	2	0	0
Count						1	2	1	0	1	1	2

PaigeMiller · Posted 01-29-2018 10:29 AM

Well I hope someone smarter than I am can figure out how to go from the inputs to the outputs.

In your original data, 11/1/2004 09:00.0, surec2 is blank, but in the output dataset surec2 is 2.

My head hurts.

--
Paige Miller

Xusheng · Posted 01-29-2018 10:34 AM

Sorry, I forgot to update my input file. Now it should make more sense.

Tom · Posted 01-29-2018 10:37 AM

I really have no idea what your question is, but it you want to produce that line labeled COUNT in the spreadsheet like thing you posted then you might look at PROC MEANS/SUMMARY.

proc summary data=have ;
  var sur: ;
   output out=counts n= ;
run;

Xusheng · Posted 01-29-2018 11:00 AM

Thank you, Tom. I will try that.

Xusheng · Posted 01-29-2018 11:12 AM

Hi Tom, the log erro shows 'surec1' in list does not match type prescribed. Do you know why does that happen?

PaigeMiller · Posted 01-29-2018 11:15 AM

If you are reading this data in from Excel (or even sometimes if you are not), sometimes the columns that appear to us humans to be numbers are treated as characters values.

If that's the problem, you can force the columns to be numeric.

--
Paige Miller

Tom · Posted 01-29-2018 11:18 AM

@Xusheng wrote:

Hi Tom, the log erro shows 'surec1' in list does not match type prescribed. Do you know why does that happen?

Sounds like the variable is character instead of numeric.

SAS Innovate 2025: Register Today!

SAS Training: Just a Click Away