BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Nici
Obsidian | Level 7

Good evening SAS community,

 

I have a question on how I can modify my code for the following problem:

My data:

 

data have;
input ID Date ret anncmt;
datalines;
1 19990205 0.01 .
1 19990206 0.02 .
1 19990207 0.04 .
1 19990208 0.01 .
1 19990209 0.02 19990209
1 19990210 0.01 .
1 19990211 0.02 .
1 19990212 0.03 .
2 19990205 0.01 .
2 19990206 0.02 .
2 19990207 0.04 .
2 19990208 0.01 .
2 19990209 0.02 19990209
2 19990210 0.01 .
2 19990211 0.02 .
3 19990205 0.01 .
3 19990206 0.02 .
3 19990207 0.04 .
3 19990208 0.01 .
3 19990209 0.02 19990209
3 19990210 0.01 .
3 19990211 0.02 .
3 19990212 0.03 .
;

 

 

What I want to happen now is that SAS creates a new dataset selecting only the 3 observations before and after the anncmt date:

In the moment I'm using the following code which I copied from @s_lassen (https://communities.sas.com/t5/SAS-Programming/Select-n-rows-before-and-after-an-observation/m-p/503...)

 

data want;
set have;
if anncmt>. then do p=max(_n_-3,1) to min(_n_+3,nobs);
  set have point=p nobs=nobs;
  output;
  end;
run;

 

The problem is that the code does not account for the ID. I only want to have the +3/-3 observations of the same ID. The Output of the above code will be like this:


1 19990206 0.02 .
1 19990207 0.04 .
1 19990208 0.01 .
1 19990209 0.02 19990209
1 19990210 0.01 .
1 19990211 0.02 .
1 19990212 0.03 .
2 19990206 0.02 .
2 19990207 0.04 .
2 19990208 0.01 .
2 19990209 0.02 19990209
2 19990210 0.01 .
2 19990211 0.02 .
3 19990205 0.01 .                (Here is the problem!!)

3 19990206 0.02 .
3 19990207 0.04 .
3 19990208 0.01 .
3 19990209 0.02 19990209
3 19990210 0.01 .
3 19990211 0.02 .
3 19990212 0.03 .

 

What I would like to happen is that SAS only selects the +3/-3 observations for the same ID and if there aren't enough observations to fulfill this (like in the data we have in this example) I want SAS to stop when a new ID starts. So the Output should look like this:

 


1 19990206 0.02 .
1 19990207 0.04 .
1 19990208 0.01 .
1 19990209 0.02 19990209
1 19990210 0.01 .
1 19990211 0.02 .
1 19990212 0.03 .
2 19990206 0.02 .
2 19990207 0.04 .
2 19990208 0.01 .
2 19990209 0.02 19990209
2 19990210 0.01 .
2 19990211 0.02 .                         (SAS stops with this obs even though it is only +2 because ID changes in next obs)

3 19990206 0.02 .
3 19990207 0.04 .
3 19990208 0.01 .
3 19990209 0.02 19990209
3 19990210 0.01 .
3 19990211 0.02 .
3 19990212 0.03 .

 

What would be amazing also is if in any way SAS would report when there are not enough obs. for a respective ID to fulfill the +3/-3 and only +2/-3 has been selected.

 

Hopefully I described my problem sufficiently.

 

Thanks to everyone in advance for help.

 

Please bear with me, I'm pretty new to SAS and thus will probably not understand complex code 😕 Therefore an "easy" solution would be appreciated or some short explanations on the code so I at least know what exactly is happening and do not just copy the code.

 

Thanks again and Best regards

Nici

1 ACCEPTED SOLUTION

Accepted Solutions
Criptic
Lapis Lazuli | Level 10

Hey @Nici ,

I hope this works for you:

data want(drop=x y);
	set have;
	x = id;

	if anncmt>. then
		do p=max(_n_-3,1) to min(_n_+3,nobs);
			set have point=p nobs=nobs;

			if x=id then
				output;
			else
				do;
					y = cat("Missing obs for the id: ",x);
					put y;
				end;
		end;
run;

I get the output:

1	19990206	0.02	
1	19990207	0.04	
1	19990208	0.01	
1	19990209	0.02	19990209
1	19990210	0.01	
1	19990211	0.02	
1	19990212	0.03	
2	19990206	0.02	
2	19990207	0.04	
2	19990208	0.01	
2	19990209	0.02	19990209
2	19990210	0.01	
2	19990211	0.02	
3	19990206	0.02	
3	19990207	0.04	
3	19990208	0.01	
3	19990209	0.02	19990209
3	19990210	0.01	
3	19990211	0.02	
3	19990212	0.03	

In the log one Missing obs for the id: 2 is added you could also output these to another table.

 

If you have any question about the coding please let me know.

 

Best regards

Criptic

View solution in original post

10 REPLIES 10
Criptic
Lapis Lazuli | Level 10

Hey @Nici ,

I hope this works for you:

data want(drop=x y);
	set have;
	x = id;

	if anncmt>. then
		do p=max(_n_-3,1) to min(_n_+3,nobs);
			set have point=p nobs=nobs;

			if x=id then
				output;
			else
				do;
					y = cat("Missing obs for the id: ",x);
					put y;
				end;
		end;
run;

I get the output:

1	19990206	0.02	
1	19990207	0.04	
1	19990208	0.01	
1	19990209	0.02	19990209
1	19990210	0.01	
1	19990211	0.02	
1	19990212	0.03	
2	19990206	0.02	
2	19990207	0.04	
2	19990208	0.01	
2	19990209	0.02	19990209
2	19990210	0.01	
2	19990211	0.02	
3	19990206	0.02	
3	19990207	0.04	
3	19990208	0.01	
3	19990209	0.02	19990209
3	19990210	0.01	
3	19990211	0.02	
3	19990212	0.03	

In the log one Missing obs for the id: 2 is added you could also output these to another table.

 

If you have any question about the coding please let me know.

 

Best regards

Criptic

SR_FR
Obsidian | Level 7

another way to obtain your result:

 


data _null_;
   set have;
   by id ;
   if first.id then n=1;
   else n+1;
   if missing(anncmt)=0 then call symputx(compress('id'||id),n);
run;

data want(drop=n);
   set have;
   by id;
   if first.id then n=1;
   else n+1;
   if symget(compress('id'||id))-3<=n<=symget(compress('id'||id))+3 then output;
run;

best regards

 

Sébastien

Nici
Obsidian | Level 7

Thanks a lot for taking the time to look at my problem and coming up with a solution. 🙂

 

I tested your code and it works perfectly as well. Unfortunately I cannot accept two posts as the solution for this problem. 😕

 

BR

Nici

Nici
Obsidian | Level 7

Your code works perfectly and is super intuitive.

Thank you very much! 🙂

 

BR

Nici
Obsidian | Level 7

Hey@Criptic 

 

something changed in my data and now the code you provided does not work anymore 😞

 

My data looks like this now:

 

data cfma.parentstockdata;
input ID Date ret anncmt flag;
datalines;
1 19990205 0.01 19990209 0
1 19990206 0.02 19990209 0
1 19990207 0.04 19990209 0
1 19990208 0.01 19990209 0
1 19990209 0.02 19990209 1
1 19990210 0.01 19990209 0
1 19990211 0.02 19990209 0
1 19990212 0.03 19990209 0
2 19990205 0.01 19990209 0
2 19990206 0.02 19990209 0
2 19990207 0.04 19990209 0
2 19990208 0.01 19990209 0
2 19990209 0.02 19990209 1
2 19990210 0.01 19990209 0
2 19990211 0.02 19990209 0
3 19990205 0.01 19990209 0
3 19990206 0.02 19990209 0
3 19990207 0.04 19990209 0
3 19990208 0.01 19990209 0
3 19990209 0.02 19990209 1
3 19990210 0.01 19990209 0
3 19990211 0.02 19990209 0
3 19990212 0.03 19990209 0
;

 

I adjusted your code a little bit and it now looks like this:

 

data CRhelp5(drop=x y);
set cfma.parentstockdata;
x = ID;

if flag=1 then
do p=max(_n_-5,1) to min(_n_+5,nobs);
set cfma.parentstockdata point=p nobs=nobs;

if x=ID then
output;
else
do;
y = cat("Missing obs for the pcusip: ",x);
put y;
end;
end;
run;

 

I do not really get why it is not working anymore since the only thing that changed in my code is basically the new variable flag. Somehow I do not get an output anymore though. CRHelp5 has 0 observations and my whole log is full of the message "missing obs for the ID ...."

 

Do you have an idea what could be the problem?

 

Thank you very much in advance and have a great evening 🙂

 

BR

Nici

Criptic
Lapis Lazuli | Level 10

Hey @Nici ,

 

does your log contain any Error or Warning Messages? Because the code runs fine for me. All rows are retained because of the increase from 3 to 5 but that looks to be correct and I get the following messages in the log:

 

Missing obs for the pcusip: 1
Missing obs for the pcusip: 1
Missing obs for the pcusip: 2
Missing obs for the pcusip: 2
Missing obs for the pcusip: 2
Missing obs for the pcusip: 2
Missing obs for the pcusip: 3
NOTE: There were 23 observations read from the data set WORK.PARENTSTOCKDATA.
NOTE: The data set WORK.CRHELP5 has 23 observations and 5 variables.
NOTE: DATA statement used (Total process time):
      real time           0.02 seconds
      cpu time            0.00 seconds
Nici
Obsidian | Level 7

Hey @Criptic ,

 

thank you for your answer. I've found the source of the problem in my dataset. The problem is that the variable 'ID' is not a numeric but a character variable. ID's like "54140Y" are possible. How can I adjust the code so it works for character variables as well?

 

I get the error:

"NOTE: Invalid numeric data, ID='54140Y'...."

 

Thank you very much and have a great day.

 

BR

Nici

Criptic
Lapis Lazuli | Level 10

Hey @Nici ,

 

I tried to replicate your error but couldn't. The code works fine for me even if the variable is character. The variable x (which holds the id) takes on the same format as the ID column.

 

My example code / modified data to replicate different length and character IDs:

data work.parentstockdata;
length ID $2.;
input ID Date ret anncmt flag;
datalines;
12 19990205 0.01 19990209 0
1 19990206 0.02 19990209 0
1F 19990207 0.04 19990209 0
1 19990208 0.01 19990209 0
1F 19990209 0.02 19990209 1
1 19990210 0.01 19990209 0
1 19990211 0.02 19990209 0
1 19990212 0.03 19990209 0
2 19990205 0.01 19990209 0
2 19990206 0.02 19990209 0
2 19990207 0.04 19990209 0
2 19990208 0.01 19990209 0
2 19990209 0.02 19990209 1
2 19990210 0.01 19990209 0
2 19990211 0.02 19990209 0
3 19990205 0.01 19990209 0
3 19990206 0.02 19990209 0
3 19990207 0.04 19990209 0
3 19990208 0.01 19990209 0
3 19990209 0.02 19990209 1
3 19990210 0.01 19990209 0
3 19990211 0.02 19990209 0
3 19990212 0.03 19990209 0
;
run;

data CRhelp5(drop=y);
	set work.parentstockdata;
	x = ID;

	if flag=1 then
		do p=max(_n_-5,1) to min(_n_+5,nobs);
			set work.parentstockdata point=p nobs=nobs;

			if x=ID then
				output;
			else
				do;
					y = cat("Missing obs for the pcusip: ",x);
					put y;
				end;
		end;
run;

In the log I get the following messages:

Missing obs for the pcusip: 1F
Missing obs for the pcusip: 1F
Missing obs for the pcusip: 1F
Missing obs for the pcusip: 1F
Missing obs for the pcusip: 1F
Missing obs for the pcusip: 1F
Missing obs for the pcusip: 1F
Missing obs for the pcusip: 1F
Missing obs for the pcusip: 2
Missing obs for the pcusip: 2
Missing obs for the pcusip: 2
Missing obs for the pcusip: 2
Missing obs for the pcusip: 3
NOTE: There were 23 observations read from the data set WORK.PARENTSTOCKDATA.
NOTE: The data set WORK.CRHELP5 has 17 observations and 6 variables.
NOTE: DATA statement used (Total process time):
      real time           0.03 seconds
      cpu time            0.00 seconds

Please check that your ID variable in the dataset parentstockdata is defined as a alphanumerical field.

If this doesn't help you please try to replicate your error with example data and post it.

 

Best regards

Criptic

Nici
Obsidian | Level 7

Hey @Criptic ,

 

I finally got it to work. The problem was totally my fault and I'm super sorry for wasting some of your time. In my dataset I actually already had help variables called "x" and "y". I did not mention them here because as I said they are just help variables. Obviously SAS has troubles if the same variables are used twice...

 

Again, I'm very sorry!

 

Thank you for your on-going support throughout my struggles.

 

Have an amazing day.

 

BR

Nici

 

 

Criptic
Lapis Lazuli | Level 10
No worries, have a great one!

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 2789 views
  • 6 likes
  • 3 in conversation