BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Ani7
Obsidian | Level 7

I have the attached excel as an input file to work with. All the variables with want prefixed are the ones that I need to create. Essentially, every time the dist variable crosses an interval of 1.5000, I want to create a flag and start a new group `want_group` in the next row. In the excel, the first time 1.5 is crossed  is at 1.5005 so the next threshold should be 1.5005 + 1.5 = 3.005. So the next flag is at the 3.1001 observation. How would I go about doing this??

1 ACCEPTED SOLUTION

Accepted Solutions
ed_sas_member
Meteorite | Level 14

Hi @Ani7 

 

Please try this:

data have;
	infile datalines dlm="09"x;
	input time dist want_flag1 want_group1 want_running_total1;
	datalines;
59	1.4999	0	1	1.4999
60	1.5005	1	1	1.5005
61	1.6005	0	2	0.1
62	1.7005	0	2	0.2
63	1.8005	0	2	0.3
64	1.9005	0	2	0.4
65	2.0005	0	2	0.5
66	2.1005	0	2	0.6
67	2.2005	0	2	0.7
68	2.3005	0	2	0.8
69	2.4005	0	2	0.9
70	2.5003	0	2	0.9998
71	2.6001	0	2	1.0996
72	2.6999	0	2	1.1994
73	2.7997	0	2	1.2992
74	2.8995	0	2	1.399
75	3.0003	0	2	1.4998
76	3.1001	1	2	1.5996
77	3.2002	0	3	0.1001
78	3.3003	0	3	0.2002
79	3.4004	0	3	0.3003
;
run;

data want;
	set have;
	want_flag = 0;
	retain want_group 1;
	retain _want_running_total 0;
	retain _dist 0;
	if _N_=1 then _dist=dist;
	if dist > _dist then do;
			want_flag = 1;
			_dist = dist + 1.5;
		end;
	if lag(want_flag) = 1 then want_group + 1;
	if want_flag=1 then do;
			_want_running_total = dist;
			want_running_total = dist;
		end;
	else want_running_total =  dist - _want_running_total;
	drop _:;
run;

Best,

View solution in original post

6 REPLIES 6
ed_sas_member
Meteorite | Level 14

Hi @Ani7 

 

Please try this:

data have;
	infile datalines dlm="09"x;
	input time dist want_flag1 want_group1 want_running_total1;
	datalines;
59	1.4999	0	1	1.4999
60	1.5005	1	1	1.5005
61	1.6005	0	2	0.1
62	1.7005	0	2	0.2
63	1.8005	0	2	0.3
64	1.9005	0	2	0.4
65	2.0005	0	2	0.5
66	2.1005	0	2	0.6
67	2.2005	0	2	0.7
68	2.3005	0	2	0.8
69	2.4005	0	2	0.9
70	2.5003	0	2	0.9998
71	2.6001	0	2	1.0996
72	2.6999	0	2	1.1994
73	2.7997	0	2	1.2992
74	2.8995	0	2	1.399
75	3.0003	0	2	1.4998
76	3.1001	1	2	1.5996
77	3.2002	0	3	0.1001
78	3.3003	0	3	0.2002
79	3.4004	0	3	0.3003
;
run;

data want;
	set have;
	want_flag = 0;
	retain want_group 1;
	retain _want_running_total 0;
	retain _dist 0;
	if _N_=1 then _dist=dist;
	if dist > _dist then do;
			want_flag = 1;
			_dist = dist + 1.5;
		end;
	if lag(want_flag) = 1 then want_group + 1;
	if want_flag=1 then do;
			_want_running_total = dist;
			want_running_total = dist;
		end;
	else want_running_total =  dist - _want_running_total;
	drop _:;
run;

Best,

Ani7
Obsidian | Level 7
This was actually the correct solution. I don't know how to unselect the other solution I accepted unfortunately
novinosrin
Tourmaline | Level 20

Hello mam @BeverlyBrown The OP has chosen the incorrect solution. The correct solution happens to be that of @ed_sas_member 's . May i request your time in making that change plz. Thank you in advance  

novinosrin
Tourmaline | Level 20

Hi @Ani7  I am a little way of your GROUP 1's running total though the result will match but doesn't seem to follow a pattern or trend otherwise it's easy

So for what it's worth, here is my stab at it-


data have;
input time	dist;
cards;
59	1.4999
60	1.5005
61	1.6005
62	1.7005
63	1.8005
64	1.9005
65	2.0005
66	2.1005
67	2.2005
68	2.3005
69	2.4005
70	2.5003
71	2.6001
72	2.6999
73	2.7997
74	2.8995
75	3.0003
76	3.1001
77	3.2002
78	3.3003
79	3.4004
;

data want;
 _n_=1.5;
 do until(z);
  set have end=z;
  retain want_grp 1;
  flag=dist>=_n_;
  _iorc_=dif(dist);
  if want_runningtotal=. then want_runningtotal=dist;
  want_runningtotal=sum(_iorc_,want_runningtotal);
  output;
  if dist>=_n_ then do;
   _n_=sum(dist,1.5);
   want_grp+1;
   flag=1;
   want_runningtotal=0;
  end;
 end;
run;

 I have made an EDIT: 

_n_=sum(dist,_n_); 

has been changed to

_n_=sum(dist,1.5); 

Ani7
Obsidian | Level 7
Thanks! That was it
ballardw
Super User

Many users here don't want to download Excel files because of virus potential, others have such things blocked by security software. Also if you give us Excel we have to create a SAS data set and due to the non-existent constraints on Excel data cells the result we end up with may not have variables of the same type (numeric or character) and even values.

 

Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the </> icon or attached as text to show exactly what you have and that we can test code against.

 

Additionally your shown values in the "running total" column suffer from either a calculation problem, a rounding problem (different cell display numeric setting) or possibly both, and possibly in more than one column.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 1376 views
  • 1 like
  • 4 in conversation