BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Cruise
Ammonite | Level 13

Hello Folks,


I'm trying to subset a large data to fourty smaller datasets based on the "location"  AND "code" variables.

Codes are taking different values in datasteps depending on whether sites are A or B or C. 

I'd like to reduce human error by automating this subsetting process.

Is there any way? I'm familiar to macro/mend process. But I don't know how when character variable is involved taking different values. 

 

DATA HAVE; 
INPUT ID CODE $ LOCATION $;
CARDS;
1	154	A
1	157	A
1	C25	A
1	153	A
2	183	B
2	185	B
2	196	B
2	196	B
3       200     C
3       100     C 
;

DATA SUBSET_A(COMPRESS=YES);
SET HAVE(WHERE=(LOCATION="A"));
IF CODE IN ('153','154') THEN SITE=1; ELSE SITE=0;
IF CODE IN ('153','154') THEN WEIGHT=1; ELSE
IF CODE IN ('157') THEN WEIGHT=2; ELSE WEIGHT=99;
RUN;

DATA SUBSET_B(COMPRESS=YES);
SET HAVE(WHERE=(LOCATION="B"));
IF CODE IN ('183','185') THEN SITE=1; ELSE SITE=0;
IF CODE IN ('183','185') THEN WEIGHT=1; ELSE
IF CODE IN ('196') THEN WEIGHT=2; ELSE WEIGHT=99;
RUN;

DATA SUBSET_C(COMPRESS=YES);
SET HAVE(WHERE=(LOCATION="C"));
IF CODE IN ('200') THEN SITE=1; ELSE SITE=0;
IF CODE IN ('200') THEN WEIGHT=1; ELSE
IF CODE IN ('100') THEN WEIGHT=2; ELSE WEIGHT=99;
RUN;

 

1 ACCEPTED SOLUTION

Accepted Solutions
novinosrin
Tourmaline | Level 20

May be you can define user defined formats using proc format which would be a one time effort to type carefully. In any case, I really do think that's gonna take some boring effort at least once. 

View solution in original post

4 REPLIES 4
novinosrin
Tourmaline | Level 20

splitting into many data sets based on LOCATION is pretty straight forward  however in your case I wonder how would you automate the condition of the CODE values that constitutes to SITE variables as each of seems to be mutually exclusive 

Cruise
Ammonite | Level 13

That's the problem. I wonder if it's possible to define a specific set of 'codes' for the specific combinations of 'Location-site-weight'? as a dictionary to refer? and SAS subsets data with the reference of info in that dictionary? something like that?

novinosrin
Tourmaline | Level 20

May be you can define user defined formats using proc format which would be a one time effort to type carefully. In any case, I really do think that's gonna take some boring effort at least once. 

Astounding
PROC Star

It's certainly possible to set up the known translations as a data set.  For example:

 

data master_list;
input location $ code $ site weight;
datalines;
A 153 1 1
A 154 1 1
A 157 0 2
B 183 1 1
B 185 1 1
B 196 0 2
C 200 1 1
C 100 0 2
;

Then the approach would be to use this as a hash table, where the key is the combination of CODE and LOCATION, and the data is the SITE and WEIGHT.

 

However, note that the logic is not just a matter of look-up.  You have to check whether the table actually contains the combination you are searching for.  If it does, great you can use the values found via look-up.  But if it doesn't, you still need to set SITE to 0 and WEIGHT to 99.

 

This is something that @novinosrin can do in his sleep, so I will abandon the coding efforts.

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 1173 views
  • 3 likes
  • 3 in conversation