Protecting Your Ruby Source Code for End User Applications
If you want to distribute your Ruby applications while still protecting your intellectual property you could use an obfuscation tools such as ZenObfuscate or try to write your own. But in this article, I’m going to show a different approach that’s been used by several different companies producing commercial products written in Ruby. The method is not specific to Ruby and should work for any interpreted language in which you need to distribute your source code with the application.
The secret is to encrypt your Ruby source, store it in a database and then modify the Ruby interpreter to look for your code in the database and decrypt it on the fly. I should note that there is no way to completely protect a product which is distributed to your customers–with enough diligence any security measures can be broken.
These instructions are for Unix operating systems (include MacOS X). Unfortunately (or fortunately, for me), I don’t own a Windows machine.
If you don’t already have Berkeley DB (BDB) installed on your system you can download it here. You will need to follow the instructions which come with BDB which explain how to build and install it. Originally, I was going to use OpenSSL to encrypt our Ruby source code before inserting it into the database and decrypt it after retrieving it from the database. Thankfully, the latest version of BDB includes AES support which allows you to maintain an encrypted database very easily.
You will also need to obtain a fresh copy of the Ruby source code so that we can build our own private version which knows how to pull classes out of our BDB instance. So let’s start by creating a directory for us to work in:
mkdir -p ~/RubyProject/deploy
You should unpack the Ruby source in the ~/RubyProject directory (this should create directory which looks like ~/RubyProject/ruby-1.8.6-p111). Go into that directory and run the configure script as follows:
./configure --prefix=~/RubyProject/deploy --with-static-linked-ext
This will enable us to build a stand alone Ruby interpreter which has everything it needs statically linked in.
To find out where we need to hook in our handler which loads missing classes from the database, you can grep through the source code looking for const_missing like this:
grep const_missing *.c
This turned up two files object.c and variable.c. Looking at the output, I could see that the const_missing function is actually defined in variable.c. Jumping in, I searched for the code executed when a constant cannot be found. This led me to the rb_const_get_0 function which walks up the class hierarchy looking for the specified constant. If it cannot be found, it returns the result of the const_missing function. So right before that final return, I added the following hook:
/* At this point we haven't found the class so we must look
inside of the BerkeleyDB for it, see dbloader.c */
value = load_from_db(id);
if (value != Qundef) return value;
Now I just had to implement the load_from_db function which should look inside of the encrypted BDB instance for the file. Looking at how the rb_const_get_0 function works, I knew my new method had to return a Ruby VALUE:
VALUE
load_from_db(id)
ID id;
{
DBT k;
DBT d;
memset(&k, 0, sizeof(DBT));
memset(&d, 0, sizeof(DBT));
verify_database_state();
k.data = rb_id2name(id);
k.size = strlen(k.data);
d.flags = DB_DBT_MALLOC;
if (bdb->get(bdb, NULL, &k, &d, 0) == 0) {
rb_eval_string((const char *)d.data);
free(d.data);
return rb_eval_string(rb_id2name(id));
}
return Qundef;
}
You’ll note that if we find the class in the database, we first evaluate it using rb_eval_string. Since this returns nil, we need evaluate the class name so that we can pass back a Ruby VALUE. If the class doesn’t exist in the database, we return Qundef and const_missing gets called as usual.
The verify_database_state function ensures that the password for the BDB instance was passed and the database opened so that the get call can access it. It is implemented like so:
/* a simple way to obfuscate the password in memory */
#define A(c) (c) - 0x1d
#define ENCRYPT_PWD(str) do { char * p = str; while (*p) *p++ -= 0x1d; } while (0)
#define DECRYPT_PWD(str) do { char * p = str; while (*p) *p++ += 0x1d; } while (0)
static void
verify_database_state()
{
/* don't forget to NULL terminate this array! */
static char info[] =
{ A('1'), A('2'), A('3'), A('4'), A('5'),
A('6'), A('7'), A('8'), A('9'), A('0'),
0 };
const char * database_file = "data.db";
if (bdb == NULL) {
db_env_create(&bdbenv, 0);
db_create(&bdb, bdbenv, 0);
DECRYPT_PWD(info);
bdbenv->set_encrypt(bdbenv, info, DB_ENCRYPT_AES);
ENCRYPT_PWD(info);
bdbenv->open(bdbenv, ".", DB_INIT_MPOOL | DB_CREATE | DB_PRIVATE, 0600);
bdb->set_flags(bdb, DB_ENCRYPT);
bdb->open(bdb, NULL, database_file, NULL, DB_BTREE, 0, 0644);
}
}
You’ll notice that the password is hard coded into the binary along with the location of the BDB instance. If you are going use this code, you’ll want to change the default password (which needs to match the password you used when storing your Ruby source in the database) and perhaps the location of the database. To deter those looking to decrypt our Ruby source, I’ve written C macros which performs a transformation on the password. It is fairly simple to circumvent, so you should research other methods of securely storing passwords within applications.
Finally, you’ll need to remember to close the BDB instance before the Ruby interpreter exits. To do this we simply open up main.c and add the following line immediately following the call to ruby_run:
close_database();
The implementation of close_database is trivial, so you can just look in the patch I’ve provided for it.
The patch to the Ruby source code is included in this zipped attachment along with another utility I wrote to load all of your Ruby source into the encrypted BDB instance (along with its Makefile). Here is a transcript which shows how to use the tools and your new Ruby interpreter:
~RubyProject/tools> ./rb_store data.db *.rb
Enter database password:
Stored Example (131) bytes
~RubyProject/tools> ./rb_load data.db
Enter database password:
Hit CTRL+C to quit.
Enter a class name: Example
Found:
class Example
def initialize
puts "Hello from the Example class"
end
def aMethod
puts "Called aMethod"
end
end
Enter a class name: Foo
No such class
Enter a class name: ^C
~RubyProject/ruby-1.8.6-p111> ./ruby -e "e = Example.new"
Hello from the Example class
~RubyProject/ruby-1.8.6-p111> ./ruby -e "f = Foo.new"
-e:1: uninitialized constant Foo (NameError)
This implementation doesn’t deal with multiple classes of the same name residing in different directories. Also, the way in which the rb_load tool determines the name of the class is rather naive and can screw it up. I’ve wanted to write this for over a year now after hearing Rich Kilmer talk about the way in which InfoEther distributes its Ruby applications. So this is more of a proof-of-concept for myself than anything else. Hopefully you learned from this tutorial and can put it to good use.
Ryan Davis said,
Wrote on December 18, 2007 @ 6:25 am
I don’t want to completely poo poo on the idea. I think it is a good one, but in a dynamic language like ruby there is almost ALWAYS a way around things like this. If ruby evals it, I can play with it:
% ./ruby -e “require ‘rubygems’; require ‘ruby2ruby’; puts RubyToRuby.translate(Example)”
class Example < Object
def aMethod
puts(”Called aMethod”)
end
def initialize
puts(”Hello from the Example class”)
end
end
Ryan Davis said,
Wrote on December 18, 2007 @ 6:26 am
argh. stupid stripping of tags. Ruby2Ruby actually formats nicely.
sean said,
Wrote on December 20, 2007 @ 9:35 am
Ryan,
Thanks for the feedback. Usually when someone ships a Ruby application they don’t give the end user direct access to the interpreter. But that would be fairly easy to circumvent as well.
Since you’ve made a product out of ZenObfuscate, do you have any suggestions on how one would prevent ruby2ruby from extracting source code?
Thanks again,
Sean
P.S. I tried responding to you personally via your website (http://zenspider.com/Contact.html) but your form appears to be broken.
Ryan Davis said,
Wrote on December 21, 2007 @ 4:42 am
The point of zenobfuscate is that it translates and compiles your ruby code. No eval. No play.
P.S. I did have to enter my email in this form here… I assume you have access to it.
Richard Lack said,
Wrote on January 29, 2008 @ 12:44 pm
This is really fascinating. I started contemplating how I could do something like this without having to maintain my own modified version of Ruby. t looks like eval will load a big chunk of source code no problem. The difficulty would be preventing some interloper program from posing as Ruby, and slurping the source. So then I started thinking about having a program that would do the loading and decryption, but it would first have to verify the authenticity of the currently executing Ruby interpreter, possibly by checking its hash code. And I’d need to encrypt the source code specifically for the hash code of the currently compiled Ruby anyway… so maybe using your approach isn’t so bad.
But I wonder, is there a practical way to do this without having to keep a custom compiled Ruby interpreter? Perhaps I just need to think of the Ruby interpreter as part of our software package rather than part of the operating system, and it won’t feel wrong then.
Ryan Davis said,
Wrote on January 30, 2008 @ 6:38 am
There is no need for a custom compiled interpreter at all… I don’t know where you got that idea.
Alex said,
Wrote on May 15, 2008 @ 8:33 am
What about a patched database server that is logging everything?
What about a “man in the middle” attack?
Obfuscating data does not make your data vanish.
It just increases the level of complexity for
reading those data a little bit. As long as you still need
to provide the complete de-obfuscation module - you cant win!
Sean said,
Wrote on May 15, 2008 @ 11:23 am
Alex,
Any time you allow your property to be run in the attacker’s environment (their system), they will always find a way around whatever means you use to protect yourself.
The only safe way to protect your source code is to never let it touch a machine that is connected to the outside world.
As I said in the post, this is just a proof of concept, it definitely can be improved. I don’t think anyone has developed an uncircumventable solution for protecting applications delivered to customers.