
Protecting executable from reverse engineering?

I've been contemplating how to protect my C/C++ code from disassembly and reverse engineering. Normally I would never condone this behavior in my own code; however, the current protocol I've been working on must never be inspected or understood, for the security of various people.

Now this is a new subject to me, and the internet is not really a good resource for preventing reverse engineering; rather, it depicts tons of information on how to reverse engineer.

Some of the things I've thought of so far are:

Code injection (calling dummy functions before and after actual function calls)

Code obfuscation (mangles the disassembly of the binary)

Write my own startup routines (harder for debuggers to bind to): void startup(); int _start() { startup(); exit(0); } void startup() { /* code here */ }

Runtime check for debuggers (and force exit if detected; a minimal sketch follows after this list)

Function trampolines void trampoline(void (*fnptr)(), bool ping = false) { if(ping) fnptr(); else trampoline(fnptr, true); }

Pointless allocations and deallocations (stack changes a lot)

Pointless dummy calls and trampolines (tons of jumping in disassembly output)

Tons of casting (for obfuscated disassembly)
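
For concreteness, a minimal sketch of the debugger check mentioned above (Windows-only, using the Win32 IsDebuggerPresent call; the helper name is made up, and as the answers below point out, such a check is trivial for a cracker to patch out):

#include <windows.h>
#include <stdlib.h>

/* Exit immediately if a user-mode debugger is attached to this process. */
static void bail_if_debugged(void)
{
    if (IsDebuggerPresent())   /* reads the PEB "BeingDebugged" flag */
        exit(1);
}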

These are some of the things I've thought of, but they can all be worked around and/or figured out by code analysts given enough time. Are there any other alternatives?

" however the current protocol I've been working on must not ever be inspected or understandable, for the security of various people." -- good luck with that.
You can make your application hard to reverse engineer. You can't make it impossible, not as long as the other guy has a substantial portion of your bits in their hands. Careful about guaranteeing full security, especially if lives are at stake - you can't deliver.
If your computer can understand the code, so can a person.
Make the code Open Source, and nobody will reverse engineer it.
"Security by obscurity never worked."

Amber

but they can all be worked around and/or figured out by code analysts given enough time.

If you give people a program that they are able to run, then they will also be able to reverse-engineer it given enough time. That is the nature of programs. As soon as the binary is available to someone who wants to decipher it, you cannot prevent eventual reverse-engineering. After all, the computer has to be able to decipher it in order to run it, and a human is simply a slower computer.


+1. Go read about the glory days of copy protection on the Apple II, the ever-escalating war between the obfuscators and the crackers, the crazy tricks with the floppy disk's stepper motor and undocumented 6502 instructions and so on... And then cry yourself to sleep, because you are not going to implement anything nearly so elaborate and they all got cracked eventually.
It's easier to use a simulator and get better visibility than to try to reverse engineer visually or with a disassembler. If the security is not built into the hardware you are using, I think the world is averaging about two days to two weeks to reverse engineer and defeat pretty much everything that comes out. If it takes you more than two days to create and implement this, you have spent too much time.
The only reasonably functioning DRM today is the combination of a key and an internet server verifying that only one instance of the key is active at one time.
@Rotsor: The computer cannot understand it because we have not been able to reduce this sort of intelligence to an algorithm (yet), not because there is some sort of physical or technological barrier in place. The human can understand it because he can do anything the computer can (albeit slower) as well as reason.
At which point someone will try to reverse-engineer the computer, unless it's only available in an environment you control.
ivan_pozdeev

What Amber said is exactly right. You can make reverse engineering harder, but you can never prevent it. You should never trust "security" that relies on the prevention of reverse engineering.

That said, the best anti-reverse-engineering techniques that I've seen focused not on obfuscating the code, but instead on breaking the tools that people usually use to understand how code works. Finding creative ways to break disassemblers, debuggers, etc. is both likely to be more effective and more intellectually satisfying than just generating reams of horrible spaghetti code. This does nothing to block a determined attacker, but it does increase the likelihood that J Random Cracker will wander off and work on something easier instead.


I understand this, and I've read a few papers explaining Skype's security. I've been contemplating the same ideas Skype has already tried, not as a way to prevent reverse engineering but to protect my protocol - something that has proven worthy enough given the obvious circumstances for Skype.
Skype is actually the first example that came to mind, so I'm glad you are already looking into emulating their methods.
RyanR

SafeNet Sentinel (formerly Aladdin). Caveats, though: their API sucks, the documentation sucks, and both of those are great in comparison to their SDK tools.

I've used their hardware protection method (Sentinel HASP HL) for many years. It requires a proprietary USB key fob which acts as the 'license' for the software. Their SDK encrypts and obfuscates your executable & libraries, and allows you to tie different features in your application to features burned into the key. Without a USB key provided and activated by the licensor, the software cannot decrypt and hence will not run. The key even uses a customized USB communication protocol (outside my realm of knowledge, I'm not a device driver guy) to make it difficult to build a virtual key or tamper with the communication between the runtime wrapper and the key. Their SDK is not very developer friendly, and it is quite painful to integrate the protection into an automated build process (but possible).

Before we implemented the HASP HL protection, there were 7 known pirates who had stripped the dotfuscator 'protections' from the product. We added the HASP protection at the same time as a major update to the software, which performs some heavy calculation on video in real time. As best I can tell from profiling and benchmarking, the HASP HL protection only slowed the intensive calculations by about 3%. Since that software was released about 5 years ago, not one new pirate of the product has been found. The software it protects is in high demand in its market segment, and the client is aware of several competitors actively trying to reverse engineer it (without success so far). We know they have tried to solicit help from a few groups in Russia which advertise a service to break software protection, as numerous posts on various newsgroups and forums have included the newer versions of the protected product.

Recently we tried their software license solution (HASP SL) on a smaller project, which was straightforward enough to get working if you're already familiar with the HL product. It appears to work; there have been no reported piracy incidents, but this product is a lot lower in demand.

Of course, no protection can be perfect. If someone is sufficiently motivated and has serious cash to burn, I'm sure the protections afforded by HASP could be circumvented.


+1 for experience, but I'd like to echo that it's not perfect. Maya (3D suite) used a hardware dongle (not sure if it was HASP), which didn't deter pirates for very long. When there's a will, there's a way.
AutoCAD uses a similar system, which has been cracked numerous times. HASP and others like it will keep honest people honest, and prevent casual piracy. If you're building the next multiple-billion dollar design product, you'll always have crackers to contend with. Its all about diminishing returns - how many hours of effort is it worth to crack your software protection vs just paying for it.
I also want to chime in from the perspective of someone who has used HASP secured software. HASPs are a royal pain in the ass to the end user. I've dealt with a Dallas iButton and an Aladdin HASP, and both were really buggy, and caused the software to randomly stop working, requiring disconnecting and reconnecting the HASP.
Also, it's worth noting that HASP security measures are not necessarily any more secure than code obfuscation - sure, they require a different methodology to reverse engineer, but it is very possible to reverse them - see: flylogic.net/blog/?p=14 flylogic.net/blog/?p=16 flylogic.net/blog/?p=11
Gilles 'SO- stop being evil'

Making code difficult to reverse-engineer is called code obfuscation.

Most of the techniques you mention are fairly easy to work around. They center on adding some useless code. But useless code is easy to detect and remove, leaving you with a clean program.

For effective obfuscation, you need to make the behavior of your program dependent on the useless bits being executed. For example, rather than doing this:

a = useless_computation();
a = 42;

do this:

a = complicated_computation_that_uses_many_inputs_but_always_returns_42();
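
One hedged, hypothetical body for such a function - it leans on the fact that the square of any odd number is congruent to 1 modulo 8, so the result is always 41 + 1 = 42 even though it appears to depend on the input:

/* always returns 42, but not obviously: n is forced to be odd, and
   n*n % 8 == 1 for every odd n (this also holds under unsigned wraparound) */
unsigned complicated_computation_that_always_returns_42(unsigned seed)
{
    unsigned n = 2u * (seed % 1000u) + 1u;   /* n is odd */
    return 41u + (n * n) % 8u;               /* 41 + 1 == 42 */
}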

Or instead of doing this:

if (running_under_a_debugger()) abort();
a = 42;

Do this (where running_under_a_debugger should not be easily identifiable as a function that tests whether the code is running under a debugger — it should mix useful computations with debugger detection):

a = 42 - running_under_a_debugger();
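
As an illustration only, a Linux-specific sketch of folding debugger detection into the computation; ptrace(PTRACE_TRACEME, ...) fails if another process is already tracing this one. On other platforms the detection would look different, and a real implementation would bury the check far better than this:

#include <stdio.h>
#include <sys/ptrace.h>

static long traced(void)
{
    /* returns 1 under a debugger, 0 otherwise; as a side effect it marks
       this process as traced by its parent, so only call it once */
    return ptrace(PTRACE_TRACEME, 0, NULL, NULL) == -1 ? 1 : 0;
}

int main(void)
{
    long a = 42 - traced();    /* silently wrong when debugged */
    printf("%ld\n", a);
    return 0;
}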

Effective obfuscation isn't something you can do purely at the compilation stage. Whatever the compiler can do, a decompiler can do. Sure, you can increase the burden on the decompilers, but it's not going to go far. Effective obfuscation techniques, inasmuch as they exist, involve writing obfuscated source from day 1. Make your code self-modifying. Litter your code with computed jumps, derived from a large number of inputs. For example, instead of a simple call

some_function();

do this, where you happen to know the exact expected layout of the bits in some_data_structure:

goto (md5sum(&some_data_structure, 42) & 0xffffffff) + MAGIC_CONSTANT;
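
In portable C the same idea is usually expressed as a computed call rather than a literal goto; here is a hedged sketch where the hash, table layout, and data are hypothetical stand-ins for the md5sum-of-a-known-structure trick above:

#include <stddef.h>
#include <stdint.h>

typedef void (*handler_t)(void);

/* FNV-1a checksum; any hash works, as long as only the intended runtime
   contents of the data select the index of the real target */
static uint32_t checksum32(const void *p, size_t n)
{
    const unsigned char *b = p;
    uint32_t h = 2166136261u;
    while (n--) { h ^= *b++; h *= 16777619u; }
    return h;
}

static void dispatch(const void *data, size_t len,
                     handler_t table[], size_t tabsize)
{
    /* static analysis cannot tell which entry is actually called */
    table[checksum32(data, len) % tabsize]();
}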

If you're serious about obfuscation, add several months to your planning; obfuscation doesn't come cheap. And do consider that by far the best way to avoid people reverse-engineering your code is to make it useless so that they don't bother. It's a simple economic consideration: they will reverse-engineer if the value to them is greater than the cost; but raising their cost also raises your cost a lot, so try lowering the value to them.

Now that I've told you that obfuscation is hard and expensive, I'm going to tell you it's not for you anyway. You write

current protocol I've been working on must not ever be inspected or understandable, for the security of various people

That raises a red flag. It's security by obscurity, which has a very poor record. If the security of the protocol depends on people not knowing the protocol, you've lost already.

Recommended reading:

The security bible: Security Engineering by Ross Anderson

The obfuscation bible: Surreptitious software by Christian Collberg and Jasvir Nagra


@Gilles, that's your statement, which is very strong, so the burden of proof lies on you. However, I will provide a simple example: 2+2 can be simplified by the compiler to 4, but the decompiler can't bring it back to 2+2 (what if it actually was 1+3?).
@Rotsor 4 and 2+2 are observationally equivalent, so they are the same for this purpose, namely to figure out what the program is doing. Yes, of course, the decompiler can't reconstruct the source code, but that's irrelevant. This Q&A is about reconstructing the behavior (i.e. the algorithm, and more precisely a protocol).
You don't have to do anything to reconstruct the behaviour. You already have the program! What you usually need is to understand the protocol and change something in it (like replacing a 2 in 2+2 with 3, or replace the + with a *).
If you consider all behaviourally-equivalent programs the same, then yes, the compiler can't do anything because it performs just an identity transformation. The decompiler is useless then too, as it is an identity transformation again. If you don't, however, then 2+2 -> 4 is a valid example of an irreversible transformation performed by the compiler. Whether it makes understanding easier or harder is a separate argument.
@Gilles I can't extend your analogy with the apple because I can't imagine a structurally different, but behaviourally equivalent apple. :)
Oded

The best anti-disassembler tricks, in particular on variable-word-length instruction sets, are in assembler/machine code, not C. For example:

CLC
BCC over
.byte 0x09
over:

The disassembler has to resolve the problem that a branch destination is the second byte of a multi-byte instruction; an instruction set simulator, though, will have no problem with it. Branching to computed addresses, which you can cause from C, also makes the disassembly difficult to impossible, while an instruction set simulator handles it fine (and using a simulator to sort out branch destinations for you can aid the disassembly process). Compiled code is relatively clean and easy for a disassembler, so I think some assembly is required.
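
For what it's worth, the same junk-byte trick can be emitted from C with compiler-specific inline assembly; a hedged sketch for GCC/Clang on x86-64 follows (0xE8 is the opcode byte of a 5-byte CALL, so a naive linear-sweep disassembler swallows the bytes after it, while the CPU never executes it):

void confuse_linear_sweep(void)
{
    __asm__ __volatile__(
        "jmp 1f\n\t"       /* always skips the junk byte               */
        ".byte 0xE8\n\t"   /* decoded by linear sweep as CALL rel32    */
        "1:\n\t"
        "nop\n\t"
    );
}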

I think it was near the beginning of Michael Abrash's Zen of Assembly Language that he showed a simple anti-disassembler and anti-debugger trick. The 8088/86 had a prefetch queue; what you did was have an instruction that modified the next instruction, or a couple ahead. If single-stepping, you executed the modified instruction; if your instruction set simulator did not simulate the hardware completely, you executed the modified instruction. On real hardware running normally, the real instruction would already be in the queue, and the modified memory location wouldn't cause any damage so long as you didn't execute that string of instructions again. You could probably still use a trick like this today, as pipelined processors fetch the next instruction. Or, if you know that the hardware has separate instruction and data caches, you can modify a number of bytes ahead: if you align this code in the cache line properly, the modified byte will be written through to the data cache but not the instruction cache, and an instruction set simulator without proper cache simulation would fail to execute properly. I think software-only solutions are not going to get you very far.

The above are old and well known; I don't know enough about the current tools to know if they already work around such things. Self-modifying code can/will trip up the debugger, but the human can/will narrow in on the problem, then see the self-modifying code and work around it.

It used to be that hackers would take about 18 months to work something out (DVDs, for example). Now they are averaging around 2 days to 2 weeks (if motivated) (Blu-ray, iPhones, etc.). That means to me that if I spend more than a few days on security, I am likely wasting my time. The only real security you will get is through hardware (for example, your instructions are encrypted and only the processor core well inside the chip decrypts them just before execution, in a way that it cannot expose the decrypted instructions). That might buy you months instead of days.

Also, read Kevin Mitnick's book The Art of Deception. A person like that could pick up a phone and have you or a coworker hand out the secrets to the system, thinking he is a manager or another coworker or a hardware engineer in another part of the company. And then your security is blown. Security is not all about managing the technology; you have to manage the humans too.


Also, you don't have to have access to the source code (or even disassembled source code) to find a security hole. It could be by accident, or by using the fact that most holes come from the same problems in the code (like buffer overflows).
There are big problems with self-modifying code. Most modern OS/hardware will not let you do it without very high privilege, there can be cache issues and the code is not thread-safe.
With modern x86 processors, tricks like these are often bad for performance. Using the same memory location as part of more than one instruction likely has an effect similar to a mispredicted branch. Self-modifying code causes the processor to discard cache lines to maintain coherence between the instruction and data caches (if you execute the modified code much more often than you modify it, it may still be a win).
I ran into this 20 years ago. Took us almost half an hour to figure out what happened. Not very good if you need longer protection.
"the real instruction would already be in the queue and the modified memory location wouldnt cause any damage" Until an interrupt occurs in between, flushing the instruction pipeline, and causing the new code to become visible. Now your obfuscation has caused a bug for your legitimate users.
Phil

Take, for example, the AES algorithm. It's a very, very public algorithm, and it is VERY secure. Why? Two reasons: It's been reviewed by lots of smart people, and the "secret" part is not the algorithm itself - the secret part is the key which is one of the inputs to the algorithm. It's a much better approach to design your protocol with a generated "secret" that is outside your code, rather than to make the code itself secret. The code can always be interpreted no matter what you do, and (ideally) the generated secret can only be jeopardized by a massive brute force approach or through theft.
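
A hedged sketch of that separation in C (the environment variable name and key format are made up; the point is only that the binary contains the algorithm, not the secret):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Load a 256-bit key from outside the executable (environment variable,
   key server, TPM, ...); disassembling the program reveals no secret. */
static int load_key(unsigned char out[32])
{
    const char *hex = getenv("APP_KEY");        /* 64 hex characters expected */
    if (!hex || strlen(hex) != 64)
        return -1;
    for (int i = 0; i < 32; i++)
        if (sscanf(hex + 2 * i, "%2hhx", &out[i]) != 1)
            return -1;
    return 0;
}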

I think an interesting question is "Why do you want to obfuscate your code?" You want to make it hard for attackers to crack your algorithms? To make it harder for them to find exploitable bugs in your code? You wouldn't need to obfuscate code if the code were uncrackable in the first place. The root of the problem is crackable software. Fix the root of your problem, don't just obfuscate it.

Also, the more confusing you make your code, the harder it will be for YOU to find security bugs. Yes, it will be hard for hackers, but you need to find bugs too. Code should be easy to maintain years from now, and even well-written clear code can be difficult to maintain. Don't make it worse.


+1 for common sense: why make it harder for yourself when you could just design a better system.
As I always say, if you keep everything server-side, it's more secure.
iammilind

Many times, the fear of your product getting reverse engineered is misplaced. Yes, it can get reverse engineered; but will it become so famous over a short period of time that hackers will find it worth reverse engineering? (This job is not a small-time activity for a substantial amount of code.)

If it really becomes a money earner, then you should have gathered enough money to protect it using legal means such as patents and/or copyrights.

IMHO, take the basic precautions you are going to take and release it. If it becomes a target of reverse engineering, that means you have done a really good job, and you yourself will find better ways to overcome it. Good luck.


I mean, this is a viable and applicable answer, but the line you draw between protection and earning an income of a couple million so that others protect your product for you is a really long line.
asmeurer

Take a read of http://en.wikipedia.org/wiki/Security_by_obscurity#Arguments_against. I'm sure others could probably also give better sources on why security by obscurity is a bad thing.

It should be entirely possible, using modern cryptographic techniques, to have your system be open (I'm not saying it should be open, just that it could be), and still have total security, so long as the cryptographic algorithm doesn't have a hole in it (not likely if you choose a good one), your private keys/passwords remain private, and you don't have security holes in your code (this is what you should be worrying about).


I would agree with this. I think you may have a conceptual or a design problem. Is there an analog with a private-public key pair solution? You never divulge the private key, it stays with the owner whose secure client processes it. Can you keep the secure code off their computer and only pass results back to the user?
tne

Since July 2013, there has been renewed interest in cryptographically robust obfuscation (in the form of Indistinguishability Obfuscation), which seems to have been spurred by original research from Amit Sahai.

Sahai, Garg, Gentry, Halevi, Raykova, Waters, Candidate Indistinguishability Obfuscation and Functional Encryption for all circuits (July 21, 2013).

Sahai, Waters, How to Use Indistinguishability Obfuscation: Deniable Encryption, and More.

Sahai, Barak, Garg, Kalai, Paneth, Protecting Obfuscation Against Algebraic Attacks (February 4, 2014).

You can find some distilled information in this Quanta Magazine article and in that IEEE Spectrum article.

Currently the amount of resources required to make use of this technique make it impractical, but AFAICT the consensus is rather optimistic about the future.

I say this very casually, but to everyone who's used to instinctively dismiss obfuscation technology -- this is different. If it's proven to be truly working and made practical, this is major indeed, and not just for obfuscation.


Norman Ramsey

To inform yourself, read the academic literature on code obfuscation. Christian Collberg of the University of Arizona is a reputable scholar in this field; Salil Vadhan of Harvard University has also done some good work.

I'm behind on this literature, but the essential idea I'm aware of is that you can't prevent an attacker from seeing the code that you will execute, but you can surround it with code that is not executed, and it costs an attacker exponential time (using best known techniques) to discover which fragments of your code are executed and which are not.


Brian Makin

If someone wants to spend the time to reverse your binary then there is absolutely nothing you can do to stop them. You can make it moderately more difficult, but that's about it. If you really want to learn about this then get a copy of http://www.hex-rays.com/idapro/ and disassemble a few binaries.

The fact that the CPU needs to execute the code is your undoing. The CPU only executes machine code... and programmers can read machine code.

That being said... you likely have a different issue which can be solved another way. What are you trying to protect? Depending on your issue you can likely use encryption to protect your product.


Black

To be able to select the right option, You should think of the following aspects:

Is it likely that "new users" do not want to pay but use Your software? Is it likely that existing customers need more licences than they have? How much are potential users willing to pay? Do You want to give licences per user / concurrent users / workstation / company? Does Your software need training / customization to be useful?

If the answer to question 5 is "yes", then do not worry about illegal copies. They wouldn't be useful anyway.

If the answer to question 1 is "yes", then first think about pricing (see question 3).

If You answer question 2 with "yes", then a "pay per use" model might be appropriate for You.

From my experience, pay per use + customization and training is the best protection for Your software, because:

New users are attracted by the pricing model (little use -> little pay)

There are almost no "anonymous users", because they need training and customization.

No software restrictions scares potential customers away.

There is a continuous stream of money from existing customers.

You get valuable feedback for development from Your customers, because of a long-term business relationship.

Before You think of introducing DRM or obfuscation, You might think of these points and if they are applicable to Your software.


Very good advice (and I upvoted it), but it doesn't really address this particular question.
Mohammad Alaggan

There is a recent paper called "Program obfuscation and one-time programs", if you are really serious about protecting your application. The paper in general gets around the theoretical impossibility results through the use of simple and universal hardware.

If you can't afford to require extra hardware, then there is also another paper, "On best-possible obfuscation", that gives the theoretically best possible obfuscation among all programs with the same functionality and the same size. However, the paper shows that information-theoretic best-possible obfuscation implies a collapse of the polynomial hierarchy.

Those papers should at least give you sufficient bibliographical leads to walk into the related literature if these results do not work for your needs.

Update: A new notion of obfuscation, called indistinguishability obfuscation, can mitigate the impossibility result (paper).


SSpoke

Protected code running in a virtual machine seemed impossible to reverse engineer at first. Themida Packer

But it's not that secure anymore. No matter how you pack your code, you can always do a memory dump of any loaded executable and disassemble it with any disassembler like IDA Pro.

IDA Pro also comes with a nifty assembly-to-C source code transformer, although the generated code will look more like a pointer/address mathematical mess. If you compare it with the original, you can fix all errors and rip anything out.


Lukasz Madon

No dice; you cannot protect your code from disassembly. What you can do is set up a server for the business logic and use a web service to provide it to your app. Of course, this scenario is not always possible.
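
A hedged sketch of that split using libcurl (the URL and endpoint are made up; error handling and response parsing are omitted):

#include <curl/curl.h>

/* The sensitive logic runs on a server you control; the shipped client
   only sends inputs and receives results. */
int ask_server(const char *input)
{
    CURL *h = curl_easy_init();
    if (!h)
        return -1;
    curl_easy_setopt(h, CURLOPT_URL, "https://api.example.com/compute");
    curl_easy_setopt(h, CURLOPT_POSTFIELDS, input);
    CURLcode rc = curl_easy_perform(h);
    curl_easy_cleanup(h);
    return rc == CURLE_OK ? 0 : -1;
}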


Well said; the only way to avoid people disassembling your code is to never let them have physical access to it at all, which means offering your application exclusively as SaaS, taking requests from remote clients and handing back the processed data. Place the server in a locked room in an underground bunker surrounded by an alligator ditch and 5 m tall electrified razor wire, to which you throw away the key before covering it all with 10 m of reinforced concrete, and then hope you didn't forget to install tons of software systems to prevent intrusion over the network.
I hope I never get the contract to maintain your servers
Gallium Nitride

To avoid reverse engineering, you must not give the code to users. That said, I recommend using an online application... however (since you gave no context), that could be pointless in your case.


This is the real solution... namely, put your crown jewels into your own server on your own VPS machine and only expose API calls into this server from the client (browser or API client).
Aaron Mason

Possibly your best alternative is still using virtualization, which introduces another level of indirection/obfuscation that needs to be bypassed, but as SSpoke said in his answer, this technique is also not 100% secure.

The point is you won't get ultimate protection, because there is no such thing, and if there ever were, it wouldn't last long, which means it wasn't ultimate protection in the first place.

Whatever man assembles can be disassembled.

It's usually true that (properly) disassembling is often a (somewhat or much) harder task, so your opponent must be more skilled, but you can assume there is always someone of that quality, and that's a safe bet.

If you want to protect something against REs, you must know at least common techniques used by REs.

Thus the words

internet is not really resourceful for prevention against reverse engineering but rather depicts tons of information on how to reverse engineer

show a bad attitude on your part. I'm not saying that to use or embed protection you must know how to break it, but to use it wisely you should know its weaknesses and pitfalls. You should understand it.

(There are examples of software using protection in a wrong way, making such protection practically nonexistent. To avoid speaking vaguely, I'll give you an example briefly described on the internet: Oxford English Dictionary Second Edition on CD-ROM v4. You can read about its failed use of SecuROM on the following page: Oxford English Dictionary (OED) on CD-ROM in a 16-, 32-, or 64-bit Windows environment: Hard-disk installation, bugs, word processing macros, networking, fonts, and so forth.)

Everything takes time.

If you're new to the subject and don't have months or rather years to get properly into RE stuff, then go with available solutions made by others. The problem there is obvious: they are already out there, so you already know they're not 100% secure, but making your own new protection would give you only a false sense of being protected, unless you know the state of the art in reverse engineering and protection really well (but you don't, at least at this moment).

The point of software protection is to scare newbies, stall common REs, and put a smile on the face of seasoned RE after her/his (hopefully interesting) journey to the center of your application.

In business talk you may say it's all about delaying competition, as much as it is possible.

(Have a look at the nice presentation Silver Needle in the Skype by Philippe Biondi and Fabrice Desclaux, shown at Black Hat 2006.)

You're aware that there is a lot of stuff about RE out there, so start reading it. :)

I mentioned virtualization, so I'll give you a link to one exemplary thread from the EXETOOLS FORUM: Best software protector: Themida or Enigma Protector? It may help you a bit in further searches.


Rotsor

Contrary to what most people say, based on their intuition and personal experience, I don't think cryptographically-safe program obfuscation is proven to be impossible in general.

This is one example of a perfectly obfuscated program statement to demonstrate my point:

printf("1677741794\n");

One can never guess that what it really does is

printf("%d\n", 0xBAADF00D ^ 0xDEADBEEF);

There is an interesting paper on this subject, which proves some impossibility results. It is called "On the (Im)possibility of Obfuscating Programs".

Although the paper does prove that the obfuscation making the program non-distinguishable from the function it implements is impossible, obfuscation defined in some weaker way may still be possible!


1. Your example is not relevant here; the two programs you show are behaviorally equivalent, and this question is about figuring out the behavior of a program, not reconstructing its source code (which, obviously, is impossible). 2. This paper is a theoretical paper; it's impossible to write the perfect obfuscator, but it's also impossible to write the perfect decompiler (for much the same reasons that it's impossible to write the perfect program analyser). In practice, it's an arms race: who can write the better (de)obfuscator.
@Gilles, the result of the (correct) deobfuscation will always be behaviourally equivalent to the obfuscated code. I don't see how that undermines the importance of the problem.
Also, about arms race: this is not about who invests more into the research, but rather about who is right. Correct mathematical proofs don't go wrong just because someone wants them to really badly.
Okay, maybe you are right about arms race in practice. I think I misunderstood this one. :) I hope some kind of cryptographically-safe obfuscation is possible though.
For an interesting case of obfuscation, try smart cards, where the problem is that the attacker has physical access (white-box obfuscation). Part of the response is to limit access by physical means (the attacker can't read secret keys directly); but software obfuscation plays a role too, mainly to make attacks like DPA not give useful results. I don't have a good reference to offer, sorry. The examples in my answer are vaguely inspired from techniques used in that domain.
Olof Forshell

I do not think that any code is unhackable, but the rewards need to be great for someone to want to attempt it.

Having said that, there are things you should do, such as:

Use the highest optimization level possible (reverse engineering is not only about getting the assembly sequence, it is also about understanding the code and porting it into a higher-level language such as C). Highly optimized code can be a b---h to follow.

Make structures dense by not having larger data types than necessary. Rearrange structure members between official code releases. Rearranged bit fields in structures are also something you can use.

You can check for the presence of certain values which shouldn't be changed (a copyright message is an example). If a byte vector contains "vwxyz", you can have another byte vector containing "abcde" and compare the differences. The function doing it should not be passed pointers to the vectors but should use external pointers defined in other modules as (pseudo-C code) "char *p1 = &string1[539];" and "char *p2 = &string2[-11731];". That way there won't be any pointers pointing exactly at the two strings. In the comparison code you then test "*(p1-539+i) - *(p2+11731+i) == some_value". The cracker will think it is safe to change string1 because no one appears to reference it. Bury the test in some unexpected place. (A hedged sketch of this follows after this list.)

Try to hack the assembly code yourself to see what is easy and what is difficult to do. Ideas should pop up that you can experiment with to make the code more difficult to reverse engineer and to make debugging it more difficult.
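
A hedged sketch of the offset-pointer check from the third point (the offsets and expected delta are invented; the biased pointers live in another translation unit, so nothing in this file references the protected string directly):

#include <stddef.h>

/* Defined in another module, pre-biased by arbitrary constants, e.g.:
     const char *p1 = &copyright_msg[539];
     const char *p2 = &scrambled_copy[-11731];   (copy with 3 subtracted
                                                  from every byte)        */
extern const char *p1;
extern const char *p2;

/* returns nonzero if the copyright string has been patched */
int tampered(size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (*(p1 - 539 + i) - *(p2 + 11731 + i) != 3)   /* expected delta */
            return 1;
    return 0;
}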


Your first point makes no sense; optimized code cuts out cruft, which makes it easier to reverse (I speak from experience). Your third point is also a waste of time; any reverse engineer worth his salt knows how to do memory-access breakpointing. This is why it is probably best not to design a system yourself but to use 3rd-party libraries that have yet to be 'cracked', because that's likely to last a little longer than anything a 'rookie' could create...
Since it appears I don't know anything about the subject matter, perhaps I should turn to a professional such as you for my software development needs instead of writing any code myself.
flolo

As many have already said: on a regular CPU you can't stop them, you can only delay them. As my old crypto teacher told me: you don't need perfect encryption, breaking the code must just be more expensive than the gain. The same holds for your obfuscation.

But 3 additional notes:

1. It is possible to make reverse engineering impossible, BUT (and this is a very, very big but) you can't do it on a conventional CPU. I have also done a lot of hardware development, and FPGAs are often used. E.g. the Virtex 5 FX has a PowerPC CPU on it, and you can use the APU to implement your own CPU opcodes in your hardware. You could use this facility to really decrypt instructions for the PowerPC in a way that is not accessible from the outside or by other software, or even execute the command in hardware. As the FPGA has built-in AES encryption for its configuration bitstream, you could not reverse engineer it (except if someone manages to break AES, but then I guess we have other problems...). This is also how vendors of hardware IP protect their work.

2. You speak of a protocol. You don't say what kind of protocol it is, but when it is a network protocol you should at least protect it against network sniffing. This you can indeed do with encryption. But if you want to protect the en-/decryption from an owner of the software, you are back to obfuscation.

3. Make your program undebuggable/unrunnable under a debugger. Try to use some kind of debugging detection and apply it, e.g., in some formula, or by adding a debug register's content to a magic constant. It is much harder if your program looks, in debug mode, as if it were running normally but performs a completely wrong computation, operation, or something else. E.g. I know some eco games that had a really nasty copy protection (I know you don't want copy protection, but it is similar): the stolen version altered the mined resources after 30 minutes of game play, and suddenly you got just a single resource. The pirate just cracked it (i.e. reverse engineered it), checked that it ran, and voila, released it. Such slight behaviour changes are very hard to detect, especially if they do not appear instantly but only delayed.

So finally I would suggest: estimate the gain for the people reverse engineering your software, translate this into some amount of time (e.g. by using the cheapest Indian salary), and make the reverse engineering so time-consuming that it costs more than that.


Ira Baxter

Traditional reverse engineering techniques depend on the ability of a smart agent using a disassembler to answer questions about the code. If you want strong safety, you have do to things that provably prevent the agent from getting such answers.

You can do that by relying on the Halting Problem ("does program X halt?"), which in general cannot be solved. Adding programs that are difficult to reason about to your program makes your program difficult to reason about, and it is easier to construct such programs than to tear them apart. You can also add code to the program that has varying degrees of difficulty for reasoning; a great candidate is the problem of reasoning about aliases ("pointers").

Collberg et al have a paper ("Manufacturing Cheap, Resilient and Stealthy Opaque Constructs") that discusses these topics and defines a variety of "opaque" predicates that can make it very difficult to reason about code:

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.39.1946&rep=rep1&type=pdf
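
A hedged sketch of one classic opaque predicate from that literature: the condition below never holds, because x^2 - 7y^2 = -1 has no integer solutions (mod 8, x*x falls in {0,1,4} while 7*y*y - 1 falls in {3,6,7}, so it stays false even with unsigned wraparound), yet a static analyser that does not know the number theory must keep both branches alive:

#include <stdint.h>

void real_work(void);
void decoy_work(void);   /* plausible-looking, but never executed */

void opaque_dispatch(uint32_t x, uint32_t y)
{
    if (7u * y * y - 1u == x * x)   /* opaquely false */
        decoy_work();
    else
        real_work();
}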

I have not seen Collberg's specific methods applied to production code, especially not C or C++ source code.

The DashO Java obfuscator seems to use similar ideas. http://www.cs.arizona.edu/~collberg/Teaching/620/2008/Assignments/tools/DashO/


Albert van der Horst

Security through obscurity doesn't work as has been demonstrated by people much cleverer than the both of us. If you must protect the communication protocol of your customers then you are morally obliged to use the best code that is in the open and fully scrutinized by experts.

This is for the situation where people can inspect the code. If your application is to run on an embedded microprocessor, you can choose one that has a sealing facility, which makes it impossible to inspect the code or observe more than trivial parameters like current usage while it runs. (Except by hardware-invasive techniques, where you carefully dismantle the chip and use advanced equipment to inspect currents on individual transistors.)

I'm the author of a reverse engineering assembler for the x86. If you're ready for a cold surprise, send me the result of your best efforts. (Contact me through my websites.) Few I have seen in the answers would present a substantial hurdle to me. If you want to see how sophisticated reverse engineering code works, you should really study websites with reverse engineering challenges.

Your question could use some clarification. How do you expect to keep a protocol secret if the computer code is amenable to reverse engineering? If my protocol would be to send an RSA encrypted message (even public key) what do you gain by keeping the protocol secret? For all practical purposes an inspector would be confronted with a sequence of random bits.

Groetjes Albert


MerchantProtocol.com

FIRST THING TO REMEMBER ABOUT HIDING YOUR CODE: Not all of your code needs to be hidden.

THE END GOAL: My end goal for most software programs is the ability to sell different licenses that will turn on and off specific features within my programs.

BEST TECHNIQUE: I find that building in a system of hooks and filters like WordPress offers, is the absolute best method when trying to confuse your opponents. This allows you to encrypt certain trigger associations without actually encrypting the code.

The reason that you do this, is because you'll want to encrypt the least amount of code possible.

KNOW YOUR CRACKERS: Know this: the main reason for cracking code is not malicious distribution of licensing; it's actually that crackers NEED to change your code, and they don't really NEED to distribute free copies.

GETTING STARTED: Set aside the small amount of code that you're going to encrypt; the rest of the code should be crammed into ONE file to increase complexity and hinder understanding.

PREPARING TO ENCRYPT: You're going to be encrypting in layers with my system, it's also going to be a very complex procedure so build another program that will be responsible for the encryption process.

STEP ONE: Obfuscate using base64 names for everything. Once done, base64 the obfuscated code and save it into a temporary file that will later be used to decrypt and run this code. Make sense?

I'll repeat, since you'll be doing this again and again: you're going to create a base64 string and save it into another file as a variable that will be decrypted and rendered.

STEP TWO: You're going to read in this temporary file as a string and obfuscate it, then base64 it and save it into a second temp file that will be used to decrypt and render it for the end user.

STEP THREE: Repeat step two as many times as you would like. Once you have this working properly without decrypt errors, then you're going to want to start building in land mines for your opponents.

LAND MINE ONE: You're going to want to keep the fact that you're being notified an absolute secret. So build in a cracker attempt security warning mail system for layer 2. This will be fired letting you know the specifics about your opponent if anything is to go wrong.

LAND MINE TWO: Dependencies. You don't want your opponent to be able to run layer one, without layer 3 or 4 or 5, or even the actual program it was designed for. So make sure that within layer one you include some sort of kill script that will activate if the program isn't present, or the other layers.

I'm sure you can come up with your own landmines; have fun with it.

THING TO REMEMBER: You can actually encrypt your code instead of base64'ing it. That way a simple base64 decode won't reveal the program.

REWARD: Keep in mind that this can actually be a symbiotic relationship between you and your opponent. I always place a comment inside of layer one; the comment congratulates the cracker and gives them a promo code to use in order to receive a cash reward from you.

Make the cash reward significant with no prejudice involved. I normally say something like $500. If your guy is the first to crack the code, then pay him his money and become his friend. If he's a friend of yours he's not going to distribute your software. Ask him how he did it and how you can improve!

GOOD LUCK!


Did you even read the question? I never asked for methods on how to protect from piracy. The application will be free, it's the underlying protocol used that needs to be protected due to the nature of security.
AareP

Has anyone tried CodeMorth: http://www.sourceformat.com/code-obfuscator.htm ? Or Themida: http://www.oreans.com/themida_features.php ?

The latter looks more promising.


The first thing I would recommend is to avoid at all costs the use of commercial obfuscators! Because if you crack the obfuscator, you can crack all the applications obfuscated with it!