sbf

About

sbf is a Unix-style command-line utility for the brainfuck programming language. It is meant to provide compatibility with many different brainfuck implementations across many platforms. It is free and open-source, licensed under the GNU General Public License.

More detailed information is listed by subject:

What is sbf?

sbf stands for Sbf BrainFuck. It is a brainfuck interpreter, compiler, and translator.

What is brainfuck? http://en.wikipedia.org/wiki/brainfuck
brainfuck is an esoteric programming language. It is one of the smallest Turing-complete programming languages, with only 8 commands.

brainfuck was originally created by Urban Müller as a language with a compiler of less than 200 bytes (the compiler was written in Amiga assembly). Since then, many compilers and interpreters have been written for brainfuck in many languages and for many platforms. Oftentimes, when their authors need to resolve a critical issue (such as what to do if someone reads a character after EOF), they resolve it in whatever way is the default for their language, which has led to a proliferation of different ideas on how exactly brainfuck should be implemented. Also, because brainfuck is so easy to implement, many people write a program, put it on the internet, and never touch it again. I've downloaded countless programs that probably have simple-to-fix errors, but I'm too lazy to struggle through their source code and figure them out, I can't contact the creator, and they left no documentation about how it works or how to fix it.

sbf seeks to solve that. It has an interpreter, compiler, and translator to a variety of languages (assembly, C, Lisp, Perl) all rolled into one. Sure, it's not less than 200 bytes, but it's portable, it has all the brainfuck utilities you'll ever need (it can even count how many of each brainfuck command you use), and it has documentation. Plus, it doesn't gloss over issues like EOF or EOL - it explicitly converts between ASCII 10 and \n on platforms like Windows that have more complicated line endings, and it lets users select different EOF behaviors depending on their program, so a program that expects no change on EOF can be used with sbf just as easily as one that requires returning 0 or returning -1 on EOF.


Why sbf?

sbf, as I said, stands for Sbf BrainFuck. By tradition, the name of the brainfuck language is spelled in lowercase, even at the beginning of a sentence, so I adopted the same convention for the name of this program.

The acronym is, of course, recursive, like many acronyms in the open-source community - GNU's Not Unix (makers of much open-source software), Wine Is Not an Emulator (a popular tool for running Windows programs on Linux), PNG's Not Gif (picture format) and PHP: Hypertext Preprocessor (the webpage scripting language used for this site) being some of the better-known ones. Okay, originally the 's' stood for 'super'. Anyone who has stumbled across the brainfuck language has found a plethora of programs freely available on the Internet to interpret, compile, or translate brainfuck using varying standards, downloaded them, and tried to use them, only to discover that they didn't work. Furthermore, the author provided no documentation, and you don't know C/Java/Perl/Python/whatever language they decided to write it in. So one day I decided to sit down and write one program that would interpret, compile, and translate brainfuck, would actually work, and wouldn't put any limits on how it worked. To that end, sbf was born.

Later, I decided "Super BrainFuck" sounded kind of lame, so I just made it a recursive acronym. I think "sbf" sounds kind of nice, don't you?


What is brainfuck?

I already showed you the Wikipedia article. Jeez.

brainfuck is an esoteric programming language. Basically, it was created as a joke. It was meant to be one of the smallest, most minimalist languages ever, yet still be Turing-complete. I won't delve into much detail about its history or what it means to be Turing-complete - you can Wiki surf to your heart's content on that one.

brainfuck has an imaginary list. This list, let's assume here, is infinite, limited only by your computer's available memory (many brainfuck implementations put boundaries on this list simply because it is easier, but sbf allows a full, infinite list in both directions). Each element of this list is a single byte, capable of holding values ranging, obviously, from 0 to 255 (or -128 to 127, but that's semantics), and at the start of execution they are all set to 0.

It also has a pointer. This pointer begins pointing to one specific byte - it doesn't matter which one because the list is infinite. However, the only byte a brainfuck command can affect is the one the pointer points to. Fear not, however, for a brainfuck command can also move the pointer to the next byte over.

brainfuck has 8 commands. They are all 1 character and take no arguments. They are:

< move the pointer one byte to the left
> move the pointer one byte to the right
+ increment the byte under the pointer by 1
- decrement the byte under the pointer by 1
. output to standard output the ASCII character of the byte under the pointer
, get 1 byte of input from standard input and store that value in the byte under the pointer
[ if the byte under the pointer is 0, jump forward to just after the matching ]
] if the byte under the pointer is not 0, jump back to just after the matching [

It may also be useful to think about it in terms of the corresponding C statements, assuming we have a char *ptr that points into an array of chars that are all initialized to 0:

< ptr--;
> ptr++;
+ (*ptr)++;
- (*ptr)--;
. putchar(*ptr);
, *ptr = getchar();
[ while(*ptr) {
] }

The weird looping structures of [ and ] seem perfectly normal when thought of in terms of C.
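To see the mapping in action, here is a minimal sketch of a complete interpreter in C. It assumes a fixed 30000-cell tape (sbf's normal mode uses an unbounded list instead), takes input from a string, collects output in a buffer, and leaves the cell unchanged on EOF. The function name bf_run is made up for this example and is not part of sbf. Instead of turning [ and ] into while loops, it jumps by scanning for the matching bracket, which has the same effect.

```c
#include <stddef.h>
#include <string.h>

#define TAPE_SIZE 30000 /* assumption: fixed-size tape, like simple mode's default */

/* Interpret prog. Input comes from the NUL-terminated string input; output
   goes to out, which stores at most outcap bytes. Returns the number of bytes
   produced, or -1 for an unmatched bracket or a pointer that leaves the tape. */
long bf_run(const char *prog, const char *input, char *out, size_t outcap)
{
    unsigned char tape[TAPE_SIZE];
    size_t ptr = 0, written = 0, n = strlen(prog);

    memset(tape, 0, sizeof tape);           /* every cell starts at 0 */
    for (size_t pc = 0; pc < n; pc++) {
        switch (prog[pc]) {
        case '<': if (ptr == 0) return -1; ptr--; break;
        case '>': if (++ptr == TAPE_SIZE) return -1; break;
        case '+': tape[ptr]++; break;       /* unsigned char wraps 255 -> 0 */
        case '-': tape[ptr]--; break;       /* and 0 -> 255 */
        case '.':
            if (written < outcap) out[written] = (char)tape[ptr];
            written++;
            break;
        case ',':                           /* end of string = EOF: no change */
            if (*input != '\0') tape[ptr] = (unsigned char)*input++;
            break;
        case '[':                           /* cell is 0: skip past matching ] */
            if (tape[ptr] == 0) {
                int depth = 1;
                while (depth > 0) {
                    if (++pc >= n) return -1;
                    if (prog[pc] == '[') depth++;
                    else if (prog[pc] == ']') depth--;
                }
            }
            break;
        case ']':                           /* cell is not 0: back to matching [ */
            if (tape[ptr] != 0) {
                int depth = 1;
                while (depth > 0) {
                    if (pc == 0) return -1;
                    pc--;
                    if (prog[pc] == ']') depth++;
                    else if (prog[pc] == '[') depth--;
                }
            }
            break;
        default:                            /* anything else is a comment */
            break;
        }
    }
    return (long)written;
}
```

For example, bf_run(",[.[-],]", "hi", buf, sizeof buf) returns 2 and leaves "hi" in buf (not NUL-terminated).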

Now, I know not everyone knows C, so here it is in Perl and Python:

Assuming an array @lst of zeros (say, 9000 elements) and a $ptr set to any valid index in Perl,
or a lst=[0,0,0...0,0] and a ptr=0 in Python (after import sys):

     Perl:                       Python:
<    $ptr--;                     ptr -= 1
>    $ptr++;                     ptr += 1
+    $lst[$ptr]++;               lst[ptr] += 1
-    $lst[$ptr]--;               lst[ptr] -= 1
.    print chr $lst[$ptr];       sys.stdout.write(chr(lst[ptr]))
,    $lst[$ptr] = ord(getc);     lst[ptr] = ord(sys.stdin.read(1))
[    while($lst[$ptr]){          while lst[ptr]:  # indent more
]    }                           # none, just indent less

Aren't I nice?


Why would anyone write in brainfuck?

Read the Story of Mel for a deeper understanding of why someone would want to use a language as clumsy and hideous as brainfuck.

A lot of tasks in brainfuck are nearly impossible for those of average concentration. Printing any pre-defined text is insanely complex, and God help anyone who wants to try to process regular expressions in brainfuck. But some things are rather uniquely simple. One can make a "cat" program in brainfuck like this:

,[.[-],]

Sure, that looks incomprehensible to anyone just starting out in brainfuck, but that's how brainfuck was intended to look - incomprehensible. Once you get used to brainfuck, it's hard to remember that something that looks that simple (it's only an 8-byte program, after all) reads a text file from standard input and writes it to standard output.

But this isn't a Python tutorial. I'm not here to convince you that brainfuck is the messiah come to save you from the sins of "There's More Than One Way To Do It." (To any Python fans, I mean it all in good fun - forgive me, for my first language was Perl.) I'll be honest. brainfuck isn't for everyone. It's like Plan 9 from Outer Space: Not everyone gets why it's so funny. If you're one of those who get brainfuck and like the joke, this is for you. For those of you who don't, we can still be friends.

And just as a side note, Plan 9 from Outer Space is hilarious.


An sbf tutorial

sbf provides a number of features for using brainfuck.

It is a command-line utility, as has been stated before, so you usually invoke it at the terminal. You'll probably try something like this first:

$ sbf

Or, if you didn't install it:

$ ./sbf

You'll get mad crazy errors, like this:

Usage: sbf [options]
Type 'sbf -h' for more documentation.

It's simpler than it looks, really - the -h is called a command-line flag. sbf is a command-line utility, like the "ls" or "cat" or "gcc" programs you may use in a Unix shell environment. The flags tell it what to do. Fortunately it has default behaviors, so we don't always have to tell it what to do.

Let's take this example. I'm assuming you have some basic knowledge of the bash shell and can understand what I've done at the command line here.

$ cat > cat.bf
,[.[-],]
^D
$ sbf cat.bf < cat.bf
,[.[-],]
$

First I used "cat" to create a brainfuck program called "cat.bf", not because I like using cat as a text editor, but because that way you can see the file. (.bf or .b is the usual extension for brainfuck programs. I use .bf because I started out with .pl rather than .c, and because .b is more ambiguous; .bf most clearly means brainfuck.) The brainfuck code is this:

,[.[-],]

That reads a byte, then starts a loop. The loop prints the last byte read, then resets that byte to 0 with [-] (it subtracts 1 from the byte each time it loops, and it loops until the byte is 0), and then asks for another byte.
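Following the C mapping from earlier, cat.bf can be translated by hand, command by command. A minimal sketch, assuming the tape is a single cell (cat.bf never moves the pointer) and that EOF leaves the cell unchanged. cat_bf is a hypothetical name, and this is not necessarily what sbf's own C translation produces:

```c
#include <stdio.h>

/* Hand translation of ,[.[-],] using one tape cell.
   On EOF the cell is left unchanged, like sbf's default. */
void cat_bf(FILE *in, FILE *out)
{
    unsigned char cell = 0;
    int c;

    if ((c = getc(in)) != EOF) cell = (unsigned char)c;     /* ,   */
    while (cell) {                                          /* [   */
        putc(cell, out);                                    /* .   */
        while (cell) cell--;                                /* [-] */
        if ((c = getc(in)) != EOF) cell = (unsigned char)c; /* ,   */
    }                                                       /* ]   */
}
```

One consequence worth knowing: a NUL byte in the input also ends the program, because the loop cannot tell a 0 that was read apart from the 0 left behind by [-] at EOF.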

This program is meant to have its standard input redirected, since it's basically a "cat" program. So we encounter the biggest issue in brainfuck: EOF. Due to brainfuck's roots as the "smallest compiler possible," people writing brainfuck implementations tend to make the comma's EOF behavior simply the default for whatever language they're using, to reduce program size. This hampers portability, and has given rise to 3 different solutions:

* Return -1 on EOF.
* Return 0 on EOF.
* Make no change on EOF.

sbf, because it approaches brainfuck from the angle of portability, supports all three behaviors, as long as you take the time to specify which one you want. If you don't, it uses option #3: no change to the byte on EOF. So, with the default EOF behavior, when our "cat.bf" program hits EOF, no change is made to the byte. Since [-] has already reset it to 0, it stays at 0, the loop exits, and the program ends. The "cat.bf" program looks different for different EOF behaviors, but we won't get into that now.
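One way to implement the three EOF choices is to decide what , does when getc() returns EOF. A sketch in C; the enum names and bf_read are hypothetical, chosen to line up with the -e values (-1, 0, X) in the flag list, and are not taken from sbf's source:

```c
#include <stdio.h>

/* Hypothetical names for the three EOF conventions listed above. */
enum bf_eof { BF_EOF_MINUS_ONE, BF_EOF_ZERO, BF_EOF_NO_CHANGE };

/* The , command: read one byte from in into *cell, applying mode on EOF. */
void bf_read(unsigned char *cell, FILE *in, enum bf_eof mode)
{
    int c = getc(in);

    if (c != EOF) {
        *cell = (unsigned char)c;
        return;
    }
    switch (mode) {
    case BF_EOF_MINUS_ONE:
        *cell = (unsigned char)-1;  /* -1 in an 8-bit cell is stored as 255 */
        break;
    case BF_EOF_ZERO:
        *cell = 0;
        break;
    case BF_EOF_NO_CHANGE:          /* sbf's default: leave the cell alone */
        break;
    }
}
```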

So, after we were done writing our brainfuck program, we ran it with no flags and simply the name of our new brainfuck program as an argument. Additionally, we redirected standard input to come from the file "cat.bf" (i.e. the program itself). By default, sbf assumes that you're reading brainfuck from a file (and whines when you don't give it one, which is where that usage message came from) and that you want to interpret the code in that file. So it interprets "cat.bf", which, as I just explained, copies its standard input to its standard output until EOF. That's why, immediately after running sbf, we see the contents of cat.bf: our redirected standard input.

Of course, you don't have to redirect standard input. Not all brainfuck programs require (or are even suited for) redirection of standard input. This is just an example.


sbf's features

A full listing of sbf's command-line flags, organized by purpose:

Input options:
  -f [file]          read brainfuck code from [file]
  -p                 brainfuck interactive prompt
Midput options:
  -a                 ASCII output (default)
  -b                 binary mode (no EOL translation)
  -e [val]           , returns [val] on EOF (-1, 0, or X for no change)
  -k                 keep original brainfuck source (for -t)
  -n                 numeric output
  -o                 optimize
  -r                 raw (unbuffered) input
  -s ([size])        simple mode (finite array of size [size] or 30000)
  -x                 hex output
  -8                 octal output
Output options:
  -c [file]          compile to executable [file]
  -d                 delete comment characters
  -i                 interpret brainfuck
  -l                 count brainfuck commands
  -t [file] [lang]   translate brainfuck to [lang] in [file]
                     ([lang] can be found from [file] extension if not given)
Misc. options:
  -h                 display help message
  -v                 display version, system, and compiler info
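The -d and -l flags boil down to scanning the source for the 8 command characters. A minimal sketch, assuming everything that is not one of <>+-.,[] counts as a comment; bf_count and bf_strip are hypothetical names, and sbf's actual -l output format is not shown here:

```c
#include <stddef.h>
#include <string.h>

static const char BF_COMMANDS[] = "<>+-.,[]";

/* Count each command; counts[i] is the count for BF_COMMANDS[i]. */
void bf_count(const char *src, size_t counts[8])
{
    for (int i = 0; i < 8; i++) counts[i] = 0;
    for (; *src != '\0'; src++) {
        const char *hit = strchr(BF_COMMANDS, *src);
        if (hit != NULL) counts[hit - BF_COMMANDS]++;
    }
}

/* Delete every comment character in place; returns the new length. */
size_t bf_strip(char *src)
{
    size_t w = 0;
    for (size_t r = 0; src[r] != '\0'; r++)
        if (strchr(BF_COMMANDS, src[r]) != NULL) src[w++] = src[r];
    src[w] = '\0';
    return w;
}
```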

For now, all of sbf's flags are case-insensitive. I'm trying to keep the whole interface case-insensitive, but if I ever have to double up on a letter, keep in mind that lowercase is the preferred case and the most important flags will stay lowercase. Also, there is currently no support for long options like --interp. Any sort of string handling is just too much work for me to do in C. I understand the concept, but the implementation would be tedious at best, so if I'm going to write that kind of interface I'll write it in Perl where it's bearable. Perhaps one day, though, I'll get up and write a full-word interface.

One important thing to note: flags that take arguments gobble up the arguments immediately following them, like this:

$ sbf -f cat.bf -t cat.c C -e X

This invokes sbf on the file "cat.bf", translating it to "cat.c" in the C language and using no change on EOF (the default, but we're being explicit for no reason here as an example).

Another thing about the flags is that they can be grouped. Since they're all one letter, instead of writing -f -t -e one could write -fte. This might bring up questions of where to put the arguments, and the answer is that each letter gobbles up following arguments. A full example:

$ sbf -fte cat.bf cat.c C X

Note that if we had specified the flags in a different order, their arguments would also have to be in a different order:

$ sbf -etf X cat.c C cat.bf

I personally prefer to do it this way, but some may find that syntax to be confusing and decide they want to separate the flags for added clarity.

If no input flag is provided, sbf will assume that any non-flag argument that was unused by any flag is the input file, as we saw in our first example where we invoked sbf with only one argument. This means that the following are equivalent:

$ sbf -etf X cat.c C cat.bf

$ sbf -et X cat.c C cat.bf

$ sbf cat.bf -te cat.c C X

$ sbf cat.bf -et X cat.c

In the last three, sbf is not given the -f flag, but it automatically defaults to reading from a file and picks the last unused argument in the list. Be careful about this.
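The argument rules above (grouped flags gobbling the following arguments in order, and the last unused argument becoming the input file) can be sketched in C. This only models -f, -e, and -t. struct opts, parse_args, and the rule that -t takes a [lang] only when the next argument is a language name are assumptions for illustration, not sbf's real parser:

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct opts {
    const char *infile;   /* -f [file], or else the last unused argument */
    const char *eof;      /* -e [val] */
    const char *outfile;  /* -t [file] */
    const char *lang;     /* -t [lang], NULL if left to the file extension */
};

/* Hypothetical rule: -t takes [lang] only if the next argument names one. */
static bool is_lang(const char *s)
{
    static const char *const names[] = { "C", "C++", "Perl", "Python", "Ruby" };
    for (size_t i = 0; i < sizeof names / sizeof names[0]; i++)
        if (strcmp(s, names[i]) == 0) return true;
    return false;
}

/* Returns 0 on success, -1 on an unknown flag or a missing argument. */
int parse_args(int argc, char **argv, struct opts *o)
{
    const char *unused = NULL;   /* last argument no flag consumed */
    int i = 1;

    *o = (struct opts){ NULL, NULL, NULL, NULL };
    while (i < argc) {
        if (argv[i][0] != '-' || argv[i][1] == '\0') {
            unused = argv[i++];              /* not a flag group */
            continue;
        }
        int next = i + 1;                    /* arguments are taken from here on */
        for (const char *p = argv[i] + 1; *p != '\0'; p++) {
            if (next >= argc) return -1;     /* every modeled flag needs an argument */
            switch (tolower((unsigned char)*p)) {  /* flags are case-insensitive */
            case 'f': o->infile = argv[next++]; break;
            case 'e': o->eof = argv[next++]; break;
            case 't':
                o->outfile = argv[next++];
                if (next < argc && is_lang(argv[next])) o->lang = argv[next++];
                break;
            default: return -1;              /* flag not modeled in this sketch */
            }
        }
        i = next;                            /* skip past everything consumed */
    }
    if (o->infile == NULL) o->infile = unused;
    return 0;
}
```

With argv = {"sbf", "cat.bf", "-et", "X", "cat.c"}, -e takes X, -t takes cat.c (no language follows, so lang stays NULL for the extension to decide), and cat.bf, the only unused argument, becomes the input file.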


Interactive Mode

sbf's interactive mode is much like a REPL in Lisp, or the interactive modes for languages like Python and Ruby. It displays a prompt, like this:

bf>

And at this prompt, you enter a line of brainfuck code. This code is sent to the interpreter and interpreted. When it's done, you write more code and send it. The state of the brainfuck list/tape is preserved as long as the interactive mode is open, allowing you to piece together someone else's brainfuck program by entering it line-by-line (or command-by-command if you're that anal) and observing the tape afterwards. After every entry, it will print the numeric value of the current list element, so you can enter multiple lines of just < and > to scroll through the list and see what's there.

To quit interactive mode, use the ^ character. Unlike regular brainfuck commands, ^ is NOT part of the language - I just check whether the line contains any ^'s and, if so, quit. You CANNOT put a ^ inside a loop or create any sort of conditional exit - if I see a ^ at all, interactive mode exits.

Interactive mode also exits if EOF is entered at the bf> prompt. However, if EOF is encountered while the , command is reading input for your brainfuck code, interactive mode will NOT exit. Only EOF at the bf> prompt itself ends interactive mode.
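The rules above (a persistent tape, ^ anywhere on a line quits, EOF at the prompt quits, but EOF read by , does not) fit in a small loop. A sketch in C, assuming a finite tape and loops that open and close on the same line. struct session, run_line, and repl are hypothetical names, and the sketch uses plain fgets() rather than Readline or Editline:

```c
#include <stdio.h>
#include <string.h>

#define TAPE_LEN 30000   /* assumption: finite tape, to keep the sketch short */

/* State that survives between prompts. Start from a zeroed session. */
struct session {
    unsigned char tape[TAPE_LEN];
    size_t ptr;
};

/* Run one line. In this sketch a loop must open and close on the same line,
   and the pointer is clamped at the ends of the tape. */
static void run_line(struct session *s, const char *code, FILE *in, FILE *out)
{
    size_t n = strlen(code);

    for (size_t pc = 0; pc < n; pc++) {
        unsigned char *cell = &s->tape[s->ptr];
        int c, depth = 1;

        switch (code[pc]) {
        case '<': if (s->ptr > 0) s->ptr--; break;
        case '>': if (s->ptr + 1 < TAPE_LEN) s->ptr++; break;
        case '+': (*cell)++; break;
        case '-': (*cell)--; break;
        case '.': putc(*cell, out); break;
        case ',':                     /* EOF here does not end the session */
            if ((c = getc(in)) != EOF) *cell = (unsigned char)c;
            break;
        case '[':                     /* skip forward past the matching ] */
            if (*cell == 0)
                while (depth > 0 && ++pc < n)
                    depth += (code[pc] == '[') - (code[pc] == ']');
            break;
        case ']':                     /* jump back to the matching [ */
            if (*cell != 0) {
                while (depth > 0 && pc > 0) {
                    pc--;
                    depth += (code[pc] == ']') - (code[pc] == '[');
                }
                if (depth > 0) return; /* unmatched ]: abandon this line */
            }
            break;
        }
    }
}

/* Prompt, read a line, run it, show the current cell. Stops when a line
   contains ^ anywhere, or when EOF is hit at the prompt itself. */
void repl(struct session *s, FILE *prompt_in, FILE *data_in, FILE *out)
{
    char line[1024];   /* longer lines are read in pieces in this sketch */

    for (;;) {
        fputs("bf> ", out);
        fflush(out);
        if (fgets(line, sizeof line, prompt_in) == NULL) break;
        if (strchr(line, '^') != NULL) break;
        run_line(s, line, data_in, out);
        fprintf(out, "%d\n", s->tape[s->ptr]);
    }
}
```

Passing stdin as both prompt_in and data_in gives the single-stream behavior of a terminal session.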

If your system has access to the GNU Readline or BSD Editline libraries, sbf will probably be compiled with these libraries. In this event, interactive mode will use these libraries for line entering - the left and right arrows will allow you to scroll back and forth through the brainfuck code you've entered at the prompt, and the up and down arrows will scroll through a history of past entries. This is very useful and I recommend that everyone try to compile sbf with one of these two libraries (they both work the same).

If neither library is found, sbf will use a similar function written in pure ANSI C, which does not support the arrow keys' normal functions. Once again, if you can get away with it, I highly recommend using the GNU Readline library or the equivalent BSD Editline library rather than the less sophisticated ANSI version.


Simple Mode

Simple mode is simple - unlike the normal interpreter, which uses a doubly-linked list as the brainfuck list/tape, simple mode uses a more traditional array of limited size. Normally, you have this:

struct node {
    struct node *last;
    unsigned char val;
    struct node *next;
};

In simple mode, you have this:

unsigned char *bf = calloc(size, sizeof(unsigned char)); /* calloc, unlike malloc, starts every cell at 0 */

This is how simple mode is implemented in C. Simple mode, since it is designed to be simpler and show people unfamiliar with the complexities of a given language approximately how brainfuck translates to that language, does no bounds-checking, and much of it appears very intuitive. This is in contrast to the full interpreter/translator, which, due to its features and assurances, will look much more confusing.
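For comparison, here is how the normal mode's doubly-linked list can grow on demand in both directions, which is what lets the tape be "infinite" while only using memory for cells the pointer has actually visited. A sketch assuming one cell is allocated at a time; tape_new, move_left, move_right, and tape_free are hypothetical helper names, not sbf's:

```c
#include <stdlib.h>

struct node {
    struct node *last;
    unsigned char val;
    struct node *next;
};

/* Allocate one zeroed, unlinked cell; NULL if memory runs out. */
struct node *tape_new(void)
{
    struct node *n = malloc(sizeof *n);
    if (n != NULL) {
        n->last = NULL;
        n->val = 0;
        n->next = NULL;
    }
    return n;
}

/* > : step right, creating a new cell the first time we pass the end. */
struct node *move_right(struct node *cur)
{
    if (cur->next == NULL) {
        struct node *n = tape_new();
        if (n == NULL) return NULL;        /* out of memory */
        n->last = cur;
        cur->next = n;
    }
    return cur->next;
}

/* < : the mirror image of move_right. */
struct node *move_left(struct node *cur)
{
    if (cur->last == NULL) {
        struct node *n = tape_new();
        if (n == NULL) return NULL;
        n->next = cur;
        cur->last = n;
    }
    return cur->last;
}

/* Free every cell, starting from whichever one the pointer is on. */
void tape_free(struct node *cur)
{
    while (cur->last != NULL) cur = cur->last;   /* rewind to the left end */
    while (cur != NULL) {
        struct node *next = cur->next;
        free(cur);
        cur = next;
    }
}
```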

Simple mode does no bounds checking. If you try to work with data outside the limited array range, you will probably get a bus error or segfault and sbf will simply stop working. Strictly speaking, out-of-bounds access is undefined behavior in C, so some systems may instead let you wander off into uncharted memory, which would be disastrous. Simple mode is not only simpler for new programmers to learn, it can also be used to test compatibility with more traditional, limited-array-length brainfuck implementations. It's one thing to write a great brainfuck program. It's another to write a portable one.

However, due to its implementation, simple mode has somewhat "soft boundaries." There are 4 brainfuck commands that, if used out of bounds, will trigger a bounds error: +-., The other four commands, ><[], do not appear to crash sbf when you wander out of bounds and use them. This means that, regardless of your simple mode's boundaries, in sbf you can in practice wander as far to either end as you like, as long as you come back within bounds before using one of the "hard" commands. This is not what most brainfuck implementations do, I am certain, and it will seem a bit odd. But it does let you find the beginning of a string with [<]>, even if the string starts at the beginning of the list and the loop technically goes out of bounds: it goes there with the "soft" commands that don't mind the boundaries, rather than the "hard" commands that crash, and by the time it has a chance to use a "hard" command, it's back within the list. Be aware that this is observed behavior, not a promise: [ and ] do read the byte under the pointer, and any out-of-bounds access is undefined behavior in C.

This "soft boundaries" behavior is unique to sbf's built-in C simple mode. Simple mode in other languages does not work like this. Perl is the oddity - if you go past the right edge, it will silently extend the list by one. I'm unsure of its behaviour if you try to go further than 1 element at a time. But on the left end, when $ptr becomes -1, we discover that $lst[-1] is the last element of the array. Perl's "simple" mode has a semi-circular list, rolled up all weird. It's just like Perl to do something funky and abnormal.

Python, as expected, freaks out and dies as soon as you try to access anything outside the array limits.

Ruby's input function appears not to work in simple mode, which is a hindrance to testing the array limits.

C and C++'s simple mode translations, unlike sbf's built-in simple mode, do not use malloc() to create an array at runtime, but have the array size specified at compile-time. Going outside such an array is undefined behavior and can go very badly. Never use a simple mode C/C++ translation that risks going out of bounds. I can't guarantee it will run.


Unicode

sbf 0.9.4 fully supports the -u flag. The -u flag tells sbf to encode brainfuck characters with UTF-8 instead of traditional ASCII. The original brainfuck implementation from Urban Müller uses ASCII, but it was created over 15 years ago now, and standards have changed. When brainfuck was created, ASCII was the most portable solution. Now, Unicode exists, and brainfuck could easily use UTF-8 without breaking backwards compatibility or breaking significantly from the original Müller program, which encodes brainfuck characters as 8 bits even though ASCII only uses 7.

sbf currently supports Unicode in interpreting, interactive, compiling, and translation to C, Perl, Python, and Ruby. These are implemented in different ways - some people may find that C's support for Unicode is different than Perl's, Python's or Ruby's. To clarify why support may be different for different languages, the reasons for this, and possible fixes in the future, read on.

Perl has built-in full support for Unicode. For a Perl program to print Unicode characters instead of ASCII characters, all that is necessary is to add the following line to the beginning of a Perl program:

binmode(STDOUT, ":utf8");

We've simply changed STDOUT to use UTF-8, the ideal brainfuck Unicode encoding, rather than whatever character set it usually uses.

In Python and Ruby, it is a bit more difficult - the actual equivalent command for brainfuck's . must be changed. In Python, doing this:

sys.stdout.write(unichr(lst[ptr]).encode("utf-8"))

instead of the normal:

sys.stdout.write(chr(lst[ptr]))

does the trick. In Ruby, it's a bit simpler. Instead of the usual:

print @lst[@ptr].chr

We make @lst[@ptr] a 1-element list and use .pack:

print [@lst[@ptr]].pack("U")

All of these tricks work fine - there is no need to worry about many problems Unicode programmers usually have to worry about because we are dealing with printing and manipulating single characters rather than full strings. We can't get caught up in messing up our string length, because internally it's all stored as a number from 0 to 255.

C support for Unicode, be it the C translation, the compile function, or sbf's own interpreting capabilities, is a bit different. It uses the <locale.h> header to change the locale if we select Unicode. So let's see the locale changing line of code:

setlocale(LC_ALL, "");

Huh. Unlike Perl and Python, there's no "Uni" or "UTF" anywhere. There's not even a "U" like in Ruby's version. Nothing to indicate Unicode or UTF-8. Why? Because it technically doesn't ask for UTF-8. The C standard only guarantees one locale, "C" (POSIX adds "POSIX", which is usually just the same). C programs start out in the "C" locale, which is usually plain ASCII and may print annoying question marks for characters over 127. But if you call setlocale with the locale "" (an empty string), C will set its locale to the environment's locale. On Mac OS X and, hopefully, many Unix-like systems, this will be something like "en_US.UTF-8". However, on, say, Windows, it may be something lame like "en_US.ISO8859-1" or some other ISO standard. It may even be DOS-ASCII. Even on some Linuxes, it may be an ISO standard. I could do this:

setlocale(LC_ALL, "en_US.UTF-8");

This would make the -u flag print UTF-8 Unicode wherever that locale exists. But on systems that don't have it, setlocale() would fail (it returns NULL and leaves the program in the "C" locale), and -u would quietly stop doing anything useful. And as much as I think Unicode should be used if possible, I don't want to screw over people on platforms without full Unicode support (Microsoft, I'm glaring at you). It would be bad if -u just broke because your system has no Unicode support, so I think it's safer to use the default encoding. Besides, brainfuck cells only hold values from 0 to 255, and ISO-8859-1 gives those values the same characters as the first 256 Unicode code points, so if your locale is ISO-8859-1 you'll probably see the same characters you would have seen with UTF-8 anyway.

Another problem with specifying "en_US.UTF-8" explicitly is for people running computers localized to other languages in Unicode (if you're reading this, this doesn't apply to you. :) ) Those upper 128 characters will be vital to your programs producing any output anyone in your locale will care about, and your system's support for en_US.UTF-8 may be as bad as a non-Unicode system's. I'd hate to break your system or my program, and a non-English UTF-8 locale still produces valid UTF-8, so I'd rather go along with your locale just to be nice.

If it's absolutely vital that you see UTF-8 as widely accepted by us Americans, go through the code and change setlocale(LC_ALL, ""); to setlocale(LC_ALL, "en_US.UTF-8");. Just be aware that these changes may not carry over if you upgrade sbf to a new version, that you shouldn't do this without first checking which locale you're using and which ones you have available, and that I may improve on all this mess later by setting "en_US.UTF-8" and, only if that fails, setting the environment locale. But who knows? All of this is off in the future. For now, I recommend you just let it use your environment locale.
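The fallback order described above (try "en_US.UTF-8" first, then the environment locale) takes only a few lines. A minimal sketch; pick_locale is a hypothetical name, and, as with any setlocale() result, the returned string may be overwritten by the next setlocale() call:

```c
#include <locale.h>
#include <stddef.h>

/* Prefer an explicit UTF-8 locale, fall back to the environment's locale,
   and finally to "C", the only locale the C standard guarantees.
   Returns the name of the locale now in effect. */
const char *pick_locale(void)
{
    if (setlocale(LC_ALL, "en_US.UTF-8") == NULL   /* not installed here */
        && setlocale(LC_ALL, "") == NULL)          /* environment unusable */
        setlocale(LC_ALL, "C");
    return setlocale(LC_ALL, NULL);                /* query only, changes nothing */
}
```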

And lastly, I bet some of you are curious about what exactly your environment locale is. On Unix/Linux/Mac OS X/*BSD/etc., open the Terminal and run the locale program:

$ locale

This will tell you what your current locale is. On other platforms, this C program will tell you your locale:

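#include <stdio.h>
#include <locale.h>

int main(void)
{
    /* "" asks for the environment's locale; NULL means it couldn't be set */
    char *locale = setlocale(LC_ALL, "");
    if (locale == NULL) {
        printf("setlocale() failed\n");
        return 1;
    }
    printf("%s\n", locale);
    return 0;
}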

Try that out for size. This code is tested on Mac OS X and it works. It's not the best code - don't learn about locales from it. The string setlocale() returns belongs to the C library: a later call to setlocale() may overwrite it, and you must not free() it. But it does tell you your current locale, and it shouldn't break your computer. The point is that it works.

The second part of C's Unicode support is simply changing putchar() to printf("%lc", ...) and making sure every char fed to printf() is unsigned, so values from 128 to 255 aren't turned into negative numbers on the way. I just reimplemented the regular brainfuck lists as unsigned chars because it makes no difference otherwise.
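To see why the unsigned part matters: %lc takes a wint_t, so a cell holding 233 must reach printf as 233, not as the negative value a signed char would typically hold. A sketch that formats into a buffer instead of printing, so the result can be inspected; format_cell is a hypothetical name:

```c
#include <stddef.h>
#include <stdio.h>
#include <wchar.h>

/* Format one tape cell the way printf("%lc") would print it: the value is
   treated as a wide character and converted to the current locale's
   multibyte encoding. Returns the number of bytes, or a negative value if
   the locale cannot represent that character. */
int format_cell(char *buf, size_t cap, unsigned char cell)
{
    return snprintf(buf, cap, "%lc", (wint_t)cell);
}
```

In a program, call setlocale(LC_ALL, "") once at startup and then print each cell with printf("%lc", (wint_t)cell). In a UTF-8 locale on a system whose wide characters are Unicode code points (compilers signal this by defining __STDC_ISO_10646__), 233 should come out as the two bytes C3 A9, the UTF-8 encoding of é; in the plain "C" locale, values above 127 may fail to convert.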

I hope that answers any questions you might have had about "Why doesn't the -u flag print Unicode on system X?" If you want to implement a version of sbf that checks for "en_US.UTF-8" before setting to the environment, great. It shouldn't be that hard, I'm just lazy and I just finished doing a lot of work on Unicode, so I don't feel like adding more.

Summary: sbf's interpreting, interactive mode, compiling, and C translations have C-style "UTF-8" support, which may or may not actually be UTF-8, and any issues with their UTF-8 support are somewhat expected, if regrettable, and may be fixed in later releases. sbf's Perl, Python, and Ruby translations have full UTF-8 support, and any issues with their UTF-8 support should be taken up with the Perl/Python/Ruby development team. Not really, it's probably not their fault. I won't kill you if you send me a Perl/Python/Ruby translation bug report.


Supported Language Translations

sbf supports translating brainfuck to the following languages fully:

* C
* C++
* Perl
* Python
* Ruby

sbf supports translating brainfuck to the following language, but an infinite list does not yet work:

* x86 Assembly (NASM syntax)

sbf supports partial translations to the following languages (meaning there are errors with input/output or worse):

* Java
(input is broken, finite list, everything else works)
* Lisp
(no idea - weird errors with sbcl when I try to compile it)

Languages not fully supported are disabled unless the macro EXPERIMENTAL is defined. Type make experimental instead of regular make to define EXPERIMENTAL. Otherwise, sbf WILL NOT be able to translate, even partially, to assembly/Java/Lisp (it won't even recognize the languages - how rude).
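Both the EXPERIMENTAL switch and the "[lang] can be found from [file] extension" rule from the flag list can be pictured as one lookup table compiled with or without the extra entries. A sketch; the table, its extensions, and lang_from_file are illustrative assumptions, not sbf's code:

```c
#include <stddef.h>
#include <string.h>

struct lang { const char *ext; const char *name; };

/* Fully supported languages are always present; the rest only exist
   when the program is compiled with EXPERIMENTAL defined. */
static const struct lang LANGS[] = {
    { ".c",    "C" },
    { ".cpp",  "C++" },
    { ".pl",   "Perl" },
    { ".py",   "Python" },
    { ".rb",   "Ruby" },
#ifdef EXPERIMENTAL
    { ".asm",  "x86 Assembly" },
    { ".java", "Java" },
    { ".lisp", "Lisp" },
#endif
};

/* Guess the target language from the output file name, or NULL if unknown. */
const char *lang_from_file(const char *file)
{
    const char *dot = strrchr(file, '.');
    if (dot == NULL) return NULL;
    for (size_t i = 0; i < sizeof LANGS / sizeof LANGS[0]; i++)
        if (strcmp(dot, LANGS[i].ext) == 0) return LANGS[i].name;
    return NULL;
}
```

Built normally, lang_from_file("cat.java") returns NULL, so Java isn't even recognized; compiled with -DEXPERIMENTAL, it returns "Java".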

Because sbf's translated languages are meant for human viewing, and not just for feeding into another program afterwards, there are no plans to support translation to Gas syntax for x86 - NASM runs on basically everything. For those of you unfortunate enough not to be using NASM, FASM support may be added, with GNU Assembler support coming in at a distant third if that's what's needed. There are no plans to support MASM assembly or any other proprietary assembler. HLA Assembly syntax is under consideration, but would be listed as another "language" rather than replacing any other assembly syntax.

Ideally, sbf will detect which architecture it's on, then detect what kind of assembler is present. If it finds NASM, it will use NASM syntax. If it finds no NASM but finds some other assembler, it will use that syntax. If it finds no assemblers it can use, or is on an architecture for which assembly is unsupported (sorry, PowerPC users), it will display an error message if you try to translate to assembly. If assembly is supported, sbf's compiler flag will translate to assembly, then run the assembler. If assembly is not supported, sbf's compiler flag will translate to C and run the compiler that was used to compile sbf.

More ideally, various (open-source) assembly syntaxes (or just machine code) would be supported for most (major) architectures, and the compiler would never have to translate to C when compiling brainfuck.

Note that, of the fully supported languages, only C++ currently does NOT support Unicode output. Boo C++. The other 4 languages are all fully capable of Unicode output if the -u flag is supplied.

	Hosted at SourceForge.net	Powered by PHP