Oops, I didn't quite understand the question. Yep, the inputs are Arabic words represented as Unicode strings, with or without inflection marks (e.g. متوسّط), and the output is a string of 3 or 4 Arabic letters (e.g. وسط), or an empty string for "no root".
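For anyone wiring this up, here is a minimal sketch of that input/output contract in Python. It is not the actual code: the names `strip_marks` and `is_valid_root` are made up for illustration, and it assumes the marks in question are the common ones in U+064B to U+0652 (fathatan through sukun) and that a "letter" is anything in the basic range U+0621 to U+064A.

```python
# Hypothetical helpers illustrating the I/O contract described above.
# Assumption: "inflection marks" are the combining marks U+064B..U+0652.
MARKS = {chr(cp) for cp in range(0x064B, 0x0653)}


def strip_marks(word: str) -> str:
    """Return the word with the marks in MARKS removed; letters are untouched."""
    return "".join(ch for ch in word if ch not in MARKS)


def is_arabic_letter(ch: str) -> bool:
    """Rough check: basic Arabic letter range, excluding tatweel (U+0640)."""
    return "\u0621" <= ch <= "\u064A" and ch != "\u0640"


def is_valid_root(root: str) -> bool:
    """A root is either empty ("no root") or 3 to 4 Arabic letters."""
    if root == "":
        return True
    return len(root) in (3, 4) and all(is_arabic_letter(ch) for ch in root)
```

Filtering an explicit set of marks, rather than decomposing with Unicode normalization first, avoids accidentally splitting letters such as أ into a bare alif plus a combining hamza.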
No problem. By the way, I added some quick and dirty romanization in the output stage if you're interested. It in no way represents how the words sound, but it does make it easier for the Roman eye to parse.
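To give an idea of what a letter-by-letter romanization like that can look like, here is a rough sketch. The table below is an ad hoc one invented for this example, not the mapping used in the output stage, and `romanize` is a hypothetical name. Like the original, it makes no attempt to represent pronunciation.

```python
# Ad hoc one-to-one table (an assumption for illustration only).
# Capitals mark the emphatic consonants; "3" stands in for ayn.
ROMAN = {
    "ء": "'", "ا": "a", "ب": "b", "ت": "t", "ث": "th", "ج": "j",
    "ح": "H", "خ": "kh", "د": "d", "ذ": "dh", "ر": "r", "ز": "z",
    "س": "s", "ش": "sh", "ص": "S", "ض": "D", "ط": "T", "ظ": "Z",
    "ع": "3", "غ": "gh", "ف": "f", "ق": "q", "ك": "k", "ل": "l",
    "م": "m", "ن": "n", "ه": "h", "و": "w", "ي": "y",
}


def romanize(text: str) -> str:
    """Map each character through ROMAN; unknown characters pass through as-is."""
    return "".join(ROMAN.get(ch, ch) for ch in text)
```

Because unknown characters are passed through unchanged, an empty "no root" result stays empty and any marks left in the input remain visible rather than being silently dropped.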