YAPF – A formatter for Python files (github.com/google)
142 points by sudmishra on April 2, 2015 | 61 comments



  --style STYLE         specify formatting style: either a style name (for
                        example "pep8" or "google"), or the name of a file
                        with style settings. pep8 is the default.
> pep8 is the default.

I can breathe again.


I don't think there was any other option. If the default wasn't pep8, the tool would have no chance of seeing wide adoption in the Python community, unless someone forked it and changed the default to pep8.


"Don't be evil" still sometimes holds true. :)


I'm sure there are some cases where this is useful, but I'm not really sold. The pitch at the beginning is "satisfying PEP 8 doesn't mean it looks good!" But then it seems most of the changes cleaned up in the code (indentation level, indents on split lines, spacing between operators) are just PEP 8 violations. I think you could find a more convincing demonstration.

The warning about choking on large data literals (which are probably one of the places where prettifying would be most useful, at least to me) also seems ominous.

edit: for my personal use, I tend to use flycheck with flake8 in emacs. This keeps me honest. Is the primary use case for something like this cleaning your own code or other people's?


yapf starts making tons of sense after you've used similar tools in other languages. The chief two examples are "go fmt" and clang-format. The former goes without saying - formatting is enforced in Go.

The latter is widely used to format C, C++ (and even JS and Java) code. Many projects these days require all code to be clang-format'ed and even enforce it in commit hooks and such.

After working on such projects for a while, you get so used to the convenience of an auto-formatting tool that you want it in every language you code in. It's extremely useful to be able to just type a bunch of code quickly into the editor, copy-pasting, renaming, and know that the tool will nicely format it for you. Another advantage is that all the code in a project starts being and looking very consistent, which is important. Holy wars about brace placement and indentation just disappear.

So yapf is essentially that, for Python.


FWIW, at Yelp we built and open sourced pre-commit (http://pre-commit.com/), which runs a bunch of hooks on git commits. One of the most commonly used hooks is `autopep8 -i`, which automatically formats code on commit.

As the yapf documentation points out, pep8 is vague about certain things (e.g. indentation). Yelp prefers an indentation style where multiple arguments are broken up one per line, and has a pre-commit hook to automatically enforce that. yapf is also opinionated about indentation, preferring vertical alignment. I've run it on an open source project to illustrate the differences (YelpDent vs yapf indent):

     def remove_user_data(dryrun=False):
         if platform.system() == 'Darwin':
    -        data_home = os.path.join(
    -            os.path.expanduser('~'),
    -            'Library',
    -            'autojump')
    +        data_home = os.path.join(os.path.expanduser('~'), 'Library',
    +                                 'autojump')
         elif platform.system() == 'Windows':
             data_home = os.path.join(os.getenv('APPDATA'), 'autojump')
         else:
    -        data_home = os.getenv(
    -            'XDG_DATA_HOME',
    -            os.path.join(
    -                os.path.expanduser('~'),
    -                '.local',
    -                'share',
    -                'autojump'))
    +        data_home = os.getenv('XDG_DATA_HOME',
    +                              os.path.join(os.path.expanduser('~'), '.local',
    +                                           'share', 'autojump'))


We're actually working on a heuristic that can signal yapf to break arguments to one per line, and do the same for data structures (think a dict initialization).

Ultimately, we'd like to have these things configurable via the style settings of yapf, so multiple detailed styles can be selected.

Patches welcome :)


If you'd like my suggestion: I've been quite annoyed with some of the "odd" or inconsistent ways of splitting long lines onto multiple lines (either in definitions or in plain calls to functions). Eventually, I came up with a consistent way to do so. Have a look below:

    def OtherLongFunctionName(One,
      Two,
      Three
    ):
        print "Does stuff"
    
    def MyFunction(Param1,
      OtherParam,
      FooParam,
      EtcParam
    ):
        if ((Param1 * 1000) > 5000 and
          OtherParam is not None and
          len(FooParam) > 50
        ):
            new_instance_dict = {
              "TestingFoo": Param1,
              "NewFoo": OtherParam,
              "LongFooBar": FooParam,
            }
    
        recursive = MyFunction(
          OtherLongFunctionName(
            Param1,
            OtherParam,
            FooParam
          )
        )
It doesn't comply with PEP8 because of how the nested params and the closing brackets are indented on the new lines. PEP8 apparently wants them aligned with the opening bracket. That looks horrible and just right-aligns all your code into little columns. It also doesn't work very well with nested function calls.


Please don't make it configurable, because it will mean there is no longer "a" YAPF standard. Also look at the PyCharm reformat tool; it would be a shame if both couldn't be used in the same versioned project.


The point is there are things that don't violate pep8 that are still atrocious

for example:

    some_function(long_variable_name_and_stuff, long_variable_name_and_stuff,
                                  long_variable_name_and_stuff)
According to flake8, the lower indentation level doesn't actually matter at all.

edit: To below, that's possible. I can't remember the exact case right now but there are things along these lines that are valid and shouldn't be. Multiple possible indentation formats is not necessarily something you want to allow as an org, which is where something like YAPF comes into play


I wonder if that's a configuration thing. Mine gives me a "continuation line over-indented for visual indent" here.


I'm afraid you are mistaken here. PEP 8 reads "Arguments on first line forbidden when not using vertical alignment." flake8 does report this, but if it did not, that would simply be a bug in flake8.

edit: if there are bugs in PEP 8, work to get them fixed rather than promoting alternative styles like the Google style, please!
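
For reference, a small sketch of the two continuation-line layouts PEP 8 describes: aligning with the opening delimiter, or a hanging indent with no arguments on the first line. `some_function` and the variable are placeholders borrowed from the example above and defined here only so the snippet runs.

```python
def some_function(*args):
    """Placeholder so the calls below run; not part of any real API."""
    return args


long_variable_name_and_stuff = 1

# Layout 1: continuation line aligned with the opening delimiter.
r1 = some_function(long_variable_name_and_stuff, long_variable_name_and_stuff,
                   long_variable_name_and_stuff)

# Layout 2: hanging indent, with no arguments on the first line.
r2 = some_function(
    long_variable_name_and_stuff, long_variable_name_and_stuff,
    long_variable_name_and_stuff)
```

Both calls are the same to the interpreter; only the layout differs.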


The warning about data literals is just to warn people that what comes out of YAPF may not look like they desire. The problem is that formatting data literals is hard. Even harder than formatting executable code. This is because there are so many different styles that could be applied to them. There's no "one size fits all" solution.

Therefore, YAPF (and even clang-format) balk at them. They will handle them, of course, but the result will almost certainly not look like what you want.

That's why we give you the ability to disable formatting for sections of code.
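
As a sketch of what that looks like: the YAPF README describes `# yapf: disable` and `# yapf: enable` comments that fence off a region the formatter should leave alone. The table below is made-up data for illustration.

```python
# yapf: disable
IDENTITY_3X3 = [
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
]
# yapf: enable

# Code outside the fenced region is formatted as usual.
TRACE = IDENTITY_3X3[0][0] + IDENTITY_3X3[1][1] + IDENTITY_3X3[2][2]
```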


I do this too but I really don't like the noticeable lag that gets introduced as flycheck shells out to python and parses my file every time I change something.


Are you sure the lag is because of Python? I'd check flymake options first, it may wait for some time after you stop typing and only then start linting. This delay is a trade-off: too short and you're going to have currently edited line marked with red most of the time, too long and you need to wait to see if everything is alright.


Lazy HN for the win! Thanks for the suggestion.


>YAPF is not an official Google product (experimental or otherwise), it is just code that happens to be owned by Google.


Sounds like one of those "20%" kind of projects, and the Google GitHubbers just happened to have permission to release it under google/. Wow ...


True, but that category also includes protobufs, LevelDB, Snappy, Gumbo, Guice, re2, gtest, and Angular (initially; it soon became an official thing when it got popular). While the "official" open-source projects are things like Chrome, Android, GWT, Go, Dart, gRPC, and Bazel.

Paul Graham once said that you should use the tools that programmers build to solve their own problems, not the tools that big corporations build to solve their ideas of what other peoples' problems are. That doesn't mean every little library is a good one (Sturgeon's Law applies - 90% of everything is crap), but it does mean that being "official" is usually a negative signal on product quality.

This applies to other companies' code as well; I've never used a buggy piece of shit quite so bad as Sun's JSF (which was supposed to be the "official" way to build webapps with Java, circa 2004-2006), while BSD and Linux continue to be great pieces of software 25 years later.


I've been using YAPF for a while and I really like the idea. Be aware that there are still some nasty bugs that haven't been fixed yet so be careful if you use it for something important.

Does anyone know where the name YAPF comes from? (The only thing I can think of is "Yet Another Python Formatter", but I'm just guessing.)


Indeed, it's just "Yet Another Python Formatter"


I've got a python project where I've been pretty naughty with long lines containing sql queries.

I found that running this tool over it resulted in some strange new lines.

For example, this was my old hideously long line:

query_qual = "INSERT OR REPLACE INTO Qualifying (poleid, seasonid, race_week_num, carclassid, pole) values (%s%s, %s, %s, %s, %s)" % (race[u'seasonid'], race[u'race_week_num'], race[u'seasonid'], race[u'race_week_num'], race[u'carclassid'], pole)

Changed to this:

query_qual = "INSERT OR REPLACE INTO Qualifying (poleid, seasonid, race_week_num, carclassid, pole) values (%s%s, %s, %s, %s, %s)" % ( race[u'seasonid'], race[u'race_week_num'], race[ u'seasonid' ], race[u'race_week_num'], race[u'carclassid'], pole)


To my eye, HN has rendered them identically. You can preserve literal whitespace on HN by indenting by two spaces (and a newline to separate the code formatting from ordinary text).

  it appears     like  this


newlines need to be escaped somehow (two newlines seems to do it), otherwise they are combined into the same line:

Every word in this sentence was entered on a different line.


You could try setting the line length penalty to something really high. That would probably sort things out.


Feel free to submit a Github issue for yapf


Off-topic, but why is no space before the opening parenthesis the norm in programming ("class foo(object):") when normal English usage is to have one? As far as I know, I'm the only one who puts this space in.


Because the convention in mathematics, from whence the function call syntax used in programming (from which many of the other programming uses then derive), is no space.


While I take your answer, in mathematics, people often use shorter variable names like y(x). Meaningful names given to functions make them read a lot more like English than mathematics.


Though mathematics also has longer names like arcsin, arctan, etc.


I think of it as indicating the function name and parameter list are a unit (it's foo(object), not foo (object)). I've always liked that style.

When I see people write

    'some %s %s text ' %('x', 'y')
my first thought is "% is not a function".

I'm reminded also of (and agree with) the Crockford convention in JavaScript recommending a space in

    function () {...}
but not in

    function foo(object) {...}
But conventions are a choice made; most times it's easier to adopt them and move on.


>> But conventions are a choice made; most times it's easier to adopt them and move on.

Which brings me to a related question. Color schemes are a preference in editors and IDEs, so why isn't spacing one too? Then it would not be a "choice made" a priori for all individuals.


Because colors can be altered in any end-reader's IDE/text editor. Indentation and spacing, however, can not. At least, not safely yet.

I suppose you could run all the code you receive and have to read through a code-formatter such as YAPF. That way, no matter what "way" someone writes their code, when you read it, it'll be in your comfortable/preferred format.


Colour schemes aren't part of the text itself. However, if your IDE could automatically display your code formatted as you like while keeping the underlying text file aligned with some convention, and make that not a nightmare while editing...well, I think you're onto something there.


I think it's slightly easier for both humans and computers to parse a function call when there's no space.

I find your question interesting - why SHOULD programming language syntax match English syntax? Programming languages borrow symbols from natural languages for convenience, but there's generally only a very loose analogy between their purposes in each.

Question, does it bug you when you see function calls without spaces in the same way it would if(for example) someone left out a space in writing?


Programming language syntax does not need to match English syntax, but given that it is just a benign convention (that for almost all programming languages would not change the meaning of the program), I certainly expected a non-zero probability of others using it too. Which is why I was curious about the history of the convention since the probability distribution is nearly 100% skewed towards no-space. :-)

To answer your second question, with English, it certainly bugs me. With programming languages, I use the space but of course do not expect the same from anyone else. No one so far has complained about me using it either. :-)


This is just speculation, but since as you rightly point out the space is optional in many programming languages, I suspect the existence of the no-space convention simply reflects that a majority of programmers, over the last however many decades, have preferred to read and write code that way.

There are two factors involved: I reckon most people (a) see a function call as a single logical unit and write it as such and (b) feel compelled to follow conventions. The result is that once there's a clear majority, most code will converge on that style. Clearly you do neither :)

Personally I just try to adopt the conventions of the language or ecosystem I'm working in as closely as possible and try not to let my personal leanings get in the way. The most important thing is that others can easily read my code.

I would love to write Python like this, but sadly for me PEP8 and society frown upon it:

  some_dict = {'a_key':       an_expression,
               'another_key': another_expression}


>"I would love to write Python like this, but sadly for me PEP8 and society frown upon it:"

Indeed, that is one of those "big argument" type of minor formatting discussions. Despite my OCD, it doesn't bother me, and in fact, I prefer formatting them as below:

    some_dict = {
      'a_key': an_expression,
      'another_key': another_expression,
    }
Question, if you add another entry, and it throws off the formatting, do you go up and change all the other entries prior so they all align? Because that does bring up questions about code-commits. I.e. you're touching code that hasn't changed.

    some_dict = {
      'a_key': an_expression,
      'another_key': another_expression,
      'a_really_long_key_that_throws_you_off': expression,
    }
As opposed to:

    some_dict = {'a_key':                                 an_expression,
                 'another_key':                           another_expression,
                 'a_really_long_key_that_throws_you_off': expression}
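
A rough way to put a number on the commit-noise question, using only the standard library's `difflib`. The helper names (`hanging_style`, `aligned_style`, `touched_lines`) are made up for this sketch; it just renders the dict in both layouts before and after adding the long key and counts the lines a diff would touch.

```python
import difflib


def hanging_style(entries):
    """One entry per line, trailing comma, no column alignment."""
    body = ["  {!r}: {},".format(k, v) for k, v in entries]
    return ["some_dict = {"] + body + ["}"]


def aligned_style(entries):
    """Values lined up in a column after the longest key."""
    opener = "some_dict = {"
    width = max(len(repr(k)) for k, _ in entries) + 1  # +1 for the colon
    lines = []
    for i, (k, v) in enumerate(entries):
        prefix = opener if i == 0 else " " * len(opener)
        suffix = "}" if i == len(entries) - 1 else ","
        key_col = (repr(k) + ":").ljust(width)
        lines.append("{}{} {}{}".format(prefix, key_col, v, suffix))
    return lines


def touched_lines(before, after):
    """Count added plus removed lines in a zero-context unified diff."""
    diff = difflib.unified_diff(before, after, lineterm="", n=0)
    return sum(
        1 for line in diff
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    )


old = [("a_key", "an_expression"), ("another_key", "another_expression")]
new = old + [("a_really_long_key_that_throws_you_off", "expression")]

hanging_cost = touched_lines(hanging_style(old), hanging_style(new))
aligned_cost = touched_lines(aligned_style(old), aligned_style(new))
print(hanging_cost, aligned_cost)
```

With these inputs the hanging layout touches one line, while the aligned layout touches five (two removed, three added), because every existing entry has to be re-padded.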


I expect that's why the current convention was adopted, and another reason I'm happy to do it that way. I don't ever actually format my code like my example - but if I'd learned Python from first principles in a cave, I probably would. :)

Would be cool if IDEs could do this for you at the display level without touching the underlying code!


This passed HN duplicate detector, despite same URL and submission title.

https://news.ycombinator.com/item?id=9260786


Is it possible to YAPF it in Sublime?


For Sublime 2, yes:

https://github.com/jason-kane/PyYapf

It isn't in Sublime package control yet.


We need to have plugins for all editors. I'm not familiar with Sublime. But I would welcome people submitting plugins for it. :-)


There seems to be a problem parsing backslash escapes:

echo 'foo="\\"' | PYTHONPATH=/tmp/yapf/yapf python /tmp/yapf

Raises a lib2to3.pgen2.parse.ParseError.

Also crashes on \n, but not, strangely, on \t.

I’m guessing that the backslashes get interpreted twice, somehow.


Are you sure this is not a shell artifact? Does the same problem happen when there's a \\ in a file? In any case, feel free to open a Github issue for yapf, we'll take a look
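
One way to narrow this down, sketched with Python's built-in `compile()` rather than lib2to3: check whether each candidate input is valid Python at all. Some `echo` implementations interpret backslash escapes and some don't, so the two strings below cover both possibilities. This does not reproduce or diagnose the ParseError reported above; `parses` is a helper invented for this sketch.

```python
def parses(source):
    """Return True if Python's own compiler accepts the source text."""
    try:
        compile(source, "<repro>", "exec")
    except SyntaxError:
        return False
    return True


# What the input contains if the text arrives untouched:
# foo="\\" (a string holding one backslash).
as_typed = r'foo="\\"' + "\n"

# What it would contain if something collapsed \\ to \ on the way in:
# foo="\" (an unterminated string).
collapsed = r'foo="\"' + "\n"

print(parses(as_typed), parses(collapsed))
```

If the file on disk matches `as_typed` and the formatter still fails, the shell is off the hook.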


Yes, it was when parsing a file I encountered it. The one-liner shell command is just a minimal repro of the bug.

Apparently I need to “Sign in” to report a Github issue, so I won’t.


Sorry, I can't reproduce this. I'm trying on this file: https://gist.github.com/eliben/9727758f4847d2e7d86e

And yapf runs just fine on it. Does this file work for you?


Nope.

  Traceback (most recent call last):
    File "/usr/lib/python2.7/runpy.py", line 162, in _run_module_as_main
      "__main__", fname, loader, pkg_name)
    File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
      exec code in run_globals
    File "/tmp/yapf/yapf/__main__.py", line 18, in <module>
      sys.exit(yapf.main(sys.argv))
    File "/tmp/yapf/yapf/__init__.py", line 104, in main
      verify=args.verify))
    File "/tmp/yapf/yapf/yapflib/yapf_api.py", line 99, in FormatCode
      tree = pytree_utils.ParseCodeToTree(unformatted_source.rstrip() + '\n')
    File "/tmp/yapf/yapf/yapflib/pytree_utils.py", line 100, in ParseCodeToTree
      tree = parser_driver.parse_string(code, debug=False)
    File "/usr/lib/python2.7/lib2to3/pgen2/driver.py", line 106, in parse_string
      return self.parse_tokens(tokens, debug)
    File "/usr/lib/python2.7/lib2to3/pgen2/driver.py", line 71, in parse_tokens
      if p.addtoken(type, value, (prefix, start)):
    File "/usr/lib/python2.7/lib2to3/pgen2/parse.py", line 116, in addtoken
      ilabel = self.classify(type, value, context)
    File "/usr/lib/python2.7/lib2to3/pgen2/parse.py", line 172, in classify
      raise ParseError("bad token", type, value, context)
  lib2to3.pgen2.parse.ParseError: bad token: type=55, value=u' ', context=('', (1, 5))


Interesting. Can you cite the exact version of Python 2.7 you're using?

Mine is 2.7.6

I'm asking because bugs get fixed in lib2to3 occasionally, and we see differences between minor Python versions in this regard.


Python 2.7.3, the normal one from the current Debian stable.


The time difference between 2.7.3 and 2.7.6 is a year and a half, and lib2to3 got a bunch of fixes merged in during that period. I peeked at the CPython logs, and there are a number of changes, so this bug may very well have been plugged.

I guess we should be recommending the latest Python 2.7 possible, since it's very difficult for us to work around lib2to3 bugs in yapf - we're basically limited to whatever it can parse. Some things can be monkey-patched, but core parsing bugs are challenging.


I definitely would like to see configuration options for the vertical alignment of arguments. If the arguments don't fit within the line length then having the option to put them on separate lines (still pep8) would be nice.


> YAPF is not an official Google product (experimental or otherwise), it is just code that happens to be owned by Google.

Just in case anyone thought "by Google" meant "a Google product" like me.


Interesting to see the formatter is written using 2-space indentation when both pep8 and Google's style guides say 4.


I think the original Python Google Style guide did use 2 spaces. https://code.google.com/p/soc/wiki/PythonStyleGuide#Indentat...

And it seems like that's used in YAPF as well: https://github.com/google/yapf/blob/master/yapf/yapflib/styl...


This was submitted not long ago, and that was pointed out there too.

https://news.ycombinator.com/item?id=9260786

I suggested to the author that they restore "google" style to the public one and rename the existing one to "chromium" style, since that's the most prominent public project still on Google's internal style.


i use autopep8 and haven't been thrilled. it gets the job done, but i'll switch in a heartbeat if this is better. we'll see.


YAPF is still brand spanking new, so please use it and report any bugs you find.

We mention in the documentation that data literals are a sticking point for most people and most automatic formatters. Even clang-format tends to balk at them. We try our best with them, but it might be best to just disable formatting for them if they are already formatted the way you like. :-)


yapf's approach is philosophically different from autopep8's. yapf doesn't just fix pep8 violations. It takes a look at your whole code and reformats it to a canonical form. If you've ever used "go fmt" or "clang-format" (for C/C++), this is the same idea.
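
To illustrate what "canonical form" means here, a toy stand-in using the standard library's `ast` module (Python 3.9+ for `ast.unparse`). This is not how yapf is implemented, and unlike a real formatter it throws away comments; it only shows that the output depends on the code's structure, not on how the input happened to be laid out.

```python
import ast


def canonical(source):
    """Parse the code and print it back from the syntax tree.

    Stand-in for illustration only: this discards comments and is not
    how yapf works.
    """
    return ast.unparse(ast.parse(source))


cramped = "x=f( 1,2 ,3 )\n"
spread_out = "x = f(\n    1,\n    2,\n    3,\n)\n"

print(canonical(cramped) == canonical(spread_out))
```

A violation fixer would patch only the lines that break a rule; a canonicalizing formatter maps both inputs to the same text.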


Also see go fmt


Thought experiment: wouldn't it be just awesome if Google decided to promote an alternative to the format used by go fmt, so that newbies could learn that instead and then have a constant conflict with the core Go community?

edit: this is directly relevant to the thread, which is about Python formatting and Google promoting its weird style



