This is about pretty printing that can extend all the way to source code formatting, not just simple data structures.
See A Pretty Expressive Printer for the full explanation, but basically: The input to the printer is in a "pretty printing language", a tree of fragments with formatting choices (splitting, indenting, different forms, etc.). The job of the pretty printer is to take a line width, make all these choices somehow, and render the result.
You can have simple (just newline/indent) or complex choices (look at all the ways you can write Haskell), and you can have a simple (local, greedy) or complex (global, optimal) algorithm. Of course the goal is to get close to optimal without taking time exponential in the number of choices. So it ends up looking more like TeX than the simple "oops, line would overflow" logic.
I had been thinking in terms of source code printing.
I don't understand the need for global analysis: you're not rewriting the code, not restructuring it - you're just choosing where to put the whitespace. So why would the formatting of one statement impact the formatting of the next? why would the formatting of one function impact the next? etc.
That's what I mean: why does the formatting require global knowledge?
So why would the formatting of one statement impact the formatting of the next?
It doesn't. But it impacts the formatting of the sub-statements nested inside it and of the parent statement it's nested in.
Top-level statements can be large and complex (entire impl block in Rust, entire class definition in Python, etc) — there's not much difference between "local to an entire Java class definition" and "global to a Java source file".
And even if we can handwave that away by saying "it's a block, the body is a sequence of indented statements", we still hit the hard edge cases: a really complex condition of an if statement, a really complex expression on the right hand side of an assignment, a really long method call chain (some of the calls have complex expressions as arguments, maybe even code blocks / lambdas), a really complex where declarations in Rust… deciding how to break this kind of statement/expression into multiple lines is a lot of choices to make. A "global" 10kloc Java source consisting of boilerplate and trivial statements will be much simpler to format than a "local" alignment of a 1 kilobyte IOCCC one-liner or a sufficiently advanced Perl expression.
No knowledge of the next statement is needed, but one statement needs to be laid out as a whole, because choices of whitespace within one statement do actually impact other parts of that statement.
With sufficiently large, multi-line statements, this becomes sufficiently "global" to be a problem. Especially in Rust, where you can e.g. nest a loop inside a lambda argument to a function call, inside a macro invocation. In fact, even a loops themselves are really one big statement (you can write a loop on one line, and that definitely impacts its body!) but that case you can probably decide relatively easily.
Yeah, for statement-based languages like Rust or Python, I think auto-formatting should be a local algorithm -- each statement can be considered more or less separately
Line wrapping (aka pretty printing here) is a sub-problem of auto-formatting
For functional languages and JSON-like data structures, you just have a big expression, so that's more of a global problem
(And note some auto-formatters don't perform line wrapping at all, e.g. go fmt as far as I remember)
olliej | 14 hours ago
I’m somewhat curious what globally optimal pretty printing is? Pretty printing is to me (after basic structural elements) a reasonably local affair.
wrs | 13 hours ago
This is about pretty printing that can extend all the way to source code formatting, not just simple data structures.
See A Pretty Expressive Printer for the full explanation, but basically: The input to the printer is in a "pretty printing language", a tree of fragments with formatting choices (splitting, indenting, different forms, etc.). The job of the pretty printer is to take a line width, make all these choices somehow, and render the result.
You can have simple (just newline/indent) or complex choices (look at all the ways you can write Haskell), and you can have a simple (local, greedy) or complex (global, optimal) algorithm. Of course the goal is to get close to optimal without taking time exponential in the number of choices. So it ends up looking more like TeX than the simple "oops, line would overflow" logic.
olliej | 10 hours ago
I had been thinking in terms of source code printing.
I don't understand the need for global analysis: you're not rewriting the code, not restructuring it - you're just choosing where to put the whitespace. So why would the formatting of one statement impact the formatting of the next? why would the formatting of one function impact the next? etc.
That's what I mean: why does the formatting require global knowledge?
Psentee | 5 hours ago
It doesn't. But it impacts the formatting of the sub-statements nested inside it and of the parent statement it's nested in.
Top-level statements can be large and complex (entire
implblock in Rust, entire class definition in Python, etc) — there's not much difference between "local to an entire Java class definition" and "global to a Java source file".And even if we can handwave that away by saying "it's a block, the body is a sequence of indented statements", we still hit the hard edge cases: a really complex condition of an
ifstatement, a really complex expression on the right hand side of an assignment, a really long method call chain (some of the calls have complex expressions as arguments, maybe even code blocks / lambdas), a really complexwheredeclarations in Rust… deciding how to break this kind of statement/expression into multiple lines is a lot of choices to make. A "global" 10kloc Java source consisting of boilerplate and trivial statements will be much simpler to format than a "local" alignment of a 1 kilobyte IOCCC one-liner or a sufficiently advanced Perl expression.tomsmeding | 10 hours ago
No knowledge of the next statement is needed, but one statement needs to be laid out as a whole, because choices of whitespace within one statement do actually impact other parts of that statement.
With sufficiently large, multi-line statements, this becomes sufficiently "global" to be a problem. Especially in Rust, where you can e.g. nest a loop inside a lambda argument to a function call, inside a macro invocation. In fact, even a loops themselves are really one big statement (you can write a loop on one line, and that definitely impacts its body!) but that case you can probably decide relatively easily.
andyc | 12 hours ago
Yeah, for statement-based languages like Rust or Python, I think auto-formatting should be a local algorithm -- each statement can be considered more or less separately
Line wrapping (aka pretty printing here) is a sub-problem of auto-formatting
For functional languages and JSON-like data structures, you just have a big expression, so that's more of a global problem
(And note some auto-formatters don't perform line wrapping at all, e.g.
go fmtas far as I remember)