
+1, no idea.

But maybe to add a little bit of context: "damage tracking" means, for example, that if there is an ongoing animation (like a spinner), only a small part of the screen gets re-rendered (with a proper scissor rect, so only the relevant pixels are computed). I am not sure if it makes sense in the context of a terminal emulator, but it's certainly a big issue for any non-toy GUI toolkit.
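To make that concrete, here is a minimal sketch of a scissored partial redraw in OpenGL (the `rect` type and draw calls are hypothetical, not code from any particular toolkit):

```c
#include <GL/gl.h>

/* Hypothetical damage rectangle in window pixels;
 * note that GL's scissor origin is the bottom-left corner. */
struct rect { int x, y, w, h; };

/* Redraw only the damaged region: fragments outside the scissor
 * box are discarded before shading, so a small spinner only
 * costs a small amount of pixel work. */
static void redraw_damaged(struct rect d)
{
    glEnable(GL_SCISSOR_TEST);
    glScissor(d.x, d.y, d.w, d.h);

    /* ... issue the normal draw calls for this region ... */

    glDisable(GL_SCISSOR_TEST);
}
```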

GPUs are incredibly fast parallel computers, so you will likely not observe any perf difference (unless you need order-dependent transparency, which you don't), but it might improve your battery life significantly.


No, damage tracking is important, as it is about reporting that you only updated that spinner, which means your display server also knows to redraw and repaint only that area, and in turn your GPU knows during scanout that only that area changed.

Without it, even if you only redrew your spinner, your display server ends up having to redraw what might be a full-screen 4K window, as well as every intersecting window above and below, until an opaque surface is hit to stop blending.


Well, it sounds like ghostty is like all the other major GPU terminal emulators (unless you know of a counterexample) and does a full redraw, though it appears to have some optimizations for how often that occurs.

The power issue might be true in some cases, but as even foot's own benchmarks against alacritty demonstrate, it's hyperbolic to say it "kills performance".


We do a full redraw but do damage tracking ("dirty tracking" for us) on the cell state so we only rebuild the GPU state that changed. The CPU time to rebuild a frame is way more expensive than the mostly non-existent GPU time to render a frame since our render pipeline is so cheap.
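For illustration, the general shape of that kind of dirty tracking might look like this (hypothetical types, not ghostty's actual code):

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical terminal grid with one dirty flag per row. */
struct grid {
    size_t rows, cols;
    bool *row_dirty; /* set by whatever mutates cell state */
};

/* Rebuild GPU-side vertex data only for rows whose cells changed;
 * the cheap full-screen draw then reuses the untouched rows as-is. */
static void rebuild_dirty_rows(struct grid *g)
{
    for (size_t r = 0; r < g->rows; r++) {
        if (!g->row_dirty[r])
            continue;
        /* ... re-run glyph layout / refill the vertex buffer for row r ... */
        g->row_dirty[r] = false;
    }
}
```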

As I said in another thread, this ain't a AAA game. Shading a text grid is basically free on the GPU lol.


It's actually not at all free, even if you're barely using the shading capacity. The issue is that you keep the power-hungry shader units on, whereas when truly idle their power is cut. Battery life is all about letting hardware turn off entirely.

Also, if you do damage tracking, make sure to report it to the display server so they can avoid doing more expensive work blending several sources together, and in case of certain GPUs and certain scanout modes, also more efficient scanout. Depending on your choice of APIs, that would be something like eglSwapBuffersWithDamage, wl_surface_damage_buffer, and so forth.
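On the EGL side, that reporting might look roughly like this (a hedged sketch: `present_with_damage` is a hypothetical helper, the extension must be queried at runtime, and EGL damage rects use a bottom-left origin, hence the flip; with a raw wl_buffer you would call wl_surface_damage_buffer with the same rectangle instead):

```c
#include <EGL/egl.h>
#include <EGL/eglext.h>

/* Swap buffers while telling the compositor that only one
 * rectangle (given here in top-left-origin pixels) changed. */
static void present_with_damage(EGLDisplay dpy, EGLSurface surf,
                                int x, int y, int w, int h,
                                int surf_height)
{
    PFNEGLSWAPBUFFERSWITHDAMAGEKHRPROC swap_with_damage =
        (PFNEGLSWAPBUFFERSWITHDAMAGEKHRPROC)
            eglGetProcAddress("eglSwapBuffersWithDamageKHR");

    /* EGL damage rects are {x, y, w, h} with a bottom-left origin. */
    EGLint rect[4] = { x, surf_height - (y + h), w, h };

    if (swap_with_damage)
        swap_with_damage(dpy, surf, rect, 1);
    else
        eglSwapBuffers(dpy, surf); /* extension missing: full swap */
}
```

On Wayland, the EGL implementation forwards those rects to the display server for you.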


> The issue is that you keep the power-hungry shader units on, whereas when truly idle their power is cut.

Even in the most perverse scenario of a single cell update, the load for a terminal is still bursty enough that the GPU does enter power-saving states. Running intel_gpu_top against kitty with a 100ms update is at least suggestive: it never drops below 90% RC6 (even at 50ms, a uselessly fast update rate, we're still in the high 80s). If you're legitimately updating faster than 100ms, it's probably video or animation that is updating a large percentage of the display area anyway. The overall time my terminal spends animating while on battery is low enough that in practice it just doesn't matter.

https://en.wikipedia.org/wiki/Amdahl%27s_law

The problem you're up against is that, even if this were optimized, most people would get maybe 2 or 3 (or even 10) more minutes on a 12-hour battery life. No one really cares. Maybe they should, but they don't. And there's plenty of other suck in their power budget.

You make it sound like a binary power saving scenario, but it tends to be more nuanced in practice.

Most people run their terminal opaque, most display systems optimize this common case.

> Also, if you do damage tracking, make sure to report it to the display server

I'm not unsympathetic to your point of view. But I am skeptical that the power savings end up being a big deal in practice for most people (even accepting there may be some edge cases that annoy some). I am interested in this topic, but I am still awaiting an example of a GPU-accelerated terminal emulator that works this way, to even make a real-world comparison.


It is very nuanced, but it's important to realize how small the power budget is and how just tens of milliwatts here and there make a huge difference.

To get over 12 hours of battery life out of a 60 Wh battery - which isn't impressive nowadays with laptops rocking 20+ hours - you need to stay below 5 watts of battery draw on average, and considering that the machine will likely do some actual computation occasionally, you'll need to idle closer to 2-3 watts at the battery including the monitor and any conversion losses.
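Spelled out with those numbers (plus a hypothetical 250 mW saving, to show the scale):

```latex
P_{\text{avg}} = \frac{60\ \text{Wh}}{12\ \text{h}} = 5\ \text{W},
\qquad
t = \frac{60\ \text{Wh}}{2.5\ \text{W}} = 24\ \text{h}
\quad\text{vs.}\quad
\frac{60\ \text{Wh}}{2.25\ \text{W}} \approx 26.7\ \text{h}
```

That is, shaving 250 mW off a 2.5 W idle draw buys almost three extra hours.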

The really big gains in battery life come from cutting tens to hundreds of mW off things at the bottom by keeping hardware off and using fixed-function hardware: e.g., avoiding rendering, doing direct scanout, and using partial panel self-refresh. Execution units do not turn on and off instantly, so pinging them even briefly is bad, and the entire system needs to be aligned with the goal of only using them when necessary for them to stay off.

Efforts like libliftoff to do efficient plane offload to avoid render steps in the display server can save in the area of half a watt or more, but it's not a whole lot of help if applications don't do their part.

Bigger GPUs than your iGPU (or even just later iGPUs) will also likely see even bigger impacts, as their bigger shader units are likely much hungrier.

(As an aside, I am not a fan of kitty - they have really weird frame management and terrible recommendations on their wiki. Foot, alacritty or if ghostty turns out good, maybe even that would be better suggestions. Note that comparing to foot can give a wrong image, as CPU-based rendering pushes work to the display server and gives the illusion of being faster and more efficient than it really is.)


Well, I would be very interested in some of this, but it all seems theoretical and mythical. Seriously, what terminal is giving 20% better battery life (or whatever number people will notice) than kitty?

How can I observe any of these claims in practice? You've put down some bold claims about how things should be done, but no way to verify or validate them at all. Put up some real power benchmarks, or this is just crackpot.

> To get over 12 hours of battery life out of a 60 Wh battery - which isn't impressive nowadays with laptops rocking 20+ hours

I used 12 hours to be nice. The sell of getting another 10 minutes or so out of 20 hours is even more stark.

The cases where you push a line and scroll, you're repainting most of it anyway. The cases where you're not end up being infrequent enough that optimizing them in the ways suggested makes an unnoticeable impact. Build it and they will come, maybe?

> Bigger GPUs than your iGPU (or even just later iGPUs) will also likely see even bigger impacts.

In most cases people can get by with an iGPU for the battery-powered laptop case. If you must pull down more graphical power, you're often plugged in, and few care about tens of milliwatts then.

> (As an aside, I am not a fan of kitty - they have really weird frame management and terrible recommendations on their wiki. Foot, alacritty or if ghostty turns out good, maybe even that would be better suggestions. Note that comparing to foot can give a wrong image, as CPU-based rendering pushes work to the display server and gives the illusion of being faster and more efficient than it really is.)

Once again, what is the exemplar of an efficient terminal, then? We've already established ghostty doesn't operate the way you think it should, so how can it turn out good?


Perhaps that came across wrong; no shade intended. What was meant was that ghostty seems to be like all the other mature GPU-based emulators, which means there's no damage reporting to a display server or anything like that. I don't think it's quite the deal breaker the GGP implies.


That's what I meant to say. There are two parts, and both need to work correctly; otherwise you're wasting power.


esbuild is a bit of an unfair example, because Go is a really great fit for this kind of job: it does a lot of I/O, it allocates a lot, and then it just dies. evanw even tried Rust at first, but it was slower than the Go version.

IMHO, if JS had struct types, the difference could be much smaller. Dead proposal here: https://github.com/rbuckton/proposal-struct


IIRC there's a `zig reduce` built into Zig :) https://github.com/ziglang/zig/blob/master/lib/compiler/redu...


You pay for a car. This is free and OSS.


Are you sure? Maybe I only borrow the car.

Edit: I apply the same rule to FOSS, of course. If it can't idle or if it freezes up or crashes sporadically or whatever I remove it. I'm old, I don't have the patience for stuff that breaks without good reason.


explicit about branching and allocations, not so much about types. We recently got `.decl()` syntax, which is even more implicit than `.{}`



I think around the BLOOM models (2022) it was found that if you train English-only, the model performs worse than if you have even a little mixture of other languages.

Also, there were other papers ("One Epoch Is All You Need") showing that diverse data is better than multiple epochs, and finally there was a paper ("Textbooks Are All You Need") for the famous Phi model, with the conclusion that high-quality data > lots of data.

This by itself is not proof for your specific question, but you can extrapolate.


Exactly, they should be sued for this. I think the only people who are defending Apple are the ones who did not read the ToS carefully.


If you mean that the LM head is just the transposed embedding matrix (tied weights), then this was already done in GPT-2.

Unfortunately, the only thing I found out about this is that bigger models benefit from a separate layer. But this was only mentioned somewhere on Discord, so there is no paper to read, and my personal hunch is that it should work for bigger models too. After all, GPT-3 was just a scaled-up GPT-2.

From my personal experiments, models learn better if you give them a harder task, and tied weights could be one such thing. Multi-token prediction could be another, and BitNet could also be considered such... (and dropout too)


can you comment on what those 500 lines actually were? react+redux?


It was all normal JS code to handle a complicated form we have on a customer-facing portal page. By moving to htmx, I was able to rely on the server side to handle basically everything, with only a small addition of code to what it was already doing.


I'm curious what that code was doing that could just be moved to the backend. Usually forms have some validation for a better experience, with the final validation on the server. Did you get rid of that and just show an error if the submission failed?


Near enough, yes, but it was a nice, helpful error message with the form fully populated as they left it. As far as the user can tell, it's client-side validation, as the service responds with just the correct HTML for the form and htmx swaps out the chunk of HTML.

It's like old-school validation, back in the CodeIgniter days.


So the 500 lines removed were literally just removed functionality in the first place (you should always validate on both sides anyway).


Isn’t that moving code around, not deleting it?


Most likely that code already existed server-side and was duplicated client-side. It's what usually happens since frontend code can't enforce invariants.


Could you elaborate on what you mean by "frontend code can't enforce invariants" ?


Frontend code can be easily bypassed or changed so you can't trust it to keep the state of your application in good condition. Nothing prevents me from picking the same username as someone else if there is no server-side code stopping it. Nothing prevents me from replying to a deleted comment if I don't check with the server.

That means that any validation that you do client-side you need to repeat server-side. Any business logic that you have client-side you also need to repeat server-side.


Near enough exactly this


can anyone comment on what 500 lines of react+redux actually do?


Also what line N+1 does, please.


all htmx does is replace divs on the page, so probably just code that did that

