Apache Software Foundation joins GitHub open source community (github.blog)
476 points by moritzplassnig on April 29, 2019 | 244 comments



Apparently, one of the big motivating reasons for this was "cost".

> The foundation’s 2018 five-year strategic plan noted that infrastructure services account for more than 80 percent of the total ASF expense budget, adding: “Increasingly, project communities have infrastructure requirements that strain the capabilities of the ASF.”

> The report noted that, given burgeoning costs, encouraging the use of more externally provided services was its best option. (“Using a simple growth forecast to project expenses and effective governance and mentoring to ensure that using externally provided services does not in any way present barriers to entry to projects or reduce transparency, inclusiveness and diversity.”)

https://www.cbronline.com/news/apache-software-foundation-gi...

The ASF has always struck me as very moral with its fundraising and careful with its money (unlike a couple of other major non-profits I would happily name in any other context). So, as disappointed as I am with this decision, it is difficult for me to blame them for making it: git is extremely difficult to scale correctly (due to its reliance on interactive protocols), which led Google Code and then GitHub to rewrite large portions of it (of course, as closed-source, internal-only, this-is-our-competitive-advantage projects). When you are a small non-profit, knowing that you would have 5x the resources for staffing if you just swapped out some mere tooling has got to be a really, really tough choice for something that isn't quite your core moral concern (as it would be with, say, the FSF).


Holy shit, they're spending $800k a year on infrastructure! Honestly, it's difficult to understand why they didn't move to GitHub, or even GitLab or the like, sooner - it feels reckless. That money could be put to far greater use. As an Apache supporter who has never felt the need to look at their costs, I have to say that I'm very disappointed.


I encourage you (and everyone else) to look more closely at the ASF budget. As noted above, it's 80% for infrastructure, and that is supporting hundreds of projects. Some of those are simple; some are anything but (Hadoop, Cordova, Spark, OpenOffice, Mesos, CloudStack, etc.).

In fact, many of those projects rely on other resources not included in that 800k: How much does it cost to run a testing farm for all the devices Cordova has to run on? Do you really think 800k is such a crazy number?

The ASF runs lean. It has to, because it is a 501(c)(3) charity and cannot be as responsive to large donors as some might prefer (i.e. giving them more direct control over the foundation in return for more funding.)


Note that the ASF infra is - at best - insufficient. Cassandra relied heavily on donated Jenkins worker VMs for all tests because the VMs the ASF provided couldn't run the test suites. Recently we started moving tests to CircleCI... because it was better than waiting 24 hours for tests to run in Jenkins.


While Infra's suite of services is perfectly adequate for many projects, it's not uncommon for resource-intensive projects which strain Infra in one way or another to have to improvise. That's what I was alluding to with the build farm for Cordova -- don't quote me on this, but I believe it's provided by Adobe (and may even run on PhoneGap).

Things would probably be different if the ASF was a pay-to-play 501(c)(6) "business league" rather than a 501(c)(3) charity. Then the Foundation would have more funding at the price of having to answer to corporate masters.

But since ASF Infra must run lean, it is forever trying to find ways to do more with less -- and Github integration is an instance of that.


Maybe I hadn't given it too much thought before commenting. Do you know where I can find more details of what the 800k goes on?


Besides the five-year plan, the most easily accessible public records would be the Annual Report and the monthly reports from the Infrastructure V.P. and the Treasurer.

https://www.apache.org/board/plan.html

https://s.apache.org/FY2018AnnualReport

https://whimsy.apache.org/board/minutes/Treasurer.html

https://whimsy.apache.org/board/minutes/Infrastructure.html

I don't know that there's a more granular breakdown of Infra's budget available online. For what it's worth, the biggest single expense, for many years, has been sysadmin labor. Bandwidth also looms large, though last I knew much of what the ASF used was provided by OSU OSL and SourceForge.


Also, those costs cover much, much, more than git.


> unlike a couple other major non-profits I happily would name in any other context

Can this reply be the context? Which ones?


My first thought is Wikipedia, but I'm sure there are plenty of other examples.


Sounds like a stupid question, but I am seriously asking it.

What was the infrastructure of the ASF? Their own servers? Wouldn't moving to a cloud provider have helped reduce costs?

Ethically speaking, isn't it a conflict of interest that an idol of the open-source world is pairing up with Microsoft? Now that Microsoft has lost its former glory, they are desperately trying to present a clean image, but all their deeds of the past are still fresh. I am waiting for the MSJVM 2.0 with GitHub & Atom & VSCode.

How are the Linux, FSF, or Mozilla repos hosted? Scaling issues with git cannot only be tackled by Microsoft & Google.


>as disappointed as I am with this decision

May I ask why? Them moving doesn't look like an issue to me at least.


Personally, my only gripe (even though it's not a huge deal) is that I'd have preferred it if they used hosting that actually is open source (like GitLab).


That would decrease costs, but would it meaningfully increase community participation?


I believe the ASF blog[1] has a much better title: "The Apache® Software Foundation Expands Infrastructure with GitHub Integration"

"joins" sounds very wrong to me. Apache has had a Github account for years and the mirrors have existed for years as well.

The ASF still hosts its own Git repositories at https://gitbox.apache.org/ and in fact all ASF projects needed to migrate from the old git-wip to gitbox just recently (December 2018) [2]. In that announcement they said: "When your project has moved, you are free to use either the ASF repository system (gitbox.apache.org) OR GitHub for your development and code pushes"

So I believe the ASF still hosts a fully up-to-date git repository for each of their projects that use git. It's just that the integration has gotten much better between ASF infrastructure (e.g. Jira) and Github.

[1] <https://blogs.apache.org/foundation/entry/the-apache-softwar... [2] <https://blogs.apache.org/infra/entry/relocation-of-apache-gi...


According to the announcement:

>In February 2019, the migration to GitHub was complete, and the ASF's own git service was decommissioned.

P.S. - the links you provided are not formatted correctly and lead to 404 pages due to the trailing '>'


Sorry for the broken links.

Yes, you are correct, that's what it says in the announcement, but the announcement itself isn't quite accurate. The "old" ASF git stuff was decommissioned, yes, but gitbox was not. It's a bit misleading.

A more detailed explanation has been put up here: https://blogs.apache.org/infra/entry/apache-and-github-a-fri...


Important clarifications:

- The announcement is only about technology; there's no "partnership" between the ASF and GitHub. The ASF is vendor-neutral about all of its operations. In particular, there is an expectation that Apache project communities continue to do much of their community and release management on ASF servers, not solely on GitHub.

- Many Apache projects asked to use GitHub. It took a while, but Apache infra now allows that, as long as the repos are in our organization. Many projects still use our Subversion repo(s) too.

- The ASF hosts its own Git repos with all auditable history. So GitHub is merely one way that Apache projects can choose to allow users to contribute. If GitHub went away overnight, the ASF would still have all our own code and could keep working with our own build tools and plain old `git`. The ASF didn't decommission its own git repos, just some of the tooling we used to mirror between our repos and GitHub.

Lars elsethread brought up a useful ASF blog post: https://blogs.apache.org/foundation/entry/the-apache-softwar...

And yes, https://github.com/apache is the ASF.


Is the tooling decommissioned because you’re using alternatives or because you’re no longer mirroring repos?


It confuses me why so many traditionally pro-FOSS projects move to a not-free-nor-open tool like GitHub. Do they think that they’ll get enough new contributors this way to offset the (more than slight) irony?


The number of hard-line FOSS folks is really quite small.

Personally I would be hard pressed to bother contributing to a project not on GitHub at this point. There is a certain workflow and interaction model that GitHub projects use that non-GitHub ones do not and it is simply not worth the time investment to learn those other projects. Not only that but it allows me to easily point at my work and go "I did that" when talking with folks.


> Personally I would be hard pressed to bother contributing to a project not on GitHub at this point.

Though I wonder if "homogenous ecosystem" effects/issues could rear their ugly head...just a random thought.


What about a gitlab based project?


I guess gitlab is the same. The important flow is that there's a very easy way to fork and send a pull/merge request.


"Easy" like:

    git diff master..bugfix > bugfix.patch # or `format-patch`
    # now attach/upload bugfix.patch
Instead of:

    # make sure you click around github.com to create third fork
    git remote add unnecessary-third-fork $THIRDFORK
    git push unnecessary-third-fork bugfix
    firefox $THIRDFORK # now click around to file a PR
    # now wait for your PR to be merged
    # now click around on github.com to delete $THIRDFORK
    # ... unless you just leave things laying around
What makes the second sequence easier than the first?


In the "easy" path, you're neglecting the lack of standardization in the "attach/upload" step. For some projects it might be as simple as "create an account on their bug tracker and open a new bug and attach the patch", but for others it might be "dig through their website to find a mailing list, dig more to figure out how to subscribe to that mailing list, send email". and then you might: "get a bounce because the patch is too large", or have to "repost because your mailer posted the patch in-line instead of an attachment, which corrupted it", or...

As much as I hate centralization, especially when the central entity is a for-profit corporation running closed software, often that ends up giving you a standardized experience that makes things easier. "Easier" doesn't have to mean fewer steps; I agree that the GitHub workflow you describe isn't simpler, but if you've done it a few times, it's mechanical and you don't need to think about it. GH even provides a command-line tool[0] that lets you avoid most of the click-around-on-website steps.

[0] https://github.com/github/hub
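
For illustration, a rough sketch of that flow with hub (user and repo names here are placeholders):

    git clone https://github.com/UPSTREAM/project.git && cd project
    hub fork                      # creates YOURUSER/project on github.com and adds it as a remote
    git checkout -b bugfix
    # ...edit, commit...
    git push YOURUSER bugfix
    hub pull-request              # files the PR without leaving the terminal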


I agree with some of what you say—one caveat is that you're still on the hook, for example, for finding out which GitHub URL/repo maps to the project you want to contribute to. In practice, this is roughly on par with the difficulty of finding the link to the self-hosted Bugzilla instance. It's a shame that decentralized single sign-on is still such a disaster, since that's essentially the one thing that GitHub has as a leg up over other options—assuming you've contributed to some other GitHub project before.

To stray outside the lines with some meta-commentary: it's nice to get a well thought out response instead of the sort of kneejerk rooting-for-my-home team that's on display in the wasteland of intellectual dishonesty in the comments below.


I'm not sure it's the same... when I search for issues and contributions... I almost always google for github projectname etc. When it's in GH, usually it's easy enough (unless issues are closed because they're managed with a different repo).

For the most part it's the same workflow. Also, if it's a trivial change (like fixing/appending something in documentation) you don't even need to leave the browser.

Discovery is another issue... it's far easier to use GitHub semi-socially than most other platforms. Something I both love and hate is that GH doesn't have direct-message functionality. On the one hand, I wouldn't want to be bothered with a ton of end-user emails about the same issues over and over... on the other, after you've waited a week on a bug-fixing PR, it's not fun either.


The majority of git users I know don't know how to create or apply a patch. The few that do only do so in the odd FOSS project that requires it.

Whereas when using GitHub you click a button, get your own copy, do whatever you want to it, and then click a button to open a pull request. You _tried_ to make it look like using GitHub.com is somehow... complicated. But it's dead simple, and you even added steps, like "waiting for PR to merge" etc., that are the same with a mailing list anyway.

I get it. You might like the mailing list better, to avoid a single company handling all of the OSS contributions. But let's not ignore the actual good aspects of GitHub by making things up. If you want to convince people to _not_ use GitHub, it's going to take more than this.


[flagged]


Can you clarify the flow that doesn't involve github and also doesn't involve mailing lists?


You could upload the .patch to something other than GitHub.


Can't reply to the sibling, so replying here. Mozilla used to work that way (upload patch to Bugzilla), but it was so cumbersome they switched to Phabricator.


Yes, exactly. Literally anything that has an upload button in the bug tracker for you to attach the patch that fixes the bug.


> Please actually point out how the GitHub workflow can be even more simplified than what I outlined above

Adding a remote is generally a one-time cost and is unneeded for every PR, so adding that command (along with all the associated comments) makes it appear more complicated. The reality for most GitHub users is that they simply have to do:

`git push origin <branch name>`


You can't push to origin unless it's your own project or your team's. We're talking about PR-based workflows.

> Adding a remote is generally a one-time cost

It's not a constant cost, unless you're saying you only ever intend to contribute to one project ever. It's a fixed cost that you will pay N times, where N is the number of projects you contribute to.


> You can't push to origin unless it's your own project or your team's. We're talking about PR-based workflows.

Having to create a fork per PR is a rather antiquated way of doing it. In my experience, you can almost always push to origin and create a new PR from the branch, but maybe I've just been lucky with the projects I contribute to.

> It's not a constant cost, unless you're saying you only ever intend to contribute to one project ever. It's a fixed cost that you will pay N times, where N is the number of projects you contribute to.

It's a constant cost in the same way that looking up where to submit your patch to is a constant cost. You will pay both N times, where N is the number of projects you contribute to.


> you can almost always push to origin

Why am I having to repeat myself here? You can never push to origin unless it's your own project or your team's project.

> It's a constant cost in the same way that looking up where to submit your patch to is a constant cost. You will pay [...] N times, where N is the number of projects you contribute to.

In other words, it's not a constant cost.


>why am I having to repeat myself here

Because you are incorrect and not reading the responses.

>you set origin to the branch you own...


[flagged]


Buddy, the entire premise here is user "Monotonic" telling me that configuring remotes is unnecessary and that that in fact he or she just pushes to origin.

They are referring to origin as the forked repository. E.g. if I contribute to nixpkgs (the NixOS package repository), I only have to fork it once, use that as my origin, and can create branches and submit PRs.

So, you are both right. If you contribute many times to the same repo, you only have to fork once. If you do a lot of drive-by contributions, you'll end up forking a lot of repositories.

(I fully agree that GitHub has a lot of overhead compared to git format-patch/diff. GitHub et al. also have some benefits in terms of communication. At any rate, diff/format-patch are not that hard, so I think any git user should learn it.)


On the drive-bys... create a YOURNAME-contrib organization, that you can fork to. I've started down this path which will help more as I get used to it.


> If you contribute many times to the same repo, you only have to fork once. If you do a lot of drive-by contributions, you'll end up forking a lot of repositories.

That doesn't contradict anything I've written here, or anything I've written in years past on exactly this topic. But this _entire_ branch of conversation started with someone quibbling that I didn't rank configuration of remotes as a zero-cost operation. So, no, we're not both right.


Right, you set origin to the branch you own, and upstream to the original project. Then `git push origin foo` works, and you can get a URL printed directly on commandline to start the PR flow.
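
For illustration, the setup looks something like this (user and project names are placeholders):

    git clone git@github.com:YOURNAME/project.git     # your fork becomes "origin"
    cd project
    git remote add upstream https://github.com/UPSTREAM/project.git
    git checkout -b foo
    # ...edit, commit...
    git push origin foo    # the push output includes a URL that starts the PR flow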

I agree it's a cost per repo you contribute to. However, you can also do it reasonably cheaply with scripts. I recall you have to use hub in addition to git commandline, but once you get it set up then it's basically zero extra commands if you bake it into a clone. Run a script that does the fork to your github username and clone to your local box, do your normal modifications and commits, then git push origin and click on the URL to get dropped into the upstream PR workflow.

The fork bit only needs to be done once.


> Right, you set origin to the branch you own

That's surely just as complicated as adding a second remote.

Almost every pull request I've made to a project on GitHub has started with a "git clone http://github.com/example/example.git", since they start with bringing down the source code and finding the bug in the project. Sometimes it's something I can fix, so I then need to fork the project on GitHub, add a remote (or replace origin with my fork's location), and make the commits.

That's not too difficult, but it's not easier than sending a diff to a mailing list. If any discussion is necessary, it's easier to keep track of that on GitHub. It's also much easier to see the patch 3 years later, if the maintainer wasn't interested — that's the big feature which makes GitHub (or its competitors) worthwhile to me.

(A long time ago I sent a patch to Git itself to the Git mailing list, and it was about 6 months before it was applied. However, it was applied, so they must have had some way of keeping track.)


You do push to origin in a PR-based workflow.

The origin you push to is your fork of the project.

Fork, push, pull request.


I'm going to try to make this as straightforward as possible: Where in this[1] comment does an error lie? Please make a copy and edit it to reflect what you're saying here and then show me the result.

1. https://news.ycombinator.com/item?id=19779664

I'm going to skip ahead here. You're going to replace the `add unnecessary-third-fork` command with `set-url origin $THIRDFORK`. Either that, or you swap it for a `git clone $THIRDFORK` so "origin" is set as a result of the clone.

How many steps do you need to eliminate before you can match the cost of the first sequence (2 steps)? How many steps does your advice eliminate? What is the total number of steps involved in the GitHub approach? I'll wait for your answer this time.


The attach and upload part, and figuring out how the discussion went on the mailing list if changes are requested.


What makes either of those things harder than what you have to do to use github.com?


If you have never contributed to a project before, the "attach patch and upload/send mail" part is a significant hurdle. You have to figure out which mailing list to use, which is not always easy to find, and then work out which person or group to address it to. That alone already discourages many people from even bothering.


It's hard when you have lots of folks jumping into a code review, and you don't want to duplicate effort or you want to rapidly iterate on your code.

You know those email chains that just continue to fork? Where people reply to the original (or the first couple replies) with their wall of text after a couple other replies have already trickled through?

That's what a collaborative, stateful PR solves.


Everyone is trying to get me to defend mailing lists. I have no idea why. I haven't said anything about them. In fact, I hate mailing lists.


> I haven't said anything about them.

> # now attach/upload bugfix.patch

Where do you "attach" your patch file?

Also, if it's not GitHub/GitLab/gerrit/reviewboard/etc. and not mailing list, what other workflow for code contribution are you talking about then?


I used patches and lists before, a lot. It was either hard or impossible to track more than one at a time, in context and with lots of conversations around specific lines of code. It was also a nightmare for maintainers to constantly have to tell people to rebase their patches on top of other people's patches, to link patches to other patches to issues, etcetera.

The Github website and its fork/pull request flow has increased my productivity and the amount of things I contribute to or can maintain with some level of sanity 100x easily.


Patches sucked, so Torvalds switched to BitKeeper and then wrote Git.

If my project uses Git, I can easily accept a patch. If someone happens to give me a patch against some old version that doesn't apply to HEAD, I can just `git reset --hard` to that version, apply the patch, and then rebase with `git rebase`.

I would expect most people to be making patches out of their own git repo (using "git format-patch") anyway; they should be able to rebase first.
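
A rough sketch of that recovery, assuming the patch was cut against an old tag v1.2 (a placeholder):

    git checkout -b incoming v1.2    # or `git reset --hard v1.2` on a scratch branch
    git am old-bugfix.patch          # apply the mailed format-patch output as a commit
    git rebase master                # replay it onto the current HEAD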


Gerrit would be simpler. You would just do `git push origin HEAD:refs/for/<branch>`. And I would argue the "now attach/upload bugfix.patch" step is as unnecessary and time-consuming as the remote/fork-branch mechanics that the PR workflow requires (not sure why you didn't include the git send-email / specify-the-email-address part). The Gerrit review flow is much simpler than both.
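
For comparison, a minimal sketch of that Gerrit flow (host and branch are placeholders; it assumes the commit-msg hook that adds Change-Ids is already installed):

    git clone https://gerrit.example.org/project && cd project
    # ...edit, commit...
    git push origin HEAD:refs/for/master    # uploads the commit as a new review
    # amend the commit and run the same push again to publish a new patch set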


I agree that the initial review creation process is a bit complex with the GitHub flow. But most of the time, you end up needing to make changes to your diff based on review comments, etc.

Then there is hooking up PRs to your automation setup.

This is where the GitHub approach shines imo.


GitHub could support this workflow - they would need to find a way to integrate patches into GitHub Pull Requests.

One way to do it would be an "upload patch" option on others' repositories, where GitHub forks the repository for you under the hood, possibly creates a branch for you, and applies your patches linearly to that branch. It opens a pull request to the targeted branch of the upstream repository from your branch. Then when the pull request is closed, it cleans things up for you (temporary branch, fork) under the hood, if desired.


github/gitlab is a good place to browse the code. What better place to have a single click to create the fork you need?

And you do need that fork. If you want any kind of CI/CD stuff on the repo you sort of need to pull the changes in from a third party source to make sure nothing bad will happen.

Now try to keep your patches up to date through constant rebases and comments from reviewers. Maybe some parts are OK, but some are not, so you go back and forth for a couple of weeks. Fewer people will want to go through this extra hassle that they don't even get paid for.


FWIW there is a CLI tool for interacting with GitHub.

https://github.com/github/hub

And I do just leave forks and branches laying around :/


You ignore the most important part of the workflow, the patch review and the iterations that you'll probably have to do on it. If you just send your patches to random mailing lists and never bother to follow up on them then yeah, that's pretty easy.

I like "old tech". I use emacs, my mailer is mutt, I don't like HTML email, I like IRC, I like using a terminal, I don't like how the web is eating everything.

Still, my experience using mailing lists is just garbage.

Random anecdotes from working with project using mailing lists for patch reviews:

- I find a patchset that seems interesting, but I wasn't subscribed to the ML back when it was posted. Now I need to dig up the mails on some archive out there. I want to see if there were important comments/revisions on these patches? Well here goes 30 minutes of clicking on "next by thread" to sift through the entire discussion, hoping not to miss anything.

- Every project has slightly different guidelines for contributing. Should I put somebody in copy? Run some script on the patch beforehand? Is there a special procedure for contributing patches? Here comes 15 minutes sifting through the "contributing" doc to figure out the modus operandi. I still get it wrong from time to time on projects I don't frequently contribute to (mainly because I get confused between different projects and forget the idiosyncrasies). And of course you need to figure out the exact mailing list to use, whether you need to subscribe to post on it etc...

- You get some feedback on your patch and need to create a new revision? Oh boy, that's where the fun begins. Don't forget to set the "--in-reply-to=" to your git command line if you want your patches to thread correctly! Also some projects prefer that you add a revision number to your patch set, but I actually forgot how to do that and a quick browse through git format-patch's man page didn't help me. Boy, this sure is easy and straightforward! To think of those losers on GH who just have to push their updated commits on their branches and the PR is automatically updated.
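
(For the record, I believe the incantation is roughly the following, but don't quote me on it:)

    git format-patch -v2 origin/master    # names the files v2-0001-*.patch, marks subjects [PATCH v2]
    git send-email --in-reply-to='<original-message-id>' v2-*.patch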

- Okay now you've amended your patchset and integrated your modifications. But the patchset is large and the modifications are mainly small coding style issues. Do I send the patch right now, at the risk of getting comments on two separate threads, one outdated, and also risking spamming the mailing list if I do other modifications in a row? Wouldn't it be better buffering the changes and pushing a big new patchset later once I get more feedback? But then the other reviewers will work with outdated code... Wow, it sure does feel like the proper tool to work with! I'm so glad I'm not using github's PR system right now.

I can't justify this. I'll defend IRC over Slack/Discord to the bitter end, mutt over gmail, Emacs over VS Code but I just can't comprehend how mailing lists are still a thing, much less patch mailing lists. I actually have some modifications to software that I use that I didn't bother upstreaming because I can't be arsed to figure out where those patches need to be sent and how. On the other hand I have already submitted PRs for small, non-important one line changes on github because it's so simple and frictionless.

I want mailing lists to die. I want patch mailing lists to die a painful death.


Now how does CI work with that first workflow?


In the case of the Linux kernel, there are bots that lurk on the lists and sometimes send you a mail if they found a bug while running whatever arcane test suite on your code.

So, in general, it works like shit, but some people are not willing to accept this, because "it works for me".


You can create a pr with the open source "hub" tool with a single command.


I only counted opening a pull request as a single operation to begin with.


Are you honestly taking the position that graphical user interfaces are harder to use than CLI?


Let me ask, do you think that the content of my message reduces to "graphical user interfaces are harder to use than CLI"?


Yes, I do!


Please explain how you got there, because I'm at a loss. Both examples involve using the CLI. The GitHub version I ran through actually includes more CLI steps. How could that possibly have been my intention?


Your references to clicking repeatedly, and the general fact that github.com's innovation is the GUI it puts on top of a git workflow.


I'm mystified about what's going on right now.

Both versions involve clicking.

Both versions involve command-line steps.

The difference is that the GitHub version requires more of both, needlessly. That's the point of what I wrote. That's the only point.

I don't understand this context where I'm being forced to defend an argument that's been foisted upon me and that I never made and never even thought of trying to make.


I'm sorry you don't understand it, but you don't get to ignore the implications that spawn off of your argument just because they're inconvenient.


What? You get to imagine some bad argument that would make it most convenient for you, and then demand that I defend that argument, as if it were one that I wanted to make? This is what I'm responsible for? Dealing with someone who resorts to strawmanning the person/idea at the other end when challenged?


I'm not imagining it, and that's the problem you're running into. It's a consequence of what you said.


It's not a direct consequence of anything that was said. It's a possible implication - in the conversational sense of implication, not the logical sense - but those get to be disclaimed.


His stance is literally that using a GUI is harder. That isn't a consequence, it's his core position.


FWIW, I certainly didn't read his comment that way, and in light of his explicit denial that that's what he intended maybe you should reconsider whether there are alternate interpretations available.


Are you reading some of the other comment threads he's participating in? It seems like he's taking this whole thing very personally, and it's hard to pin down exactly what he's trying to say when he won't calm down.

He wants me to call him, as if escalating this will be worth it...

He didn't even answer your very reasonable question above. Don't you wonder why that is?


> Are you reading some of the other comment threads he's participating in? It seems like he's taking this whole thing very personally, and it's hard to pin down exactly what he's trying to say when he won't calm down.

Feeling misinterpreted can be very personal. That said, in both my initial reading and my re-reading you come off worse than he does.

> He wants me to call him, as if escalating this will be worth it...

Sometimes text-conversation-on-the-internet escalates when neither party intends it, and switching to other modalities can be valuable. That said, it's not at all clear to me that it's a good idea here compared to dropping it.

> He didn't even answer your very reasonable question above. Don't you wonder why that is?

I presume it's because it got lost in the noise.


Heh, you presume I care how I come off in your interpretation...


I presumed you'd value a third-party perspective, particularly as you seemed to be appealing to it in the previous comment. I don't think you care much about me in particular beyond that. ¯\_(ツ)_/¯


It's literally not my stance. And I've literally said that it's not my stance—in a comment (that should have never been necessary to begin with) meant to remove all doubt. And you know this.

To continue saying otherwise (explicitly, even) is a case of outright intellectual dishonesty.


It's you backing off of a claim you clearly now know is nonsense once someone challenged you on it.


My phone number is listed on my home page. Please give me a call.


Why on God's green earth would I ever give you a call???


The patch version only involves clicking insofar as sending an email with an attachment involves clicking, right?


The communities asked for GitHub integration.


> Personally I would be hard pressed to bother contributing to a project not on GitHub at this point.

Personally I find it a pain to deal with github-only projects. Why should I have to sign up to a social network for coders when I could just send a patch to the development mailing list? It's more depressing than surprising that Microsoft paid $7.5B for that.


Why should I have to sign up to potentially dozens of mailing lists to track the projects I'm interested in when I can track them all in one place with one login?


That's a good question. Part of the answer is that a properly run mailing list doesn't require subscription in order for you to post a question or a patch, and also keeps you in the loop (i.e. lets you receive the replies to your posting) by avoiding "reply-to header munging". The barrier of having to subscribe to lists is just a quick and lazy anti-spam measure that decreases usability for everyone.

So that is to say, mailing lists in their original inception didn't require "tracking" as a prerequisite for engagement. They were really just re-mailing robots: you write a message to a robot, and it sends it to others. Those others do "reply all", so that you receive the reply even though the robot doesn't have you in the list. The robot stays in the loop because it is CC'd, and so the list subscribers can track the discussion.

When I have some question, or want to report a bug, I don't actually want to track all of the activities in that project's mailing list. It is rude to expect that of me. And anyway, there are web archives of mailing lists!

All the mailing lists that I operate are in this classic open manner.


You're tracking dozens of projects? Following every twist and turn of dozens of projects sounds like a huge time-suck, but maybe we just work differently. I follow a couple that interest me (via SMTP), check in on some others from time to time (via NNTP), and occasionally mail a patch when I fix a bug in something I would rather not follow. It's not that hard.


So. A couple that interest you. Plus an irregular check on some others. And occasionally a patch.

So, three to five mailing lists. Vs. the convenience of a single login and actual conversations you can follow in issues/PRs. With a one-click PR if I want to send a patch (one that can be easily discussed, annotated, and cross-linked to issues and other PRs).


I forgot to answer this:

> You're tracking dozens of projects? Following every twist and turn of dozens of projects sounds like a huge time-suck, but maybe we just work differently.

The thing is, GitHub provides granular access. You can just star a project, and return to it when you want. You can follow a specific issue. You can follow every twist and turn of a project.

Depending on my current interest or area of research, I can deal with any number of active projects which I may "snooze" when I no longer need them (but will still get notifications on conversations/issues/PRs I'm involved in).

And to do all that I don't need to be a part of multiple mail lists with no control of what gets sent or muted.

I also have direct access to all the PRs and issues where I'm involved without needing to remember which mail list it was on. And without needing to discover those mailing lists in the first place (is it an <x>-users mailing list? an <x>-dev? an <x>-development? an <x>-contrib? an <x>-patch? what are the rules? etc.)

Github is extremely convenient for a very large number of otherwise tedious tasks.


This is simply not true. While I strongly agree that GitHub has the current "best" workflow, other styles do get the job done.

The Git project itself, the Cygwin project, and the Gawk project all primarily use mailing lists. Again, while I prefer the GitHub system, all 3 of these projects are responsive to emails and changes are made quickly.


What exactly is not true about my entirely personal statement? I simply find barrier to entry for non-GitHub projects to be too high for me to bother.

Great if you disagree, and I'm sure there are some others, but I would be willing to bet you're in the minority.


So after how many minutes do you give up chasing a bug?

How big of a barrier is writing tests or documentation?

Or setting up the whole environment to actually build something pulled from Github?

You know that some GitHub projects have nasty things in them, like Makefiles, and all that code and configuration requires tedious text editing.


Better participation. A lot more people know how to use GitHub's tools. It'll likely increase the amount of development, participation, pull requests, etc that ASF projects get. Does it matter that GitHub themselves is not open source?


> Does it matter that GitHub themselves is not open source?

It matters a lot, actually; a lot of free-software/open-source software is licensed that way because the projects themselves are ideologically predisposed.

While that does not hold true for certain (even large) projects like Linux, it certainly holds true for Apache (historically) and GNU.

To put it another way: if you found out GNU coreutils were hosted on Windows machines using IIS web servers, then you would probably consider that the people making the software (or certainly those hosting it) are ideologically at odds with the project and are being hypocritical.

So, I mean, you get to choose. If you go the Linux way and say "we are open source for pragmatic reasons", then there's no doublethink. If you say "we believe that all software should be free" while simultaneously forcing your users to contribute using closed-source software on a proprietary platform, then you're not practicing your ideology, and worse, you're forcing that non-practice on your developers and users.


I think the ASF, being non-copyleft, is much less ideological than you're thinking. They're big on having rules to foster healthy community and community based decisions, but the license is really just a laissez faire BSD-ish license with some particular edge cases addressed. It's open-source but flexible and friendly to businesses with hybrid models. Something like GitHub that promotes input from people involved with other projects to participate is not a surprising move for them at all.


> the ASF, being non-copyleft, is much less ideological than you're thinking

Do you somehow consider copyleft style licenses "more ideological" than those which are not? That is probably more telling about your own views on licensing than it is on ideologies.

FreeBSD is not copyleft, and they actively work on eliminating GPL code from their base. Debian is mostly under the free software umbrella, but welcomes BSD code in its base.

Stallman is about as ideological as you can get, but has been known to argue for the MIT license in some cases. Just to mention a few examples.


Sure, there are counterexamples, but I certainly don't hesitate to say copyleft does tend to be more ideological. In every serious discussion I've had with a company about project licensing that ended with anything other than the GPL, the concern was virality, even when they liked open source (a practical concern overriding an ideological one), combined with a lack of any real interest in how it's licensed beyond wanting it put out there with very few strings attached.


There is a very practical reason for choosing a copyleft license as an author. If you choose a non-copyleft free software license, then downstream users actually have more options than you do. They can take your code, add small modifications and then release the result under a non-free license. Even if you implement those features as well, you aren't playing on a level playing field. You have to support all of the software and they only have to support their small additions. The end result is that they can offer a product with a value-add at a lower cost than you can. You are essentially competing with yourself.

Of course you can decide not to release your code under a free software license, but that rather defeats the purpose of running a free software project ;-)

Looking at it another way, more permissive licences grant more options to downstream users than copyleft licenses. It doesn't make sense to argue that you are offering only copyleft licenses to benefit those downstream users individually (and I know of nobody who makes that claim). Instead, the argument is that it is better for the group that everybody has equal restrictions and can't use proprietary code to gain an advantage over everyone else.

This is actually one of the reasons why if anyone asks me to assign copyright for my side projects over for work, I'm happy to do so: on the provision that all of my work is licensed under the GPL. It barely matters to me at that point who owns the copyright (though to protect yourself further you should ensure that no one person has copyright over the entire work).

IMHO, although the choice to write free software at all is often ideological, the choice of using a copyleft license or not is usually pragmatic -- at least among those who understand why copyleft licenses are written the way they are.


Correct. We are pragmatic-focused. The communities asked for better integration with GitHub, so we provided it. (nothing to do with non/copyleft licensing regimes)

Mind you, we maintain private mirrors and have restrictions on some of the GitHub access/workflows (eg. ICLA on file, and 2FA required). We still need to track provenance, and must be able to operate independently, if it comes to that.

-- Greg Stein


Let me tell you, as a little data point, I use BSD licenses and I'm ideological enough not to want to host anything on a walled-garden now run by Microsoft.


> To put it another way; if you found out GNU coreutils were hosted on Window machines using IIS web servers then you would probably consider that the people making the software (or, certainly those hosting it) are ideologically at odds with the project and are hypocritical.

From a pragmatic standpoint, if (theoretically) running on Windows/IIS allowed the GNU coreutils project to save enough money to _actually further their goals_, I'd say they'd be foolish not to host that way.

When taking an ideological stance, there are practical considerations to consider. There will always be more and less effective ways to get one's point across.

I'm reminded, a bit, of this comic: https://thenib.com/mister-gotcha. Sometimes, you have to participate in the thing you're rallying against, because it's the most effective way to gain traction for your cause.


I think the point is that for hosted services the fact that Github chooses to use non-F/OSS software only really hurts Github's freedom since they're the ones limited by restrictive software licensing.

The fact that Github itself is not F/OSS matters even less as Github is the only user of Github's source code [1]; they're the copyright holder and unencumbered by any licensing restrictions.

[1] For GH's on-prem enterprise product this does not apply and has a much stronger case to be F/OSS since now GH is using copyright to restrict the freedom of others.


> a lot of free-software/open source software are licensed that way because the projects themselves are ideologically predisposed.

Does GNU audit every contributor to its libraries to ensure they only personally use open-source hardware/software? If not, then I'd have a big issue with the ideological stance of these organizations.

> So, I mean, you get to choose, if you go the Linux way and say "we are open source for pragmatic reasons" then there's no doublethink. If you say "we believe that all software should be free" while simultaneously forcing your users to contribute using closed source software on a proprietary platform then then you're not practicing your ideology, and worse; you're forcing that non-practice on your developers and users.

With that same rationale, it's also doublethink whenever an ideologically-aligned FOSS project runs on any system that isn't itself FOSS-based. And realistically, this is not how the market works, and it makes it extremely limiting to find contributors to maintain systems over the long term.

Kind of reminds me of the discussion around people going personally carbon-neutral. Are you going to realistically audit every interaction you have every day to ensure the level of carbon you are consuming? No, you look at the largest/most material areas that you can control and use alternatives there.


If the open source community is better served, and more free software is produced as a result of this move, then I see no hypocrisy. As in code, most things in life are a compromise.


It is definitely better-served. It was the communities asking for GitHub access/integration. So we provided it.

-- Greg Stein


> Does it matter that GitHub themselves is not open source?

That's for Apache to decide, but consider:

Git itself was written by Linus Torvalds because Linux (the kernel) was using a non-open-source version control system at the time, called BitKeeper, and its non-open-source nature was causing increasing problems in the Linux developer community: ideological problems, and eventually practical ones too, caused by the licensing preventing some devs from building tools.

When Git was written and the Linux kernel devs jumped to it en masse, it was like a breath of fresh air.

BitKeeper deserves credit for having many of the ideas found in Git before Git existed. (Aside: credit also should go to Mercurial for ideas.) One of the things that made Git better than BitKeeper was that people were free to build more tools on it, which is entirely due to the open vs. closed source licensing, as well as a general attitude of welcome vs. unwelcome toward those tools.

GitHub is different because it already builds on Git, and works with tools that anyone can build on top of Git.

But some things which are really essential to a thriving developer community are locked away in GitHub, so it's still limiting what people can build with it. (For example, can you innovate on how Issues are handled? Somewhat, but not in every way that is useful.)

In some ways GitHub is like any other walled garden. You can't fork it yourself.


"Ideological and eventual practical problems too" is way too general.

The very specific flash-point was Linus throwing an unjustified hissy-fit defending Larry McVoy of BitKeeper being difficult about Andrew Tridgel (Tridge of Samba fame) "reverse engineering" the data traffic of Bitkeeper for inter-operability (really just sniffing client packets with wireshark or something).

Whether the name "git" pertains to Tridge or Linus himself who in retrospect decided he acted like a spoiled brat is still not known :)


There were more people having problems than Tridge.

I was there, and I was told in private mail by the author of BitKeeper that I did not have permission to use it, because of my work on repository analysis software that looked like it would get too close for comfort.

That's without any reverse engineering. I never used BitKeeper, or connected to the server, or read the infamous "help" text.

It meant I couldn't participate in kernel development in the same way as most folks.

I wasn't the only one, and that's what I mean by practical problems, not just ideological.


My apologies. I didn't want to gainsay your claim, but I've noticed I've been getting older [1] and things that happened "recently" and "should be common knowledge", are not, in fact, exactly that for everyone.

I wanted to point out that it was not a very loose "pragmatic and/or ideological" argument back then, but there were very specific actions and respected actors involved.

You obviously, being directly involved, are aware of the specifics, but it might be easy for a casual reader to place it in a "Ah, the Free Software people were ruining a good free thing even back then" context, whereas pretty much the opposite happened.

[1] and you too, by implication, for which my apologies as well.


Thanks for the apology, but no worries, it isn't needed :-)

That's interesting. I read your comment as making out that it was _merely_ one person (Tridge) who had a problem with BitKeeper and BK didn't deserve the flak, but I read your later comment as making out that BK did deserve it.


> (really just sniffing client packets with wireshark or something).

He connected to a BK port and typed "help". BK helpfully output a bunch of protocol help. He used that to implement something minor (I think for archival?) and McVoy wasn't having it.


Right, thanks!

It wasn't even "wireshark", but simply a telnet session indeed.


So the real problem was really the proprietary nature of the protocol - both in the sense that it wasn't properly documented, and in the sense that the author tried to use legal means to keep it secret.

But GitHub doesn't have this problem.


Just for the historical record: BitKeeper is much older than both git and Mercurial, hence BitKeeper cannot be said to have taken ideas from Mercurial.

Both git and Mercurial sprang in large part from Linus's description of the requirements he had in order to consider using a version control system for Linux.


I think the comment meant that git took ideas from Mercurial rather than BK taking ideas from Mercurial.


Git started before Mercurial.


You're right.

They were both released extremely close together, and I don't remember why I associate Mercurial ideas as being something Git learned from. Maybe I'm wrong about it.


Maybe you are thinking of Monotone. It shares some of the design, such as referencing commits by their hashes and zero-copy branching.

I think Linus said at some point that if Monotone's performance had been sufficient for the Linux kernel, there would have been no git (but don't quote me on that).


They give up their autonomy in moderation - GitHub will now have the power to say "you are not allowed to contribute to Apache projects". This will lead to:

a) Github will have leverage over the project.

b) They make themselves vulnerable to the outrage du jour. If an outrage mob forms against someone in the Apache Project, Github may kick them off their platform - they are known to have done so in the past. Apache will then have to decide what to do. The path of least resistance will be to just let them go.

People who anticipate such a course of events will have to live in constant fear of all their effort coming to waste and their community being pulled from them by an outsider. And people who anticipate such fear may never join in the first place.


To all of you who think this idea is unfounded or dramatic: remember that this is the entire point of FOSS. Whether this idea holds water is irrelevant; it's aligned with FOSS, and it's strange that they don't appear to care, or have valued other things over it.


Even if it were true, this isn't the "entire point", or at least not to everyone. You're talking about communities that have diverse interests, and only a subset of those people are into avoiding all dependency on commercial software in the FSF or Debian sense.

"FOSS" includes Windows developers, Mac developers, iOS developers, Java developers, people working for bigtech corporations, and some bigtech firms themselves. (Some parts of these firms some of the time, anyway.)


It seems you confused commercial with non-free software. Java may be a commercial product, but it is also free software, and the FSF and Debian are happy with that. They do not reject commercial software; quite the contrary.

That said, your point still stands.


It's likely that they care, but consider the provable improvement in contributions to open source and thus a demonstrable benefit to the community to be worth more than a theoretical possible eventuality that most would consider unlikely at best.


All the projects I have closely observed that went to GitHub out of FOMO have NOT gotten more contributions afterwards. Among the real reasons are:

1) Every OSS project has its bureaucrats who contribute a modest amount of code but want to have an immodest amount of power. For these bureaucrats GitHub is like heaven: They appear productive, have power to silence discussions etc.

2) They work for someone or know someone who is associated with GitHub or in the Microsoft embracing strike force.

3) Legitimate reason: They want to show their employers a metric how much they contribute.

I'm pretty cynical about the current state of OSS. Major idealistic contributions are a thing of the past. It is all about attaching oneself to a project, getting hired somewhere, and then stopping contributions.


Hmm, based on this, the .org TLD and the server providers can have control over the project too, right? If so, we should use an open-source distributed protocol.


For what it's worth, the conditions for termination from the Public Interest Registry (.org's registry) are slightly better than GitHub's. PIR has a five-item list of possible reasons: https://pir.org/policies/org-idn-policies/takedown-policy/ while "GitHub reserves the right to refuse service to anyone for any reason at any time."


"Deems necessary, in its discretion ... to protect the integrity and stability of the registry" sounds extremely vague, though.


> Hmm based on this, the .org tld and the server providers can have control over the project too. Right?

No, pretty obviously not. For the domain, this has already been answered. For the servers, it's trivial to move your setup to a different hoster with no visible effects to the outside.

> If so, we should use an open source distributed protocol.

Like ... git and email? Yeah, we should.


Git and email aren't sufficient, surely? You need a repository as well, and a listserver in your implementation, I guess?


A repository and a listserver are not protocols.

Also, with git you have a repository as soon as you start using it. If you mean something like a centralized repository server: No, you don't actually need that. You can serve a git repository from your workstation just fine for others to pull from. Or you can just move changes around via email.

Also, there are way more ways to use email than for mailing lists. You could also use it as a transport for machine-readable messages between git frontends or whatever.
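
A minimal sketch of both options, with placeholder paths and addresses:

    git daemon --base-path=/srv/git --export-all             # serve repos read-only over git://

    git format-patch origin/master --stdout > series.patch   # contributor: bundle local commits
    git send-email --to=dev@example.org series.patch         # mail them out
    git am series.patch                                      # maintainer: apply them as commits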

In any case, I don't get what point you are trying to make.


It sounded like you were saying "instead of using GitHub (with git obviously), you can use just git and email". My comment was basically trying to say "without a repository how will people find you, how will they catch up on recent dev, how will they view past emails, etc., unless you have a central repository of some sort".

I'm not sure how "just give people direct access to my local git" fills that massive hole, so I assumed your mention of email included implicitly something like a listserve. Otherwise your suggestion seems entirely unworkable to me?

So one of us is missing something. Are you really proposing p2p git + email alone with no other infrastructure? Do you think that's at all optimal?


I'm not saying there is anything necessarily wrong with a centralized repository server or a mailing list; I just wasn't talking about software, but about protocols, which can be used in that way or in other ways.

Though it is maybe useful to distinguish between a "central repository that publishes the authoritative version" and a "central repository that contributors push to", two functions that GitHub kind of purposefully confuses.

The former is what is useful for discovery and catching up, but doesn't need accounts/authentication (for the "public side" of things), it just needs a stable public identity and availability. The latter is technically completely unnecessary with git: There is absolutely no problem with merging a branch from a repository hosted on a completely different server than the authoritative repository. It's just a business decision of GitHub to build a closed system that requires you to enter into a contract with them in order to submit a merge request that is limited to branches hosted on their platform, to create an artificial network effect for their platform.

After all, the primary problem is not that people choose to host a project on GitHub. It's that they demand (or allow GitHub to make that demand with them as voluntary hostages) that you also host your branches on GitHub, or else you can't contribute. If I am a happy Bitbucket customer, or I happen to run my own git server, there is no way for me to submit a merge request on GitHub specifying a branch hosted wherever I happen to be hosting my git repositories.

But no, I was not suggesting any particular implementation. I was just pointing out that those two protocols do exist and can be used, in many ways, as a basis for decentralized Free Software development. And while p2p git certainly is not the solution to all problems, it can be a perfectly useful tool--and with some more tooling around it, possibly more so than what is practical today.


That is simply not true. Please take your conspiracy theories elsewhere.

Apache maintains clones of all our GitHub org's repositories. GitHub has no leverage over our repositories. We have a fallback mechanism for contributors to push to our server, if they decline GitHub's T&Cs.

Apache has the support of GitHub and Microsoft, from the CEOs of both, and through the organizations.

-- Greg Stein


I don't think it's a conspiracy theory, these things happen all the time.

I'm very glad to hear of the fallback mechanism.


While that's true, a big project like Apache could push GitLab instead, as that's _more_ free and open source.


They could even consider SourceHut, the HN darling and 100% Free Software forge, but it seems major FOSS brands are looking to maximise network effects (and the associated lock-in) these days.


As much as I like SourceHut, its use right now by an org as big as the ASF would mean a lot of trouble. I guess Apache, like Mozilla recently, was looking for a product, not a semi-working proof of concept.


Not to us here at Apache. We wanted the GitHub tools to be available to our projects. Why try and recreate all that on our own? Waste of resources. The ASF is for creating software for the public good; having a great version control tool website is not in that mission. We chose to leverage GitHub instead.

And yes, lots of our communities have been asking for better GitHub support (read: access to its tools). So we made it happen for them.

-- Greg Stein


And a likely increase of people posting "+1" comments on bugs...


> Do they think that they’ll get enough new contributors this way to offset the (more than slight) irony?

Absolutely yes. In another thread, someone mentioned that MediaGoblin is basically dead now. I went to look at their repo and it's hosted on Savannah. That definitely hurts involvement.


I'm kind of cynical about this sort of thing and I don't think the hosting really changes the quality of contributions. The amount, maybe, but you get a lot of crappy drive-bys that are about one particular but very enthusiastic person's pet peeves, which probably don't affect the wider userbase. The drive-bys require a lot of work to integrate for relatively little profit.

Mediagoblin is dying not so much because it's on Savannah but because basically there are other things that got ActivityPub before it and could replace it (Pixelfed, Mastodon, Write Freely). Nobody really wants to work on Mediagoblin now because the alternatives are pretty much all better.


> I'm kind of cynical about this sort of thing and I don't think the hosting really changes the quality of contributions.

Yeah. We've had this topic come up a number of times in the Wine project. We still use a mailing list and git-am patches for our contributions. We have a few hard-liner FOSS types who would strongly reject a GitHub solution (including myself), but a self-hosted Gitlab solution may be accepted. But in the end there just isn't much evidence that the change would be beneficial for the project. If you can't be bothered to figure out how to send an email or attach a patch to the bug tracker, are you really going to usefully contribute to Wine? It's an active topic of debate, but just being "easier to contribute" isn't clearly a good thing.
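
For reference, the mailing-list workflow described above looks roughly like this (the list address is a placeholder; a project's contributor docs would give the real one):

    # contributor: turn local commits into emailable patches and send them
    git format-patch origin/master
    git send-email --to=wine-devel@example.org *.patch
    # maintainer: apply a patch series saved from the list as an mbox
    git am patches.mbox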


You would be surprised by how big the entry barrier is when you have to use a mailing list to do anything.

Just as an example, when the Inkscape project migrated to GitLab [1], I noticed something that was suboptimal in their CI definition and contributed a change right away. In mailing-list-based development, that CI script would not have been visible. Most projects even hide their internal tooling.
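
To make that concrete: on GitLab the pipeline is just a versioned file in the repository that anyone can read and patch. A minimal, made-up .gitlab-ci.yml (not Inkscape's actual config) might look like this:

    build:
      stage: build
      image: ubuntu:18.04
      script:
        - apt-get update && apt-get install -y build-essential cmake
        - cmake . && make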

Also, unless you were born in a certain era and had internet access from early on, there is a good chance you have never been exposed to Mailman or to how the mailing-list flow works.

[1]: https://gitlab.com/inkscape/inkscape


Hi jordigh! (Historically) lead developer and co-founder of MediaGoblin here (and ActivityPub spec co-author/editor, so I have more than a bit of insight/bias I think).

You're not wrong. And neither is the parent post! MediaGoblin is in "unofficial retirement", but that's because we unexpectedly made progress in other ways, which is good, but not where we thought we were going. Allow me to lay out what happened and what the history is here.

About four (or was it five?) years ago, MediaGoblin was still a very active project and a lot of it worked, but we still didn't have working federation support. At the time we were looking at a lot of different protocols and it wasn't clear which approach was the right one, but Evan Prodromou had written up the Pump API document: https://github.com/pump-io/pump.io/blob/master/API.md

Even though pump.io didn't have the highest uptake, it seemed to have the cleanest design and addressed many issues that OStatus had. Evan did StatusNet, which is what's now called GNU Social, and has done more work to advance the federated social web than anyone else; given how clean the design looked and that I trusted Evan, I thought this was the right approach. So we used the funds from the second crowdfunding campaign we ran and hired Jessica Tallon, who had written PyPump (and understood the practical details better than me at the time, as I was learning as I went), to do the implementation. We got as far as getting MediaGoblin and Pump.io to talk to each other and pump.io clients to even work on MediaGoblin.

But there was still a problem... nobody else was using the Pump API but our two projects, and at this point all these different projects on the fediverse were speaking different protocols (and sometimes not even compatibly speaking the same protocol)... what I would call in talks as a "fractured federation". I heard Evan Prodromou mention he was going to be co-chairing the W3C Social Working Group and I asked that Jessica and I could participate, and we were brought in as what are called "invited experts". At this point Erin Shepherd had transformed the Pump API document into a prototype W3C spec document called "ActivityPump" and that was the direction Jessica and I got pulled in to.

There were a lot of smart people in the group, and my assumption was, they probably all knew what they were doing and I told Jessica "we can just show up for an hour a week to make sure they're on track and doing what we need and then we can focus on MediaGoblin". I didn't know the phrase "revolutions are run by the people who show up" but I certainly do now... Jessica and I got drawn in as co-editors of the ActivityPub standard. We had raised enough money from the second crowdfunding campaign to pay Jessica for a full year (I didn't take any money from that campaign) but we stretched it out to two years by Jessica and I contracting for Open Tech Strategies part time (great people, btw). This was helpful because when one of us was working on standards stuff, the other person could do some work on MediaGoblin as a project, and there was a lot to do.

But as time went on and deadlines became more urgent, standardizing ActivityPub grew more and more in time consumed. Eventually it became my full time job; I would work 40-50 hours a week on ActivityPub and do 10-20 hours of contracting on the side to pay the bills. It was clear we were doing something important and there was a real opportunity.

But ActivityPub grew to three and a half years of standardization work, and as I said, we could only stretch out paying Jessica for two, so she had to find paying work, and it wasn't possible for us to split our time to manage both. In the meanwhile, even though all this stuff was happening for MediaGoblin, I found less and less time to work on the project.

Even worse, Gitorious (which we had previously been hosted on) went down, and we were unsure where to move to. A community member volunteered to do the work to move us to Savannah and we took it. MediaGoblin wasn't using Gitorious's issues/merge request tools anyway; the way people would make contributions was to make a new git branch, publish it anywhere, and then link that branch on the issue tracker, where we'd do the code review and eventually merge it in. In that sense we were already using git in a more distributed manner (the way git was intended, I'd even argue)... but actually I do think we lost something in the move from Gitorious to Savannah. What we lost is that many people didn't know where to host branches; Gitorious (along with many other such services) offered a one-click easy process to fork, where you don't have to learn or debate over how/where to host things if you don't already have a preference.

Our server infrastructure also languished... we previously had some volunteers helping with the infrastructure but they ran low on time, there was a server migration that went badly (it's still in a bad state, tbh), spam filled up our wiki and trac instances, and it was all a huge headache that I didn't really have time to deal with.

And I wasn't there to help steward the project the way I used to... I did appoint a co-maintainer (Breton) who did great work, but I guess I had helped drive a lot of the energy for the project, and so when I stopped working on it actively, the community languished. We went from dozens of active contributors to practically none over the course of ActivityPub's standardization.

It wasn't clear that it was worth it; towards the end of ActivityPub's standardization it looked like we wouldn't even make it, and I thought I had wasted years of my life. Then Mastodon picked it up, then Peertube, then etc etc, and we suddenly had dozens of ActivityPub implementations. It turns out it was worth it, and finally we had a fediverse that did talk to each other. It turned out MediaGoblin did make a large contribution to the federated social web, but it wasn't in the way I expected... it was a driving force, rather than the project people ran.

Still, afterwards I came back (with a stronger sense than ever of how finite and fragile time is) and I had to debate: should I pick up and run full swing with MediaGoblin again? With effort, the project could pick back up: we could merge the languishing federation branch, I could try to drum up excitement in the community again, and we might even make it.

But the webdev world shifted and so did I. IPFS and WebTorrent didn't exist when MediaGoblin started, and Peertube did the smart thing of integrating those into their project, and it felt like they handled our ideas better than we did there. There were also all these other projects (Pixelfed, Funkwhale) which, while not delivering all the media types in one package (why the heck not? I still don't understand that), seemed to be doing the same thing we were and actually were already federating... with the protocol we built for our own needs with MediaGoblin, no less! And web applications aren't typically built as request-response-type systems any more (and I, for one, was tired of that and had become disillusioned in my belief in Python being a great asynchronous language), and I just didn't feel excited about the codebase anymore. What to do?

I had another idea, and I called up several of my closest free software friends to make sure that the path I was suggesting wasn't an awful one. The main success I have had turned out to be not in the applications I built but in the way I showed how to grow distributed systems, and I now understand the deficiencies of the current federated social web (and how we can improve, building on the base we have). So that's where Spritely came from, and why I'm building it as a series of demos (more here:) https://dustycloud.org/blog/spritely/ (first documented demo here:) https://gitlab.com/spritely/golem/blob/master/README.org

So what led to MediaGoblin's "unofficial retirement"? I think it's all of the following: a) the standardization effort of ActivityPub, while done for MediaGoblin, accidentally led to a loss of energy in MediaGoblin's community; b) there was a falloff in code/infrastructure hosting and other challenges related to that; c) other projects picked up on what MediaGoblin was doing and arguably did it better, using ActivityPub even; and finally d) I still believe there are serious problems and deficiencies in the current federated social web that are addressable, and so I started the Spritely project to document and demonstrate a path forward there.

You could focus on just any one of those, but I think the story is clearest when it's told all together.

Anyway, it's free software! If someone is interested in revitalizing the project and community, I'm still interested in that happening... maybe reach out to me and we can figure out how to continue. I'm easily found: https://dustycloud.org/contact/


Would you consider copying this to a blog (or maybe the Mediagoblin website)? Seems like a lot of valuable background/history that I'd hate to see buried forever on HN.


Thanks... I'll post it to my blog later today!


What @anderspitman said -- would be great if this were a post on your blog.


> basically there are other things that got ActivityPub before it and could replace it (Pixelfed, Mastodon, Write Freely). Nobody really wants to work on Mediagoblin now because the alternatives are pretty much all better.

Uh, maybe, except that none of the things you list actually does anything close to the core promise of MediaGoblin. MediaGoblin was going to be the libre Flickr/YouTube replacement. Nothing focusing on federation networks has ever gotten even close to that, so it definitely wasn't beaten by those. At best now there's Piwigo, which, by the way, is on GitHub.


Peertube for YouTube, Pixelfed for Flickr.


Except that pixelfed...

1) Is not even close to Flickr. It's way more like Instagram, which is an entirely different and unrelated thing.

2) Barely has docs. Everything just says "to do".

3) Doesn't even have a website! pixelfed.org has just said "coming soon" since last year. It has 400 different language translations though, all telling you the same nothing.

How is that in any way superior?

Flickr lets you organize your photos and videos together. Peertube and Pixelfed don't do that. You keep suggesting one-off social sharing feed services, but those are entirely orthogonal to what Flickr and the now defunct MediaGoblin provide.

Your suggested replacements do entirely different things than what they're alleged to be replacing, and at least one of them doesn't even tell you how to run or use it.


Alright, everything sucks, there's no hope, never mind.


MediaGoblin was the hope. It sounds like it might not have lost its developers if it hadn't gotten massively sidetracked by all the social networking federation stuff that was honestly always completely off brand for something that isn't a Twitter clone. You can't do anything about that now, of course, but putting it in a more accessible forge than Savannah might be a good start.


Well, go ahead:

    git clone https://git.savannah.gnu.org/git/mediagoblin.git

Put it on github and see how everything starts to get better.


About the only thing Pixelfed has in common with Flickr is "it's about images". It's useless for many of MediaGoblin's use cases (which is OK, because it wants to be something else).


I had never even heard of Savannah. And after seeing the website, I am glad they will be on GitHub soon.


Well, they've been using Jira forever.

Usually when people say this they are referring to either GitLab or Gitea. It's not like all of GitLab is fully open source, though; it's Open Core. I'm sure the open part is a sizable amount of the code, but I would imagine there's a decent chunk of gitlab.com source code that's proprietary. (I'm sure somebody will quibble with this in a following comment, but the original argument is one of free-and-open purism.)

GitLab CE and Gitea don't solve the hosting issue, though. Apache could probably pay somebody to manage that, but then you've introduced extra overhead for collaboration that most people aren't inclined to take on.


> It's not like all of GitLab is fully open source.

I think this is not a fair assessment: GitLab's Community Edition is fully open source and is a full product. The fact that you can add proprietary elements for a licensing fee (elements which are also "open" in that they are readable, debuggable, etc.) is not at all the same as hosting on a platform that is entirely proprietary.


GitLab Community Edition is being used by major FOSS communities like Debian [1], GNOME [2] and Freedesktop [3].

[1]: https://salsa.debian.org/help
[2]: https://gitlab.gnome.org/help
[3]: https://gitlab.freedesktop.org/help

While gitlab.com runs the Enterprise Edition, it's 100% possible to self-host a FOSS-only setup if you wish.


I say this as somebody who has used all three of those instances -- it is an inconvenience to create an account for each project you send patches to (or even just file a bug).


Doesn't OAuth help here?


1. Who is the OAuth provider?

2. What does the registration process look like?

OAuth helps with federated authentication. It's not federated registration.


1. Whoever you want, as long as the remote service supports them; GitHub is common, and so is the official gitlab.com.

2. Click "sign in with provider" and log in as normal, or just get logged in if you already have a cookie.

3. These are functionally identical to federated registration: it's two clicks to create an account on a GitLab instance if you use an external identity provider.
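
For reference, the dance behind that "sign in with provider" button is the standard OAuth authorization-code flow; a sketch using GitHub as the identity provider (CLIENT_ID, CLIENT_SECRET and CODE are placeholders):

    # 1. Send the user's browser to the provider's authorize URL:
    #    https://github.com/login/oauth/authorize?client_id=CLIENT_ID
    # 2. The provider redirects back with ?code=CODE; the site
    #    exchanges that code for a token server-side:
    curl -X POST https://github.com/login/oauth/access_token \
         -H "Accept: application/json" \
         -d client_id=CLIENT_ID -d client_secret=CLIENT_SECRET -d code=CODE
    # 3. The returned token identifies the user; the forge just has to
    #    create a local account keyed to that identity.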


I feel the same. If GitLab released the Community Edition under the MIT license (not sure what the license is offhand) with no proprietary version, there would be nothing stopping some third party from creating proprietary versions/extensions of GitLab's own product.


https://gitlab.com/gitlab-org/gitlab-ce

I’m pretty sure it’s MIT. This page says it is.


Yes. BSD-derived licenses do not stop someone from distributing a proprietary version, while the GPL does (although you can still charge for your version even with the GPL, you just have to distribute your source with the package and make it available under the GPL).


You are also not being fair to GitHub then: you can run GitHub Enterprise Server, and it is also "open" in that it's readable (non-obfuscated), debuggable, etc. It's basically a snapshot of what they run at github.com, with added features for enterprise (like AD integration). You can't distribute the code, though.

How easy it is to actually do that depends on your Ruby skills and your wallet.


I don't think people are drawn to the specific GitHub implementation (the code); they're drawn to github.com, the domain, where you can't do what you propose.


When we moved PHP to git, we made sure we had our own infrastructure and kept control there. The reality, however, is that people love doing "pull requests" on the GitHub mirror and use git.php.net only as a push target.


And if anybody looks at git.php.net and thinks "well, that's ugly, no wonder nobody is using it", that's true. Moving to self-hosted GitLab was brought up, but it won't solve it: we still get many drive-by PRs from people who wouldn't submit if it weren't on GitHub.


I won't argue that GitHub is better. I simply like the experience of contributing on GitHub: my pull requests are available for all to see, it's familiar, and it works well enough. Using GitHub is also one less thing I need to learn, which is nice when I feel pretty maxed out on learning new things, as I do most of the time these days.


It lowers the barrier to entry by using something people are already familiar with.


I think it has more to do with the ethos of Apache vs., say, GNU. Namely, Apache tends to be much more permissive and does not really view commercial, proprietary software as a "bad thing" the way GNU tends to.


It costs money, dude. Lots of people have all these philosophical opinions about what FOSS projects should do but few people give money for infra.


And we save money by relying on GitHub's excellent tools, rather than wasting it trying to recreate or stand up a similar platform. The communities wanted closer integration with GitHub (what they knew/used elsewhere), so we made it happen.

-- Greg Stein


I wonder if there are plans to mirror GitHub’s supplemental content (issues, etc.)


I’ve heard that GitHub makes that intentionally hard. Have you seen a repository that comes from Google Code? All the issues are always messed up that way.


Probably because UX and adoption trumps FOSS purism.


If a non-profit buys food from a for-profit grocery store, is that ironic? Just because a project is open source doesn’t imply any requirement that all of the products and services used for that project must be.


I guess GitHub right now is like Facebook was in 2012: everybody is on it, it's a super vibrant community, it's backed by a huge amount of money, and it locks you in (at least via wikis, issues, and URLs).

And a second guess is that it will also do what Facebook did: get out of control by trying to monetize its monopoly.

It's just so mega ironic that the whole Open Source movement collectively decided that, for a little bit of convenience, they're more than OK with letting Microsoft host their stuff and sacrificing so much freedom (especially the URLs, which do the locking-in to GitHub).


Lucky for us, GitHub, unlike Facebook, doesn't rely on selling advertisements for revenue. Plus, git is decentralised: you can migrate your entire code anywhere you want. As far as issues are concerned, I am pretty sure competitors like GitLab support importing them via GitHub's awesome API. It's not as bad as you make it sound. Things could have been much worse.


I can migrate my code, but not all my consumers, which use "github.com/my/package" as the import path all over the place.


I assume that's a reference to Golang? I gave up Go after one too many times dealing with its terrible package management.


GitHub doesn't provide you with any standard way to migrate the related metadata, but unlike Atlassian's Bitbucket, which doesn't make everything accessible with their APIs, it's possible to get all the data with GitHub's API (as of today).

If you use GitLab's importer tool, we migrate everything from your old project: all issues, all pull requests, and even inline comments inside a pull request. All it takes is one OAuth authorization and the press of a button.

Once inside GitLab, we even give you an export zip file that you can import into your self-hosted version of GitLab (or into any alternative that can read our export file format).


That's exactly why any package versioning that hardcodes URLs to packages is broken by design. A URL is a where; a package is a what.
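
(Go modules at least give consumers a local escape hatch, though it has to be repeated in every downstream build, since replace directives only apply in the main module; a hypothetical sketch with made-up module paths:)

    // go.mod: point a moved dependency at its new home
    // without touching import paths in the source
    module example.org/myapp

    require github.com/my/package v1.2.3

    replace github.com/my/package => gitlab.com/my/package v1.2.3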


Doesn't setting up a mirror solve the issue? https://help.github.com/en/articles/about-github-mirrors


Facebook had awesome APIs too. I wrote a Facebook app (FBML, anyone?) where you could book a table at a restaurant and automatically create an event and invite your friends... until they didn't.

Not sure it makes sense to point out that things could have been much worse; that is basically always the case, independent of how bad something is.


What I'd really love is federated project hosting. Consider: why does a GitHub pull request have to be from one GH repo to another? So long as it has read access to the source repo, it shouldn't matter where it is. Now, GitHub also keeps track of changes in the source, so that it can update PRs etc. But surely distributed notifications are also a solved problem?

Ideally, I should be able to do PRs and link to bugs and commits in comments etc, and it's all federated and "just works".


ActivityPub is the distributed notification system you're talking about, and yes, it's a solved problem. It's already used for Twitter- and Instagram-like content; see Mastodon and Pixelfed.
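
Nothing standard exists yet for forges specifically, but a purely hypothetical merge-request notification expressed as an ActivityStreams object might look something like this (all URLs and names are made up):

    {
      "@context": "https://www.w3.org/ns/activitystreams",
      "type": "Offer",
      "actor": "https://alice.example.org/users/alice",
      "object": {
        "type": "Document",
        "name": "Merge branch feature-x",
        "url": "https://alice.example.org/project.git"
      },
      "target": "https://forge.example.org/project"
    }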


Note that the wiki can be cloned and moved to any static file host (just a bunch of markdown files):

    git clone https://github.com/github/VisualStudio.wiki.git


Are there any good in-repo issue trackers that integrate with GitHub? Like a bot that listens for changes on a GH repo and commits the information to git in a format that could be migrated to a different service down the road if necessary.


I didn't find any link to their GitHub profile in the blogs or press releases:

https://github.com/apache/


I wonder how long this name was reserved and by whom.


The ASF has used this for years. Except for a few projects that have been piloting development on GitHub, it was mainly used for mirrors of repositories hosted at the ASF.


GitHub is generally pretty loose when it comes to reassigning accounts owned by squatters.


Although the Apache Software Foundation is considered a squatter themselves by some folks.


why is that?


I suppose it is because of the feud between OpenOffice (Apache) and LibreOffice (Document Foundation).

For years, OpenOffice has been Apache's most downloaded project[1]. But that is only because of the name's past glory from when Sun promoted it, and because of confusion with LibreOffice.

When Apache was given OpenOffice by Oracle, they let the code rot, with just a few dozen commits a year. IIRC, their strict access policies led them to reject the initial cooperation offers from the Document Foundation. So the Document Foundation "dropped" OpenOffice and created the LibreOffice fork (from Go-OpenOffice). Circa 2011, there were many angry posts about this.

[1]: The home page of "Apache OpenOffice" claims: "Over 3.2 million downloads of Apache OpenOffice 4.1.5 ".


I'm pretty sure Tomcat and Apache httpd get more downloads than that; it's just harder to measure, as they usually come from package repositories for various Linux distros or from Homebrew.


When Apache published OpenOffice 4.0, there were about a million downloads per week, for a total of 85M downloads in 2.5 years[1].

Fortunately, that is no longer the case. OpenOffice is in maintenance mode (the last minor features landed 5 years ago). So, even if the project seems alive at first glance, its user base is thinning out.

[1]: https://blogs.apache.org/OOo/entry/apache_openoffice_in_2013...


I think they are saying that an Apache tribe should have that username.


The various Apache tribes.


I had not thought about the naming of Apache the software, despite it being the name of so many "Native American people inhabiting the southwest United States and northern Mexico." I also now realize why a feather is used as the logo.

Previous HN discussion on the Apache software relationship with the Apache people:

Apache Rewrites History: Why is it Named “Apache”? [2013]

https://news.ycombinator.com/item?id=5536134


I participated in that discussion, and it was a scummy rewrite of history then. As far as I can tell, they have no relationship to any of the various Apache tribes.


Over 1,600 repositories. Sheesh, Apache is big.


And they are still importing, it seems. It shows 1,761 to me.


In 2009, I brought up the idea of using GitHub on the members@ list and was called a troll. Agreed, at the time, the message was trolling...

https://imgur.com/a/JPQtCQ4

I brought it up again later in 2010 and 2011 on the members@ list too. Lots of discussion with the general consensus that it wouldn't happen because it isn't an 'open' platform.

I even wrote a blog post about it in 2011, which was also discussed on members@ and in the blog comments...

http://lookfirst.com/2011/11/contributing-to-open-source.htm...

Even though it took 10 years, I'm glad to see it finally happen.


Nice effort. Some people are slower to adapt than others... as you say in your blog post, GitHub (or something like it) was always going to be much more attractive to contributors than the "old ways" of mailing lists... but there's always the risk you'll invest in a platform that will be dead in 2 years. So being conservative is necessary for larger organizations.


Is it Microsoft's acquisition that's won Apache over? (Though my first thought was 'wow, in bed with the devil'.)


Congrats.


For many years now, it's felt like the Apache Software Foundation is where projects go to die.

It's a shame it's taken so long, but I'm glad to see this news - it's pretty much guaranteed to increase participation in Apache projects.


What do you mean to die? Projects like Kafka, Airflow and Spark are widely used today and the teams are constantly releasing new features.


This has been my observation as well - projects flourish under the Apache umbrella. Parquet and Arrow are also widely used and will probably continue gaining importance. A Rust big data engine, similar to Apache Spark, was just donated to the Apache Arrow project (it's called DataFusion: https://github.com/andygrove/datafusion). From what I can tell, Apache already has a lot of important projects and is adding new ones that will continue to grow.


Apache has a few big hitters, yes, but for dozens more it's been more akin to a tomb than an incubator. I really believe a large part of the reason for that has been their dogged determination to stay off GitHub.


What percentage of repos on GitHub would legitimately be considered "active"?


From the ASF press release:

> In February 2019, the migration to GitHub was complete, and the ASF's own git service was decommissioned.

I'm confused. https://git.apache.org/ doesn't look decommissioned.


> I'm confused. https://git.apache.org/ doesn't look decommissioned.

That's a read-only mirror service.

OTOH, https://gitbox.apache.org doesn't look decommissioned, either.


I am quite happy about it. There are a large number of very useful libraries and infrastructure projects (along with J2EE-inspired frameworks, which I have decided to ignore) that remain essential for Java-related software development.


So... I guess this April Fools' JIRA issue from a few years ago might actually come to fruition?

https://issues.apache.org/jira/browse/INFRA-7524


I hope projects will get rid of JIRA now!


This is unlikely, since not all projects use git, and they seem to like having the same workflow everywhere.


I recently moved to a company that uses Jira. What kinds of problems should I be looking out for?


If it's the cloud-hosted version (I can't speak to their on-prem software, since I haven't used it in a few years), watch out for the nonsense that is having two separate markup languages, used in two different contexts. The original "Atlassian markup" (made famous by Confluence) is used, as best I can tell, in issue creation and descriptions, whereas some kind of Markdown hybrid is used in issue comments.

Be careful if you use subtasks for issues, which may sound like a perfectly reasonable work-breakdown idea, but subtasks are snowflakes in Jira and become very hard to have visibility into, assign points to, etc.

The rest of the rant I'll omit, since it's more rooted in my personal hatred for their products and has less substantiation.


We use Jira in a very small company and I avoid it when possible. I wrote a command-line hours tracker, which covers a third of my usage. Maybe I'll write an issues CRUD application some time as well, but I think going further down that rabbit hole costs more time than it saves.

My previous employer was a little larger (some 40 employees), and I can say that Jira solved nothing that we couldn't solve much better with text files parsed by Perl/Python/sed/awk, sometimes with a little editing help (such as syntax checking on :w) from Vim/Sublime plugins (both had users who maintained their plugins).

Maybe Jira is meant for larger companies? There is a ton of special characters that trigger all sorts of crap: square brackets, double curly braces, ampersands, semicolons, colons, underscores, asterisks, at symbols, backslashes, octothorpes, pluses, carets, tildes, double question marks, pipes, and even hyphens. It takes about num_characters*20 seconds to escape a regex so it doesn't format or hide parts of it (it doesn't help that you can only check by saving; a lot of formatting does not show up in the "visual editor"), but there are probably a lot of features there that one could be using.
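
One partial workaround, assuming your instance still uses the classic wiki renderer: Jira's {noformat} macro renders its contents verbatim, so a regex can be pasted without escaping anything (the pattern below is just an example):

    {noformat}
    ^\[\d{4}-\d{2}-\d{2}\] (ERROR|WARN): .*$
    {noformat}

The {code} macro works similarly and adds syntax highlighting.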


I'm at a company where we have hundreds of Jira users, and for us it works quite well.

It's really about how the team/department uses Jira, how it's set up, and the overall process.

My last company was ~30 people, and it was a total mess (although we were only ~3 developers). It's really about how it's set up.


What is the difference that makes it work for you now where it didn't before?


- It's kind of slow (page-load/interaction-wise)
- There are ~5 different places where every setting could be, and it's always the last one you look in

Apart from that, for internal company use it's been one of my favourite PM tools I've worked with. For open source with open communities, it's just too much overhead to manage.


Be careful when writing longer comments or creating issues with long descriptions. Your work is not persisted, so in case of an unfortunate event you lose all your input. This happened to my colleagues several times and caused a lot of frustrations.

There's a famous issue dating all the way to 2006 for this: https://jira.atlassian.com/browse/JRASERVER-9292


Issues with JIRA are mostly seen as you use it across different companies and projects: everyone sets it up differently. It is like Excel for ticketing. I didn't care much for Trac in comparison, but I can appreciate that its UI always remained consistent.


There is some kind of issue with ASF's JIRA not being indexed by search engines, which makes discovery of problems more difficult than it should be. ASF gets their JIRA hosting gratis so I don't think they have much recourse.


And replace it with what?


GitHub issues / pull requests / projects / wikis... the standard tools GitHub offers to manage projects.


Assuming he's talking about GitHub Projects, which is like a simplified Jira.


When I see a person solve a problem with a GitHub repo, the first thing I try to find out is: what is the problem they're trying to solve?

Then I ask myself: OK, the problem is real, but what about generalizing it?

Tons of OSS projects focus on generalizing solutions, not problems. And that's the problem with OSS.


I found this title very confusing. I thought it was the equivalent of their non-profit being absorbed by GitHub. Meaning, the corporate structure.

Why don't they just say that Apache will officially host their repositories on GitHub moving forward?


Because the writer, Bryan Clark, is Director of Product for OSS Maintainers and is trying to contextualize the move to feature the company he works for, not the code that's been around for far longer.

"Apache moves to GitHub" makes it sound like Apache made a rote and boring switch, which this sort of is; they just as easily could have moved to SourceForge or GitLab. "Joining the community" implies that GitHub works together with open source. While it certainly hosts a lot of it, GitHub isn't open source.

Headlining this as your "open source community" also helps avoid the uncomfortable issue of an open source product moving to a completely proprietary system for CI and development.


> while it certainly hosts a lot of it, github isnt open source.

Yep. This was something that surprised me

https://github.com/github/markup/issues/1266#issuecomment-48...


The ASF press release (https://blogs.apache.org/foundation/entry/the-apache-softwar...) title is better:

The Apache® Software Foundation Expands Infrastructure with GitHub Integration


I agree; I also find it confusing. I thought this meant moving from JIRA to GH, but I suppose this announces something that users and contributors of Apache projects wouldn't have thought of: that there was an active decision to move to GitHub.

The move is also confusing because I thought Apache was migrating to its own git server, late last year or early this year.


They just finished migrating to Gitbox, but GitHub mirrors have been used for a long time and some projects have been using pull requests more in lieu of reviewing patches on JIRA.


And the three dozen GitHub-to-GitLab migrants are trying to find ways to come back while still saving face.

I have personally seen people who, for example, not only use PowerShell (on Linux, of all places) but also build tooling around it and file issues on the GitHub repo. The same people left GitHub when MS acquired it.


So the "Open Apache Graveyard" is moving to github, for better visibility, interaction, and where the users are.

Thats a good move. Although most of the software from the "Open Apache Graveyard" like OpenOffice and others is kinda dead already.

Just my very biased IMHO.


That's silly. Most projects on GitHub are "kind of dead" too.

It's not like the ASF doesn't have active projects too.



