Webwatch (github.com/jgrahamc)
105 points by jgrahamc on Oct 26, 2015 | 57 comments



crontab -l | { cat; echo "* * * * * if curl -s 'https://mysite/' | grep -q mysubstring; then echo 'found it'; fi"; } | crontab -


I've watched this command change repeatedly as you've been editing it to make it work.

This is a good example of why I didn't do this in the shell.


Looking through the git history of your project, it doesn't seem you got it correct the first time either. I don't think the biggest strength of gluing "UNIX tools" together is "make it work fast & on the first try", but "have independent tools that each do 'one thing' well".

In relation to your tool, I think curl provides many more features, easily accessible through command flags, than the limited subset of HTTP capabilities you expose (for example, basic auth or a different set of headers). The same argument goes for mailing, setting headers, and so on.
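(To be fair, both of those would only be a few lines in Go's net/http as well; a rough sketch, not the project's actual code:)

    package main

    import (
        "fmt"
        "net/http"
    )

    func main() {
        req, err := http.NewRequest("GET", "https://example.com/", nil)
        if err != nil {
            panic(err)
        }
        // The two capabilities mentioned above: basic auth and a custom header.
        req.SetBasicAuth("user", "secret")
        req.Header.Set("Accept-Language", "en")

        resp, err := http.DefaultClient.Do(req)
        if err != nil {
            panic(err)
        }
        defer resp.Body.Close()
        fmt.Println(resp.Status)
    }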

With that said, tools that do one thing and do it well are the ones that get used; personally, I'd just prefer it to be a function in <your-shell> instead :)


Yeah. The 'git log' really shows all the changes I had to make to the README. Oh, and an error message.


I mean, a tool can be really useful (I write tools this size all the time), but some of them need tweaks forever. I just think some 'tweaks' are already solved by other projects; that's why using already-written tools that are somewhat UNIX-y sounds like a good idea to me. That's what I tried to say; of course I don't want you to write a 100% complete program in the first commit, since that would make everything I write look really bad in comparison. Just be prepared for the pull request that lands basic auth in your project, and the next PR after that :)


Oh, here's the programmer who never makes mistakes.


I don't know how to view or edit comment history, but I never wrote anything in this comment about sending an email.

You could replace echo with sendmail to do that. Sorry if my point came across as callous to you.


No need to apologize. There's basically always a way to do it in the shell.


Nice idea, but it needs work. Firstly, and most importantly, any open source project lives and dies on its documentation. Without a basic guide to what the thing even does, no one is likely to use or support the project. Give some love to your README.md file; explaining how to use the project would be great.

Secondly, at the moment you're just doing a straightforward string comparison on the <body> of a page[1]. It'd be more useful if I could define something like a DOM querySelector or a regexp. It'd also be useful to look in the header at the page title.

[1] At least, I think so. I've never used Go so that's just what I gather from reading the source.


This is a really short little program I wrote for a quick need I had. I added a simple README. There are a ton of ways to improve it (regexp, DOM walking, automatically figure out MX, ...); if people want to do that I'd be happy to take PRs.
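(For anyone curious, the regexp and MX ideas are both covered by the standard library; a rough sketch with made-up values, not a finished feature:)

    package main

    import (
        "fmt"
        "net"
        "regexp"
    )

    func main() {
        // Regexp matching instead of a plain substring search.
        re := regexp.MustCompile(`(?i)tickets\s+on\s+sale`)
        fmt.Println(re.MatchString("<body>Tickets on sale now</body>"))

        // Automatically figure out the MX host for a recipient's domain.
        mxs, err := net.LookupMX("example.net")
        if err == nil && len(mxs) > 0 {
            fmt.Println("deliver via", mxs[0].Host)
        }
    }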

I tend to default to "stick it on Github and see if it helps someone else".


That's a fair comment. I just figured if you were posting it to HN you were looking for feedback.


Happy to have comments and even PRs.


If anyone is looking for something similar that runs in the browser, I recommend these two extensions:

Chrome: https://chrome.google.com/webstore/detail/page-monitor/pemhg...

Firefox: https://addons.mozilla.org/pt-br/firefox/addon/check4change/


I suppose I'm jealous of a project that brings nothing new compared to so many other solutions and still grabs 76 stars (as I write this). It seems, after all, that GitHub stars are another way of saying "I'm popular" and not so much a sign that a project is good.


That is a pretty rude comment, and I would definitely argue it reflects a pretty narrow view of the world. I think the project is alright, and it looks quite useful if you need something to curl a site, check something, and blast an e-mail (essentially your own IFTTT).

To the point "I'm jealous of a project that brings nothing new compared to so many other solutions": I suspect the author of the program needed to set up a website check for an event and get notified; s/he probably found that to be the motivation for building this much more than getting GitHub stars. Other people found it useful as well, and maybe it is easier for people to grok this implementation and build on it than other crawlers.

Most broadly, bitcoin combines a lot of well-understood, older technologies into something completely new. It seems this was your gripe: the project didn't do that. I just want to point out that complex coordination and reorganization of current libraries/practices/technologies can be quite useful, novel, and interesting.

edit: I actually concur with the above post a bit more now. I do think things done in Go get a bit overhyped, and if this is what the parent was referring to, I suspect s/he was correct, even if a bit prickly in expressing it.


I'm surprised this is popular. It was just a quick thing I wrote to solve a specific problem that mattered a lot.


In a similar vein, and quite easy to run locally: https://thp.io/2008/urlwatch/


Or Specto.


Are there any perks to passing arguments like this: `-url=http://cloudflare.com`? I was thinking the right way was `--url http://cloudflare.com` or `-u http://cloudflare.com`.


It's using the standard `flag` package that comes with Go; as for why `flag` parses args this way, I don't know.


The Go devs are aware of it, and adamant that this stuff is fine and that they don't want to make the flag package "any more complex", since it's so easy to install a different one (never mind that of course people are going to use the built-in one...). I find this absolutely ridiculous given how nonstandard it is among today's shell tools: -flag is supposed to be interpreted as -f -l -a -g, or as -f "lag", depending on whether -f takes an argument.


I was thrilled when I learned that flag will also do the --option=value format. It might even do --option value too - test it?
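(It does, for non-boolean flags. A quick sketch to verify, with a made-up flag name:)

    package main

    import (
        "flag"
        "fmt"
    )

    func main() {
        // A string flag accepts all of -url=X, --url=X, -url X, and --url X.
        // Boolean flags are the exception: they require the -name=value form.
        url := flag.String("url", "", "URL to fetch")
        flag.Parse()
        fmt.Println("url =", *url)
    }

Save it as flagdemo.go, and `go run flagdemo.go --url http://cloudflare.com` prints the same value as the `-url=` form.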


Tell that to Oracle. https://i.imgur.com/fQ6pQLn.png


There is a startup for that: https://monitorbook.com/


And it was: "Crafted with <3 in San Francisco"

so there is that


I especially love the feature tick "Push Notifications (coming soon)" as a reason to go for their higher tier subscription.


There are quite a lot of edge cases that can be triggered when fetching HTTP responses. Perhaps a small test suite would be beneficial in order to attract new developers who don't feel like breaking anything? (-:
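(Something along these lines, using the standard net/http/httptest package, would be a starting point; checkBody here is a hypothetical stand-in for webwatch's matching logic, not its real function name:)

    package main

    import (
        "io/ioutil"
        "net/http"
        "net/http/httptest"
        "strings"
        "testing"
    )

    // checkBody fetches a URL and reports whether its body contains want.
    func checkBody(url, want string) (bool, error) {
        resp, err := http.Get(url)
        if err != nil {
            return false, err
        }
        defer resp.Body.Close()
        body, err := ioutil.ReadAll(resp.Body)
        if err != nil {
            return false, err
        }
        return strings.Contains(string(body), want), nil
    }

    func TestCheckBody(t *testing.T) {
        // Serve a fixed page locally so the test never touches the network.
        srv := httptest.NewServer(http.HandlerFunc(
            func(w http.ResponseWriter, r *http.Request) {
                w.Write([]byte("<html><body>tickets on sale</body></html>"))
            }))
        defer srv.Close()

        found, err := checkBody(srv.URL, "tickets")
        if err != nil || !found {
            t.Errorf("expected a match, got found=%v err=%v", found, err)
        }
    }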




Love changedetection. I've been using it for years!


I built the same thing using NodeJS a couple of weeks ago, with phantomjs support (JavaScript execution), mandrill (emailing), and some other nice options: https://github.com/mgcrea/node-web-watcher


What string? Is it a webpage modification or just a whois modification? What exactly is it looking for?


I assume that this program kinda works like "curl <page> | grep <string> && mailx -s 'Match' <email> <<< 'matched'".

Useful when you want to periodically check if a page changed - I've used a similar thing to get concert tickets before anyone else.

That being said, I feel like a browser extension might be more useful than a command line script, for this particular use case.


A browser extension is handy, but requires your browser to be open in order to work. A script, on the other hand, can just be thrown up on a server and forgotten about.


It's looking for a string in the page HTML body. https://github.com/jgrahamc/webwatch/blob/master/src/webwatc...
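(The essence, as far as I can tell from the source, compressed into a sketch; the SMTP details below are assumptions, not webwatch's actual flags:)

    package main

    import (
        "fmt"
        "io/ioutil"
        "net/http"
        "net/smtp"
        "strings"
    )

    func main() {
        resp, err := http.Get("https://example.com/")
        if err != nil {
            panic(err)
        }
        defer resp.Body.Close()
        body, _ := ioutil.ReadAll(resp.Body)

        if strings.Contains(string(body), "mysubstring") {
            // Assumes a local MTA listening on port 25.
            msg := []byte("Subject: webwatch match\r\n\r\nFound it\r\n")
            err = smtp.SendMail("localhost:25", nil,
                "me@example.net", []string{"me@example.net"}, msg)
            fmt.Println("match; mail sent, err =", err)
        }
    }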


Someone needs to make a tool to monitor the documentation for changes.


What about the absence of a phrase? I would like to be able to do

  webwatch \
    -url=https://example.com/privacy/ \
    -warnmissing="never received a National Security Letter" \
    -from=me@example.net \
    -to=eff@eff.org
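(The inverse check would be a tiny change; a sketch with a hypothetical -warnmissing flag, which is not an existing feature:)

    package main

    import (
        "flag"
        "fmt"
        "io/ioutil"
        "net/http"
        "strings"
    )

    func main() {
        url := flag.String("url", "", "URL to fetch")
        warnMissing := flag.String("warnmissing", "", "warn if this string is absent")
        flag.Parse()

        resp, err := http.Get(*url)
        if err != nil {
            panic(err)
        }
        defer resp.Body.Close()
        body, _ := ioutil.ReadAll(resp.Body)

        // Alert on absence rather than presence: a vanished
        // warrant canary is exactly the event to be warned about.
        if !strings.Contains(string(body), *warnMissing) {
            fmt.Println("WARNING: phrase is gone")
        }
    }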



I understood your point, but you might be better received if you responded something like the following:

"That's a great idea! I have no personal need for such a feature, but if you do and are able to submit a pull request, I'd be pleased to merge it."


Off topic: is it OK to re-post an ignored article [0] the very next day? Just curious, not complaining :)

[0] https://news.ycombinator.com/item?id=10443814


I reposted because I got the following email from HN:

    Hi there,

    https://news.ycombinator.com/item?id=10443814 looks good, but didn't
    get much attention. Would you care to repost it? You can do so
    here: https://news.ycombinator.com/repost?id=10443814.

    Please use the same account (jgrahamc), title, and URL. When these match,
    the software will give the repost an upvote from the mods, plus we'll
    help make sure it doesn't get flagged.

    This is part of an experiment in giving good HN submissions multiple
    chances at the front page. If you have any questions, let us know. And
    if you don't want these emails, sorry! Tell us and we won't do it again.

    Thanks for posting good things to Hacker News,
    Daniel


I got the same mail, and indeed my submission went from no attention at all to staying on the front page for a while. I figured that maybe it had to do with posting time; perhaps the email is sent when it's a good time to repost? Or the first upvote is crucial?


Interesting that the process needed you to repost it for the mods to boost it. Seems like they could have just fiddled with it without you having to manually interact with it.

I can't help but wonder what the logic is there.


Also interesting that HN is moving (has moved?) toward being a curated site. HN asks for reposts of things they deem good. They also adjust the score of many articles downward (as can be seen through large jumps on sites that track article ranks; some of that will be automatic from the flamewar detector, some is likely manual).

It seems like we're reaching a "web 3.0" in which users do the expensive bit of an initial sift, but then the site admins edit/curate that into their own vision.

We're moving away from user driven content, back to curated content with user-sourcing.


Web 3 or not, I'd see it as an extension of user sourcing, where users have various levels of moderation powers. I would guess these HN emails (I also received one recently and duly reposted) are triggered by some count of admins voting up unloved posts, maybe from a list filtered by a user karma threshold.

As Jeff Atwood says about StackOverflow, it should be possible for a sufficiently privileged user to do just about anything staff can do.

Not really a new concept, as /. had the notion of metamoderation, but this is a richer model with multiple levels of user.


So Slashdot has survived long enough to be on the vanguard again!


Could you please make this legal in the US by honoring robots.txt and scanning any links to the ToS for words forbidding "automated access", "crawling", "spidering", "polling", etc.?
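(A minimal robots.txt check is not much code either; this sketch hand-parses only the blanket "User-agent: *" group and is an approximation, not a full parser:)

    package main

    import (
        "bufio"
        "fmt"
        "net/http"
        "strings"
    )

    // allowedByRobots does a crude check of /robots.txt: it reads the
    // "User-agent: *" group and returns false if path falls under a Disallow.
    func allowedByRobots(site, path string) bool {
        resp, err := http.Get(site + "/robots.txt")
        if err != nil {
            return true // robots.txt unreachable: assume allowed
        }
        defer resp.Body.Close()
        if resp.StatusCode != http.StatusOK {
            return true
        }

        inStarGroup := false
        scanner := bufio.NewScanner(resp.Body)
        for scanner.Scan() {
            line := strings.TrimSpace(scanner.Text())
            if strings.HasPrefix(line, "User-agent:") {
                agent := strings.TrimSpace(strings.TrimPrefix(line, "User-agent:"))
                inStarGroup = agent == "*"
            } else if inStarGroup && strings.HasPrefix(line, "Disallow:") {
                rule := strings.TrimSpace(strings.TrimPrefix(line, "Disallow:"))
                if rule != "" && strings.HasPrefix(path, rule) {
                    return false
                }
            }
        }
        return true
    }

    func main() {
        fmt.Println(allowedByRobots("https://example.com", "/"))
    }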


Hey jgrahamc, neat tool. Could you please add bins to the repo? I know, I know, I can compile it myself. But not everybody has the luxury of installing Go just to try it...


No. I really hate adding binaries to git repos.


You don't need to add it to the git repo itself. You can create a release in Github and attach binaries.


It's like self-hosted Google Alerts for one page.


I was going to suggest Google Alerts. It works really well!


Then the description should have started like this: Self-hosted Google Alerts in Go!

That would have saved me a couple of minutes.


Useful tool for what is a very common task, nice work jgrahamc!


The hard part is to know what string to search for.


Not to demean your work, but I also replicated what you did in Node-RED.

And it also goes to twitter. And console. And MongoDB.

It took me 5 minutes.

http://imgur.com/lbHoTIb

(Below is a JSON link to replicate what I did.)

http://pastebin.com/i6KhuwbX


Fun.



