Hacker News: eweitz's comments

User interfaces for biology have drastically improved over the last 10 years.

Domain-specific tools like genome browsers, protein viewers, or phylogenetic explorers [1-3] almost all look and feel a lot better than they did in 2012.

The biggest exception here is UCSC Genome Browser, which has an old-school design and web technology stack. That said, it has steadily added features over the years, has made the UX around its periphery substantially sleeker, and remains widely used.

There are also bespoke visual design resources for biology applications that are good and getting better, like BioRender and PhyloPic [4-5]. There are multi-tiered packages like Dash Bio that wrap biology components together [6]. There's a Blender biology community, too!

---

1. Genome browsers and components: https://jbrowse.org/jb2/, https://www.ncbi.nlm.nih.gov/genome/gdv, https://igv.org/app, https://eweitz.github.io/ideogram

2. Protein viewers: https://pymol.org/, https://nglviewer.org/ngl/

3. Phylogenetic explorers: https://clades.nextstrain.org/

4. https://biorender.com/

5. http://phylopic.org/

6. https://github.com/plotly/dash-bio, https://dash.gallery/Portal/?search=[Pharma]


https://eweitz.github.io/ideogram/related-genes - gene search recommendation engine paired with a web component for genome visualization


I'm more interested in read speed than write speed. I have about 2 MB of data that I fetch, parse and transform into a nested object for easy look-up by various types of keys. It consists of 6 other objects, and I'd guess it's < 50 MB in total size.

In my brief experiment, it was 12% faster to read from the web Cache API [1], re-parse and re-transform that nested object than to read the fully transformed object using IndexedDB via idb-keyval [2]. That surprised me! I went on to learn that IndexedDB does a structured clone as part of such reads, which I suspect is the main cause of slowness in my use case.

Related commits to reproduce that finding are in [3], specifically [4].
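For anyone curious why a structured clone might dominate read time, here is a minimal sketch that times only the in-memory part of the two read paths: parsing JSON text and rebuilding a look-up object (roughly what follows reading a cached Response's text) versus calling `structuredClone` on the already-built object (standing in for the copy an IndexedDB read makes). It does not touch the Cache API, IndexedDB, or idb-keyval. The record shape, `makeRecords`, and the sizes are hypothetical, and timings will vary by engine and data.

```typescript
// Hypothetical record shape and sizes, chosen only to make the timing visible.
interface GeneRecord { id: string; name: string; chr: string; start: number }
interface Lookup { byId: Record<string, GeneRecord>; byName: Record<string, GeneRecord> }

function makeRecords(n: number): GeneRecord[] {
  const records: GeneRecord[] = [];
  for (let i = 0; i < n; i++) {
    records.push({ id: `ID${i}`, name: `GENE${i}`, chr: String((i % 22) + 1), start: i * 1000 });
  }
  return records;
}

// Path A (Cache API-style): parse raw JSON text, then rebuild the look-up object.
function parseAndTransform(json: string): Lookup {
  const records: GeneRecord[] = JSON.parse(json);
  const lookup: Lookup = { byId: {}, byName: {} };
  for (const r of records) {
    lookup.byId[r.id] = r; // both indexes share the same record objects
    lookup.byName[r.name] = r;
  }
  return lookup;
}

// Path B (IndexedDB-style): structured-clone the already-built look-up object,
// approximating the deserialization an IndexedDB read performs.
function cloneTransformed(lookup: Lookup): Lookup {
  return structuredClone(lookup);
}

// Average wall-clock time per call over a few runs.
function timeMs(fn: () => unknown, runs = 5): number {
  const start = performance.now();
  for (let i = 0; i < runs; i++) fn();
  return (performance.now() - start) / runs;
}

const json = JSON.stringify(makeRecords(50000));
const built = parseAndTransform(json);

console.log(`parse + transform: ${timeMs(() => parseAndTransform(json)).toFixed(1)} ms`);
console.log(`structuredClone:   ${timeMs(() => cloneTransformed(built)).toFixed(1)} ms`);
```

A structured clone has to walk and copy every object (while preserving shared references), so which path wins likely depends on the data's shape. Measuring on the real payload, as in the commits above, is the only reliable answer.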

[1] https://developer.mozilla.org/en-US/docs/Web/API/Cache

[2] https://github.com/jakearchibald/idb-keyval

[3] https://github.com/eweitz/ideogram/pull/285

[4] https://github.com/eweitz/ideogram/pull/285/commits/90e374a0...


Some notes towards those ends:

WikiPathways supports advanced queries via their SPARQL API and UI. See [1] and [2]. I find WikiPathways nice because it lets logged-in users create and edit pathways, with a low barrier to entry.
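As a rough illustration of querying that endpoint programmatically, the sketch below builds a SPARQL query that counts pathways for one organism and sends it using the standard SPARQL 1.1 Protocol GET form (query text in the `query` parameter). Assumptions: the `/sparql` path on the host linked in [1], and the `wp:Pathway` / `wp:organismName` terms from WikiPathways' RDF vocabulary; the helper names are hypothetical, and the count may need extra filtering depending on how the RDF models pathway versions.

```typescript
// Assumed endpoint path; the comment links only the host.
const WIKIPATHWAYS_SPARQL = "https://sparql.wikipathways.org/sparql";

// Hypothetical helper: count pathways annotated with a given organism name.
function pathwayCountQuery(organism: string): string {
  // Escape quotes and backslashes so the name is a valid SPARQL string literal.
  const literal = `"${organism.replace(/["\\]/g, "\\$&")}"`;
  return `
PREFIX wp: <http://vocabularies.wikipathways.org/wp#>
SELECT (COUNT(DISTINCT ?pathway) AS ?count) WHERE {
  ?pathway a wp:Pathway ;
           wp:organismName ${literal} .
}`;
}

// SPARQL 1.1 Protocol: a GET request carries the query in the `query` parameter.
function sparqlGetUrl(endpoint: string, query: string): string {
  const url = new URL(endpoint);
  url.searchParams.set("query", query);
  return url.toString();
}

// Send the query and return SPARQL JSON results (requires network access).
async function runQuery(endpoint: string, query: string): Promise<unknown> {
  const res = await fetch(sparqlGetUrl(endpoint, query), {
    headers: { Accept: "application/sparql-results+json" },
  });
  if (!res.ok) throw new Error(`SPARQL request failed: ${res.status}`);
  return res.json();
}

// Example (not run here):
// runQuery(WIKIPATHWAYS_SPARQL, pathwayCountQuery("Saccharomyces cerevisiae")).then(console.log);
```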

I've been building a way to find related genes using biochemical pathways [3]. The source code linked there includes practical examples for fetching information on genes in those pathways, which you rightly note is needed for something compelling. That and other code there might help spark ideas for you on how to glue together various biochemistry and molecular biology APIs to achieve your vision.

I'm currently working on a way to drastically expand the set of organisms and pathways covered by WikiPathways. Yeast has 66 pathways there, compared to 1319 for human. By doing fast ortholog detection at runtime (using another SPARQL API, provided by OrthoDB [4]) I'm hoping to be able to convert relevant annotated pathways across organisms, e.g. human to yeast, mouse to rat, Arabidopsis to rice -- and vice versa.
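The conversion step could look something like the sketch below, once an ortholog table has been fetched: map each gene in a source-organism pathway to its target-organism orthologs, keep one-to-many matches, and report genes with no ortholog so the converted pathway's coverage is visible. The gene identifiers and the `convertPathway` helper are hypothetical; real mappings would come from a source such as OrthoDB.

```typescript
// Source-organism gene -> target-organism orthologs (may be empty or one-to-many).
type OrthologMap = Map<string, string[]>;

interface ConvertedPathway {
  genes: string[];    // target-organism genes, de-duplicated, in first-seen order
  unmapped: string[]; // source genes with no known ortholog in the target
  coverage: number;   // fraction of source genes that mapped
}

// Hypothetical helper: project a pathway's gene list onto another organism.
function convertPathway(sourceGenes: string[], orthologs: OrthologMap): ConvertedPathway {
  const genes = new Set<string>();
  const unmapped: string[] = [];
  for (const gene of sourceGenes) {
    const targets = orthologs.get(gene) ?? [];
    if (targets.length === 0) unmapped.push(gene);
    for (const t of targets) genes.add(t);
  }
  const coverage =
    sourceGenes.length === 0 ? 0 : (sourceGenes.length - unmapped.length) / sourceGenes.length;
  return { genes: Array.from(genes), unmapped, coverage };
}

// Toy, made-up ortholog table for illustration only.
const toyOrthologs: OrthologMap = new Map<string, string[]>([
  ["humanGeneA", ["yeastGene1"]],
  ["humanGeneB", ["yeastGene2", "yeastGene3"]], // one-to-many
  ["humanGeneC", []],                           // no ortholog in the target
]);

const result = convertPathway(["humanGeneA", "humanGeneB", "humanGeneC", "humanGeneD"], toyOrthologs);
console.log(result);
```

Reporting coverage alongside the converted gene list is one way to flag pathways that lose too many members in translation to be trusted.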

[1] http://sparql.wikipathways.org

[2] https://www.wikipathways.org/index.php/Help:WikiPathways_Spa...

[3] https://eweitz.github.io/ideogram/related-genes?q=RAD51&org=...

[4] https://sparql.orthodb.org


I created Ideogram.js, a JavaScript library for chromosome visualization [1].

Ideogram supports genomic views to research and report findings on cancer, clinical variants, gene expression, evolution, agriculture, and more [2]. What previously existed for genome visualization was either focused on short genomic regions (e.g. genes) or complex to set up and maintain.

[1]: https://github.com/eweitz/ideogram

[2]: https://eweitz.github.io/ideogram


Large genetic datasets yield medical progress by increasing the statistical power of tests. These more powerful tests enable earlier and more targeted treatment.

Take heart disease. It has a significant but complex genetic component. Many genetic variants each contribute a small amount to risk for heart disease. If a given person has many small risk variants, the sum total risk -- often called "polygenic score" -- can be relatively high.

People in the top 8% of polygenic scores had a 3x higher risk for heart disease than the general population [1][2]. Through techniques like polygenic scoring, large genetic datasets enable uniquely early detection of high risk for the world's leading cause of death.
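A common way to compute such a score is a weighted sum: for each variant, the number of risk-allele copies a person carries (0, 1, or 2) times that variant's effect size from a genome-wide association study. The sketch below scores a tiny cohort and flags the top 8%. Variant IDs, weights, and genotypes are hypothetical, and treating a missing genotype as 0 is a simplification; real pipelines typically impute it.

```typescript
// Effect size per variant (e.g. a log odds ratio); IDs and values are hypothetical.
type Weights = Record<string, number>;
// Risk-allele dosage per variant for one person: 0, 1, or 2 copies.
type Genotype = Record<string, number>;

// Polygenic score = sum over variants of (effect size x risk-allele dosage).
function polygenicScore(genotype: Genotype, weights: Weights): number {
  let score = 0;
  for (const [variant, beta] of Object.entries(weights)) {
    score += beta * (genotype[variant] ?? 0); // missing genotype counted as 0 (simplification)
  }
  return score;
}

// Indices of people in the top `fraction` of scores (at least one person; ties broken arbitrarily).
function topFraction(scores: number[], fraction: number): number[] {
  const k = Math.max(1, Math.ceil(scores.length * fraction));
  return scores
    .map((score, i) => ({ score, i }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map(({ i }) => i);
}

const weights: Weights = { rsA: 0.05, rsB: 0.02, rsC: 0.08 };
const cohort: Genotype[] = [
  { rsA: 0, rsB: 1, rsC: 0 },
  { rsA: 2, rsB: 2, rsC: 1 },
  { rsA: 1, rsB: 0, rsC: 2 },
];

const scores = cohort.map((g) => polygenicScore(g, weights));
const top = topFraction(scores, 0.08);
console.log(scores, top);
```

Each variant contributes only a little, but people who carry many risk alleles accumulate a high total, which is the "many small risk variants" effect described above.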

[1] https://www.nature.com/articles/s41588-018-0183-z

[2] https://www.vox.com/science-and-health/2018/8/24/17759772/ge...


Ah, yes, polygenic scoring and the heart disease increase rate. Did you happen to catch this[0] refutation of the single source's work?

[0] - https://twitter.com/cecilejanssens/status/103135930540723404...

Also, you're proposing prediction and targeted treatment, but you haven't posited how Bob's mapped genome sitting in 23andme will be used for medical treatment.

As we know, genes are not an emphatic "this will happen to you" but an increase in likelihood, which still doesn't translate into any emphatic treatments from the genes themselves, yeah?


> refutation of the single source's work

Janssens seems less skeptical of 23andMe's paper on polygenic score for type 2 diabetes [1][2], which -- interestingly -- positively cites the Khera 2018 paper on polygenic score for heart disease that she critiqued. Some researchers are skeptical, but the medical community generally seems to consider polygenic scores promising for tests [3][4].

> you haven't posited how Bob's mapped genome sitting in 23andme will be used for medical treatment.

Early intervention. Polygenic scores could be used for medical treatment by motivating earlier intervention. That could include stronger recommendations for better diet and exercise, closer monitoring programs, or more precise prescriptions. That, in turn, could reduce disease burden.

[1] https://twitter.com/cecilejanssens/status/113707970323438797...

[2] https://permalinks.23andme.com/pdf/23_19-Type2Diabetes_March...

[3] https://twitter.com/EricTopol/status/1129780543434964993

[4] https://journals.plos.org/plosmedicine/article?id=10.1371/jo...


>...could be used for medical treatment by motivating earlier intervention...

and

>...could include stronger recommendations for better diet and exercise, closer monitoring programs, or more precise prescriptions...

and

>...could reduce disease burden...

This is where the problem lies for me: we're being massively assumptive in moving from "could" to "is" and "will".

I will, generally, concede the "could" portion. But asserting that it is emphatically happening, or going to happen, is premature. Labeling this science that way just yet is overreaching and gives false hope where none should really be given, because then you'll taint its benefits with the drawbacks.

Remember: Anonymised data (e.g.: 23andme) only allows a survey of what's relatively known or can be inferred from the anonymised dataset.

To arrive at what you're suggesting, it would have to move into a different realm (I believe), like UK Biobank or GEDmatch. But even then, we're still basing things on speculative science: gambles on percentages that aren't emphatically true or false but a kind of "maybe, kind of, sort of, in a way, definitely could or definitely could not" muddied waters.

That, to me, is a far stretch from saying that the data in 23andme is - actually - helping medicine; which I believe is what the OC I replied to emphatically said.


UK Biobank has a smaller cohort but deeper and more reliable data.

All UK Biobank participants were tested for blood pressure, bone mineral density, grip strength, BMI, etc. The last 200,000 participants underwent detailed tests of cognitive function [1].

23andMe asks customers for health information, but self-reports are not usually as reliable as the clinician-administered tests done in UK Biobank.

[1] https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6451771/


23andMe is CLIA certified and FDA authorized, and makes many medical claims. From [1]:

"CLIA certification and CAP accreditation 23andMe laboratory testing is done in U.S. laboratories certified to meet CLIA (Clinical Laboratory Improvement Amendments of 1988) standards, including qualifications for individuals performing testing and other standards to ensure the accuracy and reliability of results. The laboratory is also accredited by the College of American Pathologists (CAP), which has served as a model for various federal, state, and private laboratory accreditation programs throughout the world."

FDA authorizations for 23andMe's personal genome service are available online, e.g. [2] for Alzheimer's disease risk reporting based on the E4 variant of the APOE gene.

The company also offers ancestry reports, which are not clinical and thus not covered by CLIA. But medical claims in 23andMe's health reports do comply with CLIA and other regulations.

1. https://medical.23andme.com/dna-kits/#clia

2. https://www.accessdata.fda.gov/cdrh_docs/pdf16/DEN160026.pdf


Hmm, news to me. In any case the point still stands that you can do SNP sequencing without being CLIA certified.


> Manhattan has twice the people with half the land area, on an island.

Manhattan has twice the residents based on U.S. Census population data, but actual daytime population is more than just residents. A more important statistic is commuter-adjusted population [1], i.e. number of people in an area during normal business hours, including workers. That's where Manhattan and San Francisco really diverge.

The commuter-adjusted population of Manhattan is 3.1 M [2], compared to 1.6 M residents [3]. San Francisco has a commuter-adjusted population of 1.0 M [4, 5], compared to 0.8 M residents [5]. In other words, Manhattan's population nearly doubles during normal business hours (up about 94%), while San Francisco's increases a modest 25%. Manhattan may have 2x the residents, but it has 3x the daytime population of San Francisco.

Given it has about half the land area, Manhattan's daytime population density is thus 6x that of San Francisco.
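The ratios in this comment follow directly from the cited figures. A small sketch of the arithmetic, treating "about half the land area" as a ratio of exactly 0.5 (an approximation, not a measured value):

```typescript
// Populations in millions, as cited above.
const manhattan = { residents: 1.6, daytime: 3.1 };
const sanFrancisco = { residents: 0.8, daytime: 1.0 };
// Manhattan land area / San Francisco land area, approximated as one half.
const landAreaRatio = 0.5;

// Percentage growth from resident to commuter-adjusted (daytime) population.
const growthPct = (p: { residents: number; daytime: number }) =>
  ((p.daytime - p.residents) / p.residents) * 100;

const residentRatio = manhattan.residents / sanFrancisco.residents; // about 2x
const daytimeRatio = manhattan.daytime / sanFrancisco.daytime;      // about 3x
// Density ratio = population ratio divided by land-area ratio.
const daytimeDensityRatio = daytimeRatio / landAreaRatio;           // about 6x

console.log(`Manhattan daytime growth: ${growthPct(manhattan).toFixed(0)}%`);
console.log(`San Francisco daytime growth: ${growthPct(sanFrancisco).toFixed(0)}%`);
console.log({ residentRatio, daytimeRatio, daytimeDensityRatio });
```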

Attitudes on development -- namely transportation infrastructure and building construction -- certainly contribute to that difference.

1. http://www.census.gov/hhes/commuting/data/daytimepop.html

2. http://www.citylab.com/commute/2013/05/most-important-popula...

3. http://quickfacts.census.gov/qfd/states/36/36061.html

4. http://ww2.kqed.org/lowdown/2014/01/10/how-city-populations-...

5. http://factfinder.census.gov/faces/tableservices/jsf/pages/p...


Everything eweitz mentions is spot on. I'll just note what my sister said not two moments ago when reading some of these comments. "San Francisco is a baby town, a joke, compared to what's going on in Manhattan." And I kinda have to agree. San Francisco's current issues are entirely of their own making. I have no sympathy. I grew up in Queens, lived in Manhattan for 16 years and moved to Dallas a year ago (cause someone paid me.) I've been to SF more times than I can count over the last 10 years for computer related work and conference, etc. You couldn't pay me to live there.

No other place in the country has the kind of network effect that NYC does. It is entirely due to density and density is a direct result of NYC's subway system. As much as New Yorkers love to complain about the subway, and believe me it's a sport out here, we would not be able to accommodate the masses we accommodate without it. Transportation and, of course, vertical space is the way we do it. I really don't know why this is such a mystery to so many not from here. It's kind of glaringly obvious to the most casual of observers.

San Franciscans need to stop navel gazing and look up to the sky. Go up, my friends, go up.


"I grew up in Queens, lived in Manhattan for 16 years and moved to Dallas a year ago (cause someone paid me.) I've been to SF more times than I can count over the last 10 years for computer related work and conference, etc. You couldn't pay me to live there."

Wait, so I could pay you to live in Dallas, but I couldn't pay you to live in SF?


California's rules, regulations, tone and general outlook on most things irk me, and even more so SF as the epicenter of such things in California. So, no, I don't think one could pay me to live in San Francisco.

It doesn't need to be said here, but I'll go ahead and say it anyway: Texas has no state income tax. Sure, they have no mass transit to speak of, but then again, in truth, they don't need it. (Outside of the philosophical view that all metropolises need mass transit, which I agree with.)

Edit: the fact that SF, a city with arguably a third the population of NYC, has higher rents than NYC is exactly the kind of "thing" that irks me about SF. It's the kind of social dysfunction that's intolerable, for me.

Another is the Bay Bridge. If you stand in Marin, north of the Golden Gate, there are a number of plaques describing the building of the Golden Gate Bridge (GGB), with a picturesque view of the Bay Bridge (BB) in the background. Guess what? It turns out the GGB was built ahead of schedule and under budget, with no computers! I don't even need to go into the failings of the BB. You get my point.


Ahh I see, you hate regulation, taxes, and don't think a city the size of Dallas really needs mass transit in practice. So this isn't really about San Francisco at all. The only thing that really surprises me now is that you speak highly of New York.

I could describe at length how much more "social dysfunction" there is in Texas than in California, but I sense it's going to be an uphill battle. Suffice it to say that my friends just got back from Austin, where they witnessed a "mock shooting" at the university in support of the new open carry gun laws. When your legislators spend time expanding gun rights after repeated mass shootings, you're looking at social dysfunction. These same legislators also found the time to carefully craft anti-abortion legislation recently in an attempt to force all abortion clinics to close until the Supreme Court can overturn the decision. Meanwhile they don't seem to mind all the fake clinics being set up which use tax money and tax benefits to confuse and lie to desperate women who are under extreme stress already. You can have your 0% income tax rate "paradise" and try to remember that Texas has among the highest property tax rates in the country to make up for it.


> You can have your 0% income tax rate "paradise" and try to remember that Texas has among the highest property tax rates in the country to make up for it.

Coming from NY, those property taxes don't even make me flinch. When you combine a lower cost of living, cheaper real estate and no income tax, higher property taxes don't even move the needle.


Yeah, Dallas is really the epitome of good urban planning and land use compared to San Francisco. Let's go get in our cars and drive thirty minutes to Chili's so we can have a real cultural experience </sarcasm>

Texas pretty much defines sprawl in my mind. San Francisco could be more dense but it certainly isn't sprawling.


You'd have to pay a lot more.


"San Francisco is a baby town, a joke, compared to what's going on in Manhattan."

I don't particularly have a soft spot for either city, but I agree with you on the note of maturity. Manhattan has a far better idea of what it's doing compared to SF.


I agree about the Manhattan to SF comparison.

But if you take a less cynical view, doesn't this mean that SF represents a remarkable opportunity?

At some point the population has to get it together, elect better reps, reject the nimbyism and build more towers and trains.

It's actually happening as we speak. New bay bridge, extensions of muni, transbay tower, Salesforce tower, ...

I'm biased, living in SF, just getting married, and avoiding the current market-rate rents. But I see tremendous opportunities in SF.

Which is crazy to think given all the advantages - nature, water, work, weather - SF has over most other places.


Agreed. That reminds me of this old quote (replace France with SF):

>God just finished creating the world. He looks at his creation and sees France.

>"This country is too beautiful. It's unfair to the other countries."

>And so, God created the French.

On a side note, congratulations on getting married. Hope all goes well.


This is the first time while reading Hacker News that I've realized who the poster was. Hopefully Dallas is treating you well!


Ha! This is the first time I've been recognized by someone reading something I wrote! Dallas is actually pretty ok. Shhh... don't let my NY friends know that I said that ;)


Exactly. SF's problem is not just a lack of sensible zoning but also a lack of adequate infrastructure to support a modern environment for commerce. Manhattan's population not only roughly doubles each business day, but the vast majority of those commuters use public transit to get there. SF is still far too tied to the car. In many ways NYC is a 21st-century center for commerce, while SF is largely stuck in a 20th-century way of thinking.


Several major classics are oddly omitted from that list:

- "The Art of Computer Programming" (TAOCP)

- "The C Programming Language" (K&R)

- "Structure and Interpretation of Computer Programs" (SICP)

- "Artificial Intelligence: A Modern Approach" (AIMA)


I agree, and good picks. Bill Gates reportedly said he'd hire anyone who really understood the TAOCP volumes through and through. K&R is self-evident given C's dominance. SICP is a given, considering its and LISP's contributions, especially in academia. AIMA is the only debatable one for the field overall, although I really enjoyed it and it's indisputably the landmark AI text. :)


I've heard that Bill Gates comment before, but I recently bought the books and looked at random pages. I concluded that after going through TAOCP (especially doing the exercises, not just reading), you'd be so overqualified for Microsoft or any other 'business-oriented' software company as to be useless to it. Microsoft Research might hire you, but they'd look at your publications and you might need a PhD, so that's also a toss-up.

Therefore, if you decide to go through TAOCP cover to cover, know that it's for your own interest, and that you're willing to become academic/research-minded rather than s/w-development-minded. After that you might wanna join a PhD program (if you don't have a PhD already) and do research or something.


A person who read TAOCP would have a strong understanding, in both theory and practice, of implementing algorithms for all sorts of things. They'd also be on a team that likely had people with more hands-on or business skills. The combination would make for more effective solutions. As presty pointed out, Microsoft does a lot of work on applications that require solid algorithms. They also have a research division that does cutting-edge work. See VerveOS and SLAM driver verification below for examples of what they did with various tech they developed.

http://research.microsoft.com/pubs/122884/pldi117-yang.pdf

http://research.microsoft.com/en-us/projects/slam/

Microsoft Research in general http://research.microsoft.com/en-us/


Are you really saying that going through TAOCP would make someone overqualified and useless for working at a company that builds operating systems, DBMSs, programming languages, virtual machines and compilers?


In a way, yes. (I assume 'going through the book' includes doing exercises, not just reading).

Think of it this way. Is a PhD in combinatorics from the math department, with exposure to programming, the most suitable candidate for the job of building an OS, DBMS, PL, VM, or compiler? I don't think so. If (s)he is interested in such a role, (s)he can definitely do a good job. But (s)he would have to be aware that roughly 90% of what (s)he learned and enjoyed while doing combinatorics research would not be relevant to the job. The hiring would depend on a combination of how much of that lifestyle (s)he is willing to give up for this job, as opposed to trying to find a tenure-track faculty position where (s)he could continue pursuing research, and how much the software company values the candidate's enthusiasm for switching from research to software development.

Going through TAOCP and doing exercises and learning relevant math is a close approximation to that, IMO. Keep in mind that a significant number of exercises in TAOCP are about proving theorems.


If Bill (a) has the position(s) and (b) says TAOCP master is best candidate, then yes the person is "the most suitable candidate" for whatever job he has in mind. The End.

Besides, I can teach anyone software development. Kids do it with Scratch, average people did it with BASIC in school, and lay business people used COBOL, Excel, and Visual Basic. I'm sure someone who can learn everything from algorithm optimization to assembler coding can handle C++ with some on-the-job learning. In all likelihood, they already were programming in various languages if they tried to get such a job.

Nonetheless, Bill says they're a good hire for stuff at Microsoft. That's where it went from "I wonder" to "Yes they are." No need to speculate.


Certainly there are non-"business" type software divisions in Microsoft that can use such people. Compilers, maybe?


That's plausible since TAoCP is a book about compilers...we just have to wait for chapter 12.


It's actually a book about algorithms, especially general ones. It has chapters on compilers or related areas (e.g. parsing). See here:

https://en.wikipedia.org/wiki/The_Art_of_Computer_Programmin...


Structured Programming, Michael Jackson.


The reason is also in the link

> This list of classic books is the result of a poll ACM conducted where members named their favorite computer science books.


But "Macintosh human interface guidelines" included!

