Building a RAG system, Part 1 Update - Data Pivot, YC Companies

A week ago I posted about a little RAG project I'm working on, and I've spent the last week poking around looking for data about snowshoeing in Tahoe. As I dug in, I had a bit of an aha moment, inspired in many ways by YC holding its first in-person demo day in a long time this week.

While I love snowshoeing, I'm not sure a snowshoeing tool would actually get much use, and even if I'm just playing around, I like the idea of building something that's actually pretty useful. And, heck, I already know where I like to snowshoe. What I don't know, but find totally fascinating, is which of the YC companies coming out of demo day will end up on the path to unicorn status.

Of course, there's no magic rubric for this, but there are some very clear patterns, many related to fundraising and team building, and a lot of that data is available on X.

So I'm pivoting my tutorial idea: I'm going to build a RAG system that leverages previous YC funding data. The idea is that you could share updates about a current YC company, and the system could use past YC startup data to better analyze its trajectory. Here's an example of how I think the model could be guided to reason.

Suppose there's a really strong positive correlation between YC companies that raise from a16z and companies that go on to become unicorns. Then telling the model that a particular company just raised from a16z would suggest it's on that path. On the other hand, if a company decides to bootstrap, probably not.

Of course, the power of RAG and LLMs here is the ability to take in a lot of data and use it all together, because as we all know, not every company that leaves YC and raises from a16z becomes a unicorn.
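To make that a bit more concrete, here's a rough sketch of what the retrieve-then-analyze step could look like, sticking with the Node/JavaScript direction from Part 1 and written here as TypeScript. To be clear, the helper functions and data below are stand-ins I made up for illustration, not real APIs or a real dataset:

```typescript
// Sketch only: the helpers below are stand-in stubs, not real APIs. The point
// is how retrieved history gets folded into the prompt the LLM sees.
type PastCompany = { name: string; summary: string; becameUnicorn: boolean };

// Stub: in the real system this would run an embedding similarity search
// over the past YC company data I collect.
async function findSimilarCompanies(update: string, k: number): Promise<PastCompany[]> {
  return []; // placeholder
}

// Stub: in the real system this would call an LLM API like ChatGPT.
async function askLLM(prompt: string): Promise<string> {
  return prompt; // placeholder
}

async function analyzeTrajectory(update: string): Promise<string> {
  // Pull the past YC companies whose stories look most like this update.
  const matches = await findSimilarCompanies(update, 5);

  const history = matches
    .map((c) => `- ${c.name}: ${c.summary} (became a unicorn: ${c.becameUnicorn})`)
    .join("\n");

  // The LLM weighs all of the retrieved signals together, not just a single
  // "raised from a16z" data point.
  const prompt =
    `Here are past YC companies with similar trajectories:\n${history}\n\n` +
    `Given that history, assess the likely trajectory of this company:\n${update}`;

  return askLLM(prompt);
}
```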

As for how I'm going to collect this data, I'm still evaluating all the different data sources out there. This might take a few weeks, so expect a bit of a gap from me while I investigate. The nice thing about a project like this is that there's no rush; I'm just incredibly curious about RAG systems and am having fun exploring and sharing what I learn with the handful of you who read this blog.

Okay, well it's Friday morning and I just finished my first cup of coffee, time for more coffee. TGIF everyone and thanks for reading!

Waking up to Bitcoin over $100,000

Well, today is a historic day - honestly, it still feels pretty surreal. I bought my first Bitcoin in 2014, and for years people called me crazy, put me down, and told me Bitcoin was only for scammers.

Clearly, ten years later, we now know that Bitcoin is something pretty special. Companies like Fidelity now have Bitcoin ETFs, and the CEO of Charles Schwab was recently quoted as saying, "I have not bought crypto, and now I feel silly."

Back in 2014 I registered the domain name BlogAboutBitcoin.com and I made myself a little promise - if Bitcoin ever crosses $100,000 - I'll put a blog on this domain name. I'm still in shock in many ways at what has happened over the last couple of months, but, I keep my promises, so I'm now writing a blog about Bitcoin.

There have been so many awesome articles and tweets already about Bitcoin's historic move. My personal fav is from Brian Armstrong, the cofounder and CEO of Coinbase:

What a time to be alive. 

While I won't be writing about Bitcoin much on here, since pretty much every waking minute of my life is spent as the founder and CTO of Bold Metrics, I will be tracking Bitcoin's journey from $100,000 to $200,000 and beyond on BlogAboutBitcoin.com. So if you want to follow the exciting road ahead for Bitcoin, you know where to go.

Congrats to everyone who knew this day would come. Onward, next stop $200,000.

Don't use a VPN to access Coinbase

This morning, Scott Shapiro, the Product Director at Coinbase, sent out a tweet that definitely got everyone's attention. The tweet is word-for-word what I made the title of this post - don't use a VPN to access Coinbase. Here's the tweet:


The recommendation that Scott makes in this tweet is one that I think we're going to see more and more companies making over the next year - move to a physical security key. The reality is, far too many people still use SMS as their two-factor authentication method, and this hasn't been safe for a long time. Heck, CNET put out an article back in June of 2021 about it.
Why is SMS so unsafe? Here are the details:

"Hackers have been able to trick carriers into porting a phone number to a new device in a move called a SIM swap. It could be as easy as knowing your phone number and the last four digits of your Social Security number, data that tends to get leaked from time to time from banks and large corporations. Once a hacker has redirected your phone number, they no longer need your physical phone in order to gain access to your 2FA codes." (Source - CNet)

Lately I've been seeing a lot of people posting on X with screenshots showing that they're locked out of their Coinbase account. In many cases these tweets are coming from digital nomads, who travel a ton and thus use a VPN to increase their security. The problem is, as Scott highlighted this morning, they're being flagged by Coinbase for doing exactly that.

My guess is that Coinbase won't be the only company making a statement like this. As more and more people travel and work remotely, VPNs are becoming a pretty standard security measure...but since plenty of hackers and scammers also use VPNs, it's incredibly challenging for companies trying to keep your data and/or money secure.

It definitely feels like there's a nice big juicy problem for a startup to solve - maybe a new kind of VPN that requires some type of identity verification? Or maybe physical security keys really are the answer and VPNs will eventually go the way of the dodo. 

For those who do want to max out the security of their Coinbase account and avoid getting flagged as a potential scammer or hacker, here's what your Account Security section should look like:
If this blog post helps even one person either not get hacked or not get locked out of their account, then it will have served its purpose. Thanks for reading and happy Tuesday!

Using RAG + ChatGPT with a React Frontend to help people find great places to snowshoe in Tahoe - A RAG Tutorial: Part 1

In case you missed the memo, RAG is all the rage in the AI world these days. If you don't know what RAG is, here's the TL;DR:

RAG stands for Retrieval-Augmented Generation. It all makes sense now, right?

Okay, so probably not. I'll go a step further, and I think you'll see that RAG isn't actually all that complex, at least at a high level. RAG is the process of feeding specific, highly relevant external data to an LLM so that it references that data when responding to prompts, which allows for higher accuracy (and fewer hallucinations).
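If it helps, here's a tiny sketch of that idea in code. The trail notes are made-up placeholders, and in a real system they would come back from a retrieval step rather than being hard-coded; the only point is what ends up in the prompt the model actually sees:

```typescript
// A plain prompt: the LLM can only lean on whatever it picked up during training.
const question = "What are some good beginner snowshoe trails near Tahoe City?";

// A retrieval-augmented prompt: we first look up the most relevant snippets from
// our own data (the "retrieval" in RAG), then hand them to the model alongside
// the question. Hard-coded here purely for illustration.
const retrievedNotes = [
  "Trail A: gentle, well-marked loop near Tahoe City, good for beginners.",
  "Trail B: short out-and-back, mostly flat, busy on weekends.",
];

const augmentedPrompt =
  `Use only the trail notes below to answer.\n\n` +
  `Trail notes:\n${retrievedNotes.map((n) => `- ${n}`).join("\n")}\n\n` +
  `Question: ${question}`;
```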

Over the last few months I've had more people ask me about RAG than ever before, and I keep sending them to one article or another. I finally decided it was time to write my own, and since I find I learn best by doing, I thought I'd put together a little tutorial and show RAG in action on something I myself would find useful.

So here we go.

This is definitely going to be a multi-part series, so to kick things off I'm just going to share the high-level goal for my RAG project. I want to use RAG to make it easier to find snowshoeing trails where I live, Lake Tahoe. Right now, when you use an LLM like ChatGPT to find snowshoeing trails, it gives pretty generic results, and it's tricky to drill down into specifics like difficulty level.

Lake Tahoe likely has thousands of snowshoe trails, but it's safe to say LLMs like ChatGPT don't know about most of them. I want to change this. I want to leverage the awesome horsepower of an LLM, coupled with lots of specific data about snowshoe trails in Tahoe, so that's what I'm going to build.

So what do I need to get started? Well, the high-level components of a RAG system are:

  1. All the additional data you want to feed the model
  2. User inputs
  3. A similarity measure between the data and the user inputs

As for what I'll be using to code this: while I'm most comfortable in Python, I've been trying to expand my JavaScript chops lately, so I'll be using React and Node to make this happen.
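Since component #3 in the list above - the similarity measure - tends to be the part that sounds the most mysterious, here's a quick sketch of what it could look like in plain TypeScript. It assumes the trail data and the user's question have already been turned into embeddings (vectors of numbers) by some embedding model; this only shows the comparison and ranking step:

```typescript
// Cosine similarity between two embedding vectors: values near 1 mean
// "very similar", values near 0 mean unrelated.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

type TrailDoc = { name: string; text: string; embedding: number[] };

// Rank trail documents against the user's (already embedded) question and
// keep the top k to feed into the LLM prompt.
function topMatches(questionEmbedding: number[], docs: TrailDoc[], k = 3): TrailDoc[] {
  return docs
    .map((doc) => ({ doc, score: cosineSimilarity(questionEmbedding, doc.embedding) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map((entry) => entry.doc);
}
```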

And that's enough for now. I want to keep each post in this tutorial series short and sweet, and for people who don't know much about RAG, this was probably already quite a bit to take in.

If you want to do a deeper dive into RAG before my next post, I would highly recommend reading this article from NVIDIA, aptly titled - What Is Retrieval-Augmented Generation, aka RAG?

Now, it's time for me to go snowshoeing in Tahoe!

Why MCP (Model Context Protocol) from Anthropic is a big deal, even though the name should really have a bit more sizzle

Okay, I'll be honest - I think AI companies suck at naming in general. When OpenAI came out with their newest breakthrough models that introduced advanced reasoning, they called it - drumroll please - o1. When they added voice, a breakthrough technology foundationally changing how we work with LLMs, the amazingly creative name of "advanced" voice mode was given to it.

Anthropic, IMHO, has actually done the best job naming their models. There's a great interview Lex Fridman did with Dario Amodei, the cofounder and CEO of Anthropic, that I highly recommend. In it, Dario shares that the names of the three primary models - Haiku, Sonnet, and Opus - directly correlate to the complexity of the models, with Haiku being the simplest and Opus, not surprisingly, the most complex. Pretty neat, right?

Yesterday, I'd say, Anthropic broke their naming streak with the announcement of the Model Context Protocol, aka MCP. But all is forgiven in my book, because MCP is actually a massive step forward in how we'll use LLMs, so who the heck cares what it's called. Still, the name makes MCP sound a lot more technical than it needs to, since it's something I think will be used far and wide. If I were naming it, I'd probably call it something like "Agent Intelligence" - that has a nice ring to it, doesn't it?

So what is MCP and why do I think it's such a big deal? I'll break this down in a digestible way because I think it's something everyone, whether you currently use LLMs regularly or not, should know. 

  • First off, MCP didn't come into existence yesterday - it was already live. What happened yesterday is that Anthropic open-sourced it, so anyone can use it and run an MCP Server themselves
  • Okay - but what is MCP? At a very high level, MCP is a way to connect AI assistants (think agents) to a system where data is stored. The easiest example here, or at least my favorite, is connecting an AI assistant to a GitHub repo so that an AI agent can use that data
  • Three major components were released yesterday:
    • The Model Context Protocol specification and SDKs
    • Local MCP support in the Claude Desktop Apps (there's a quick config sketch just below)
    • An open source repo of MCP Servers
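To make the GitHub example above a little more concrete, here's roughly what wiring up a local MCP server looks like from the Claude Desktop side - a single entry in its config file. I'm basing this shape on the MCP docs, but treat the exact file location and the server package name as things to double-check there rather than as gospel:

```json
{
  "mcpServers": {
    "github": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github"],
      "env": { "GITHUB_PERSONAL_ACCESS_TOKEN": "<your token>" }
    }
  }
}
```

Once that's in place, Claude Desktop launches the server locally and the assistant can pull data from the connected repo through the protocol.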

Of course, real-world adoption is always the best way to showcase how something is being used, right? So here's a good one that Anthropic shared alongside the announcement yesterday:

"Early adopters like Block and Apollo have integrated MCP into their systems, while development tools companies including Zed, Replit, Codeium, and Sourcegraph are working with MCP to enhance their platforms—enabling AI agents to better retrieve relevant information to further understand the context around a coding task and produce more nuanced and functional code with fewer attempts."

Okay, and that's the TL;DR for you on Anthropic's Model Context Protocol announcement yesterday. If you want to do a deep dive yourself, here are the first two places I would go:

  1. Anthropic's blog post about MCP
  2. MCP Docs

Sunday thoughts - the new tech stack: Cursor and Windsurf, v0 and Bolt, Supabase and Neon

Hello, happy Sunday, and welcome to what's running through my brain this morning. As you can tell from the title, today I'm thinking a lot about the new tech stack that has developed over the course of the year and now seems, in many ways, fully cemented. Seriously, this has happened fast, and I can already see a lot of people getting left behind.

The reality is, the software development world is moving faster than ever before, thanks to AI. With this change comes a bifurcation - those who jump in and use the new stack, and those who stick to the way they've been doing things for years. For devs who are jamming with new tools the impact is clear - much faster development times and the ability to iterate and launch new ideas at a speed that a traditional dev just won't be able to compete with.

A lot of people have been posting on Twitter/X about what they're using to build, and I've noticed a pretty clear pattern. I pretty much gave it away in the title of this post, but I'll share one of the tweets below so you can see a good example of someone leveraging this new stack.

(link to tweet)

--

In many ways this is like the old Wired/Tired section of Wired magazine, except it all happened fast. As I see it today, here's the new tech stack that you can either start using right away, or wait on and fall further behind (sorry to be harsh, but it's true!)

Dev IDE: Cursor or Windsurf

Frontend Dev: v0 or Bolt

Graphic design: Midjourney

Database: Supabase or Neon

LLM: Claude 3.5 Sonnet (sorry, but OpenAI isn't up to snuff when it comes to coding...yet)

Now, could this all change next month? Yes - absolutely. The dev space is changing incredibly quickly thanks to AI, and the opportunity for new companies to jump in and take a chunk of the market is there. For now, these are the tools that I and many others are using. If you aren't using any of them yet, it's time to ask yourself why...

Hello World. Again.

Hello and welcome to my first blog post in a long time, and officially my first post on the new MorganLinton.com. While it’s more than likely that my future posts will be long(ish), I’m going to try to keep this first post short and sweet since it’s really an intro to me and my blog.

If you’re here, maybe you know who I am, maybe you don’t, so let’s start there.

Hi, I’m Morgan and I’m the Cofounder and CTO of Bold Metrics. At Bold Metrics we’re using AI to help companies unlock the power of body data. What the heck does that mean?

It means we have some pretty nifty algorithms that use AI to generate tailor-accurate body measurements - I oftentimes say, think of us as the “Intel Inside of Body Data.”

We work with a lot of brands you know and love, powering their online fit and sizing experiences. You can check out our website if you want to see who uses us and give us a try for yourself.

As a founder, pretty much every waking moment is spent on Bold Metrics. I used to think there would be some magical balance, but honestly, I’ve found I’m so passionate about what we are doing, and more than that, about the incredible team we’ve built, that it’s what I end up putting just about every waking minute into.

One thing I have learned over the years is that I still have a lot to learn. I get offers to be on podcasts and speak at conferences and I turn them all down because honestly, this is my first startup, I’m still learning, and I think those opportunities are better left to people who have started multiple companies, sold companies, all the things I haven’t done yet.

What has accelerated my learning over the last year is bringing on a mentor, someone who has done some pretty impressive stuff, namely cofounding a company and growing their engineering team to over 200 people. I’ve learned a lot from him, but through this process I also find myself realizing how much more I still have to learn.

At the end of the day, what matters to me the most is my team, the people who work with us and dedicate so much of their time to the vision we are all building together. As a founder, it really isn’t about you, isn’t about your investors, isn’t even about your product, it’s about your team - without an amazing team, nothing else matters.

So what did I do before this crazy founder life? I was early at Sonos, starting pre-launch when we had fewer than forty people. That was an incredible journey, and it’s very likely I’ll share more about it on here as time goes on. I learned a lot about Product at Sonos - the way the company approached Product was, and still is, a lot like an art form, and the level of detail and quality they put into everything they do is the reason Sonos has become the brand it is today.

Of course I’m not oblivious to the fact that the UX of their new app isn’t great. I don’t know what happened there, but I’m sure they’ll fix it and make it absolutely stellar.

Sonos is the only job I've had since graduating college. I went to school at Carnegie Mellon, where I did my undergraduate degree in Computer Engineering and Computer Science and my master's degree in Computer Engineering. In college I was always fascinated with writing super fast, performant code that could run even faster if you knew the specifics of the processor it would be running on. We used to call this processor-level optimization, but I think it might be a somewhat lost art these days.

Going even further back...I was born and raised in Berkeley, California and somehow ended up working for my first software company when I was thirteen years old, but I’ll save that story for another day. For now, I’ll end it there. Thanks for reading my first post, where this blog goes from here, who knows?