12 Comments

Your substack did NOT link to the original authors, which is NOT acceptable!

Let's give them credit!

GitHub: https://github.com/Jiayi-Pan/TinyZero

Source on X: https://x.com/jiayi_pirate/status/1882839370505621655


I'm an investor, trying to understand what effect this might have on business. Will research like this lead to low-cost development of specialised AI-based applications for specific tasks, like monitoring processes, answering questions from team members and the public, spotting accounting anomalies, etc.? Or is that already happening and I just don't know about it?


Although the DeepSeek R1 replication this article introduces is more of a toy or a skills showcase, it can handle some of the tasks you might ask ChatGPT to do, such as answering questions. It's just slower and less accurate, since it cost only $30 and runs locally.

And you're right, lower cost and localization is where we're going.

Actually, I'd like to write an article about how DeepSeek R1 and low-cost AI models like it will impact tech giants like Nvidia or Meta, and their stock prices.


Feels like Apple got the direction right.


That would be interesting. It might also have an impact on the financial returns of the proposed new massive data centres, complete with new power stations, including nuclear. It's obvious that AI is going to change lots of things, maybe everything, but I fear we are about to see an extraordinary misallocation of capital.


Already happening. If you write knowledge articles in Salesforce, there are already ML models reading incoming emails and suggesting the best article/template for your reply.

There's also automatic text generation at extra cost, but ML is becoming more and more the default in everything, in the form of specialized models.

It creates more value for the public. Fewer people are needed because they can work faster once the mundane, "boring" jobs are removed.

Feb 2 (edited)

It can do anything, even managing the infrastructure.


It means you only need to hire a human specialist once, to build a specialised custom DSAI (domain-specific AI) with its own DSL. What are now called tools are hardcoded, conventionally programmed functions that a larger LLM knows how to use. These tools will quickly be replaced by small, actually intelligent domain-specific models instead of rigid, severely limited hardcoded functions. Hardcoded tools can also interact with a large LLM, but that gains you nothing, because it is always the same LLM. Large LLMs are good at picking tools based on general knowledge; the small ones are best at doing the actual work of the task.

I don't believe in large uber-models that do it all from a single prompt. When humans work like that, disaster is guaranteed. It is not so hard to look into the future: see the AI as functionally human in the organisation and ask what it means to have an employee with that expertise and discipline. Also, tasks must be as small as possible so each yields a meaningful sub-result.

Normally, a human has one job with, say, 20 tasks. That human worker can be replaced (or enhanced) by an agent composed of 20 of these specialised AIs. HR departments will have to think about hiring expertise in a more fine-grained manner: not one person per job anymore, but a constellation of specialised models. No more one-person jobs, only team jobs. All human employees will get a manager role, and if not, they will have to go. That is how I see it. Don't take my word for it, but I think it is inevitable. What else could it be?
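The pattern described above, where a large general model only routes tasks and small specialised models do the actual work, can be sketched roughly as follows. Everything here is illustrative: the `route` function, the `DSM_REGISTRY` names, and the keyword matching are stand-ins for what would really be LLM calls, not any actual API.

```python
from typing import Callable

# Each "small domain-specific model" (DSM) is stubbed as a plain
# function here; in a real system each would be its own trained model.
def invoice_checker(task: str) -> str:
    return f"[invoice-DSM] checked: {task}"

def email_drafter(task: str) -> str:
    return f"[email-DSM] drafted reply for: {task}"

# Registry of specialised models: the "constellation" that replaces
# one human job consisting of many small tasks.
DSM_REGISTRY: dict[str, Callable[[str], str]] = {
    "invoice": invoice_checker,
    "email": email_drafter,
}

def route(task: str) -> str:
    """Stand-in for the large LLM's general-knowledge routing step.

    Here routing is naive keyword matching; a real router would be
    the large model choosing a tool, as the comment describes.
    """
    for domain, model in DSM_REGISTRY.items():
        if domain in task.lower():
            return model(task)
    return "[router] no specialised model for this task"

print(route("Reply to this customer email about a refund"))
```

The point of the sketch is only the division of labour: the router needs broad knowledge to dispatch, while each registered model needs deep, narrow competence to execute.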


You are describing a perfect world, where AI has unilateral access to data, systems, and tools.

In a smaller company with little expertise in managing its data layers and governance, sure, agents and software systems could be integrated unilaterally and easily.

At a large company with data silos, strict security practices, and fearful humans at the helm... well, let's see how it shakes out.

Any software or system is only as good as the data it can access and the AI systems built to interact with that data. Today the conversation is about mapping to the legacy 10+15 systems.

Also, the data types and vision are disaggregated enough that there's a lot of room left to grow and to plan what roles humans and technology can play together.


the fish turns


I thought this work showed that you can replicate DeepSeek Zero behavior in a very specific domain.

To me, this opens the door to everyone getting mid- to high-range GPUs for personalized models that train and run on their own systems with their own data, models highly customized to the domains each user of a personalized chatbot needs.

So it's going to widen the market for GPUs and for personalized chatbots that are private and specific to each individual user.


Reinforcement learning takes more GPUs, or more time, to train than supervised learning. Things that sound too good to be true usually are. Do some googling, or just ask DeepSeek: "Do reinforcement learning models require more GPUs to train than supervised learning models?" A/B test it with the same question to ChatGPT, or just ask someone who knows what they're talking about. DeepSeek likely has access to more illegal GPUs than they're letting on. Labs in Silicon Valley are trying to replicate this now by reverse-engineering the dataset DeepSeek was trained on. Once they do, you'll see a paper published debunking DeepSeek's claim that they used only a fraction of the GPUs required to train comparable supervised learning models such as ChatGPT.
