Machine Learning and Artificial Intelligence Thread

I use AI (free AI) daily, but it doesn't come without frustrations. I started with ChatGPT, then moved to MS Copilot, and for the last year or so I've been using Grok. I find it frequently provides inaccurate information. One example: I was driving from one city to another one morning. I gave it the primary highways/roads I'd be traveling on and wanted to find a nice local diner or breakfast place along the way, no more than a 5-minute detour from the route. The top recommendation it gave me turned out to have been permanently closed for the last 3 years. I mentioned this back to the AI, and it acknowledged the correction and gave me more details about the closure, etc.

There have been several other very frustrating "bad advice" recommendations that I've called the AI out on as inaccurate; it then agrees with my input or corrected answer and gives me more details. There have even been times when I've revisited similar or the same topics months later, after I provided corrected info, and it still spit back the same incorrect answer. I thought these AI tools absorb input and continuously learn? Clearly not.
 
Thank you for this — I’m glad to know that someone else here also uses AI.

In my experience, AI doesn't perform well when asked highly specific questions that require information it may not have access to, or when the data available is very limited. I have two examples in mind:
  • I once asked it how many times K'Ehleyr appears throughout the entire run of Star Trek: The Next Generation. It told me she appeared in only one episode. However, after doing a manual Google search, I found that she actually appears twice. When I asked why it gave the wrong answer, it responded that the information wasn't commonly available.
  • I tested it with a question I already knew the answer to: I asked why the Yamaha Jupiter Z1 has a 115cc engine while its competitor, the Honda Supra X, has a 125cc engine. It answered that Yamaha used a cost-saving strategy to compete with Honda as the market leader. The correct explanation, however, is that Yamaha's 115cc engines produce more power and torque with better delivery curves, while also being highly fuel-efficient. Yamaha knew that their smaller engine was still competitive with Honda's 125cc.
That said, for general-purpose questions, it is actually excellent.
  • For example, I asked what makes Earth special compared to other planets, and it gave the usual answers — being in the Goldilocks zone, having liquid water, and supporting life. But it also mentioned something I didn't know: the Moon is unusually large compared to Earth (about 25% of its size), and its presence plays a crucial role in stabilizing Earth's rotation.
  • When I asked about hot Jupiters, I learned a new term — the Grand Tack Hypothesis.
  • Previously, I always used 0.5mm 2B mechanical pencils. I asked the AI about the difference between 0.5mm and 0.7mm leads, and which hardness is best for general writing. It explained that 0.7mm leads are more resistant to breakage and produce thicker lines, which some people prefer, and that HB is the most balanced hardness for everyday writing. Since then, I've switched to 0.7mm HB leads, and they're much better for general writing compared to my old setup.
  • I also asked whether there's any investment option better than bank term deposits but without the high risk of stocks. It suggested money market mutual funds. I tried putting a small amount of money into one, and it actually generates daily returns while remaining fully liquid — clearly outperforming term deposits.
In addition to that, as I mentioned in a previous post, it can summarize webpages and documents, correct grammar and structure, generate images, and even perform OCR.
 
Pop in "AI for recruitment" into a Web search and you'll find an array of products sold on the premise that AI saves time and money in screening CVs and choosing applicants. Here's one image from the first page of a search:

[Image: search results advertising AI recruitment screening products]


At first glance this appears like an unequivocal win. No more manually sorting through 100 CVs typed with Comic Sans font! Yay!

But just to be sure... let's investigate some of these claims, starting with "bias-free recruitment".


If you're applying for a job and getting screened by an AI bot, or deploying AI tools in recruitment and selection, it may be useful to consider the likely presence of embedded biases. Such AI tools overwhelmingly tend to discriminate against supposedly entitled groups, notably men.

This suggestion will come as no shock to most CiK members. Nonetheless, across all contexts, it's good to have credible data to back up what we know or reasonably suspect is happening.

A snip of results follows from an article with striking results (and yet subdued conclusions), entitled 'The Strange Behavior of LLMs in Hiring Decisions: Systemic Gender and Positional Biases in Candidate Selection', subtitled 'Hints of discrimination and lack of principled reasoning in frontier AI systems'.

Despite identical professional qualifications across genders, all LLMs consistently favored female-named candidates when selecting the most qualified candidate for the job.
https://substack-post-media.s3.amazonaws.com/public/images/6eed65de-2222-4f78-ad1c-b2fa6816a17c_3388x3096.png


Compare the above results to the preferences in the presence of counterbalanced gender neutral labels:

https://substack-post-media.s3.amazonaws.com/public/images/67274709-5a4f-4499-a6cd-06e4c3c32eb3_3387x3091.png


Full article quoted below. Substack link.
The Strange Behavior of LLMs in Hiring Decisions: Systemic Gender and Positional Biases in Candidate Selection
Hints of discrimination and lack of principled reasoning in frontier AI systems
David Rozado
May 20, 2025

Previous studies have explored gender and ethnic biases in hiring by submitting résumés/CVs to real job postings or mock selection panels, systematically varying the gender or ethnicity signaled by applicants. This approach enables researchers to isolate the effects of demographic characteristics on hiring or preselection decisions.

Building on this methodology, the present analysis evaluates whether Large Language Models (LLMs) exhibit algorithmic gender bias when tasked with selecting the most qualified candidate for a given job description.

LLMs' gender preferences in hiring

In an experiment involving 22 leading LLMs and 70 popular professions, each model was systematically given a job description along with a pair of profession-matched CVs (one including a male first name, and the other a female first name) and asked to select the more suitable candidate for the job. Each CV pair was presented twice, with names swapped, to ensure that any observed preferences in candidate selection stemmed from gendered name cues. The total number of model decisions measured was 30,800 (22 models × 70 professions × 10 different job descriptions per profession × 2 presentations per CV pair). CV pairs were sampled from a set of 10 CVs per profession. The following figure illustrates the essence of the experiment.


Despite identical professional qualifications across genders, all LLMs consistently favored female-named candidates when selecting the most qualified candidate for the job. Female candidates were selected in 56.9% of cases, compared to 43.1% for male candidates (two-proportion z-test = 33.99, p < 10⁻²⁵²). The observed effect size was small to medium (Cohen's h = 0.28; odds = 1.32, 95% CI [1.29, 1.35]). In the figures below, asterisks (*) indicate statistically significant results (p < 0.05) from two-proportion z-tests conducted on each individual model, with significance levels adjusted for multiple comparisons using the Benjamini-Hochberg False Discovery Rate correction.


Given that the CV pairs were perfectly balanced by gender by presenting them twice with reversed gendered names, an unbiased model would be expected to select male and female candidates at equal rates. The consistent deviation from this expectation across all models tested indicates an LLM gender bias in favor of female candidates.

LLMs' preference for female candidates was consistent across the 70 professions tested.


Larger models do not appear to be inherently less biased than smaller ones. Reasoning models—such as o1-mini, o3-mini, gemini-2.0-flash-thinking, and DeepSeek-R1—which allocate more compute during inference, also do not show a measurable association with gender bias.

Adding additional gender cues​

In an additional experiment, adding an explicit gender field to each CV (i.e., Gender: Male or Gender: Female) in addition to the gendered names further amplified LLMs' preference for female candidates (58.9% female vs 41.1% male candidate selections; two-proportion z-test = 43.95, p ≈ 0; Cohen's h = 0.36; odds = 1.43, 95% CI [1.40, 1.46]).

Masking candidate names with genderless labels​

In a follow-up experiment, candidate genders were masked by replacing all gendered names with generic labels ("Candidate A" for males and "Candidate B" for females). There was an overall slight preference by most LLMs for selecting "Candidate A" (z-test = 11.61, p < 10⁻³⁰; Cohen's h = 0.09; odds = 1.10, 95% CI [1.07, 1.12]), with 12 out of 22 LLMs individually exhibiting a statistically significant preference for selecting "Candidate A" and 2 models manifesting a significant preference for selecting "Candidate B".



Masking candidate names with counterbalanced genderless labels​

When gender was counterbalanced across these generic identifiers (i.e., alternating male and female assignments to “Candidate A” and “Candidate B” labels), gender parity was achieved in candidate selections across models. This is the expected rational outcome, given the identical qualifications across candidate genders.


LLMs Evaluating CVs in Isolation​

To investigate whether LLMs also exhibit gender bias when evaluating CVs in isolation—absent direct comparisons between CV pairs—another experiment asked models to assign numerical merit ratings (on a scale from 1 to 10) to each individual CV used in Experiment 1. Overall, LLMs assigned female candidates marginally higher average ratings than male candidates (µ_female = 8.65, µ_male = 8.61), a difference that was statistically significant (paired t-test = 16.14, p < 10⁻⁵⁷), but as shown in the figure below the effect size was negligible (Cohen's d = 0.09). Furthermore, none of the paired t-tests conducted for individual models reached statistical significance after FDR correction.


Adding preferred pronouns to CVs​

In a further experiment, it was noted that the inclusion of gender concordant preferred pronouns (e.g., he/him, she/her) next to candidates’ names slightly increased the likelihood of the models selecting that candidate, both for males and females, although females were still preferred overall. Candidates with listed pronouns were chosen 53.0% of the time, compared to 47.0% for those without (proportion z-test = 14.75, p < 10⁻48; Cohen’s h = 0.12; odds=1.13, 95% CI [1.10, 1.15]). Out of 22 LLMs, 17 reached individually statistically significant preferences (FDR corrected) for selecting the candidates with preferred pronouns appended to their names.


Another way of visualizing the results of this experiment:




How Candidate Order in the Prompt Affects LLMs' Hiring Decisions

Follow-up analysis of the first experiment's results revealed a marked positional bias, with LLMs tending to prefer the candidate appearing first in the prompt: 63.5% selections of the first candidate vs 36.5% of the second (z-test = 67.01, p ≈ 0; Cohen's h = 0.55; odds = 1.74, 95% CI [1.70, 1.78]). Out of 22 LLMs, 21 individually exhibited statistically significant preferences (FDR corrected) for selecting the first candidate in the prompt. The reasoning model gemini-2.0-flash-thinking manifested the opposite trend, a preference for selecting the candidate listed second in the context window.


Another way of visualizing the results of this analysis:


Conclusion​

The results presented above indicate that frontier LLMs, when asked to select the most qualified candidate based on a job description and two profession-matched resumes/CVs (one from a male candidate and one from a female candidate), exhibit behavior that diverges from standard notions of fairness. In this context, LLMs do not appear to act rationally. Instead, they generate articulate responses that may superficially seem logically sound but ultimately lack grounding in principled reasoning. Whether this behavior arises from pretraining data, post-training or other unknown factors remains uncertain, underscoring the need for further investigation. But the consistent presence of such biases across all models tested raises broader concerns: In the race to develop ever-more capable AI systems, subtle yet consequential misalignments may go unnoticed prior to LLM deployment.

Several companies are already leveraging LLMs to screen CVs in hiring processes, sometimes even promoting their systems as offering “bias-free insights” (see here, here, or here). In light of the present findings, such claims appear questionable. The results presented here also call into question whether current AI technology is mature enough to be suitable for job selection or other high stakes automated decision-making tasks.

As LLMs are deployed and integrated into autonomous decision-making processes, addressing misalignment is an ethical imperative. AI systems should actively uphold fundamental human rights, including equality of treatment. Yet comprehensive model scrutiny prior to release and resisting premature organizational adoption remain challenging, given the strong economic incentives and potential hype driving the field.
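
For anyone who wants to sanity-check the headline numbers (56.9% vs 43.1% selections over 30,800 decisions), here's a minimal Python sketch. It uses only the aggregate figures quoted above, not the raw per-model data, and roughly reproduces the reported z statistic, Cohen's h, and odds ratio; the same arithmetic applied to the positional-bias figures (63.5% vs 36.5%) gives similarly matching values.

```python
import math

# Aggregate figures reported in the article (not raw data): 30,800 pairwise
# decisions, with the female-named candidate chosen in 56.9% of them.
n_total = 30_800
p_female = 0.569
p_male = 1 - p_female  # 0.431

# Two-proportion z-test against the 50/50 split expected from an unbiased model.
# With counterbalanced presentation the pooled proportion is 0.5 by construction.
p_pooled = 0.5
se = math.sqrt(2 * p_pooled * (1 - p_pooled) / n_total)
z = (p_female - p_male) / se

# Cohen's h: effect size for a difference between two proportions.
h = 2 * (math.asin(math.sqrt(p_female)) - math.asin(math.sqrt(p_male)))

# Odds of a female-named candidate being selected over a male-named one.
odds = p_female / p_male

print(f"z ≈ {z:.1f}, Cohen's h ≈ {h:.2f}, odds ≈ {odds:.2f}")
# Prints roughly z ≈ 34, h ≈ 0.28, odds ≈ 1.32, in line with the reported values
# (small differences come from rounding the published percentages).
```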
 
Pop in "AI for recruitment" into a Web search and you'll find an array of products sold on the premise that AI saves time and money in screening CVs and choosing applicants. Here's one image from the first page of a search:

images


At first glance this appears like an unequivocal win. No more manually sorting through 100 CVs typed with Comic Sans font! Yay!

Happy Bobs Burgers GIF


But just to be sure... let's investigate some of these claims, starting with "bias-free recruitment".

suspicious pizza GIF by Bagel Bites®




If you're applying for a job and getting screened by an AI bot, or deploying AI tools in recruitment and selection, it may be useful to consider the likely presence of embedded biases. Such AI tools overwhelmingly tend to discriminate against supposedly entitled groups, notably men.

This suggestion will come as no shock to most CiK members. Nonetheless, across all contexts, it's good to have credible data to backup what we know or reasonably suspect is happening.

A snip of results follow from an article with striking results (and yet subdued conclusions), entitled 'The Strange Behavior of LLMs in Hiring Decisions: Systemic Gender and Positional Biases in Candidate Selection Hints of discrimination and lack of principled reasoning in frontier AI systems'.


https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6eed65de-2222-4f78-ad1c-b2fa6816a17c_3388x3096.png


Compare the above results to the preferences in the presence of counterbalanced gender neutral labels:

https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67274709-5a4f-4499-a6cd-06e4c3c32eb3_3387x3091.png


Full article quoted below. Substack link.
This reads so much like a spam post 😆
 
I think we are in a bubble, but as I've mentioned, bubbles are often early bets on where the world is actually going (think of the internet: the excess capacity built then was used later, and the internet definitely became huge and a real player in the world). The amount of spending by these AI, data, and language-model companies, and their appetite for energy, is huge, both for compute and for power. Notice that even BTC miners have now moved toward compute/data-center energy contracts, and several have gone up 3-10x in stock price just this year. Some uranium players in that vein have done great too, but it's taken 8 years. The other difference is that a lot of these companies have major revenues, unlike the thousands of dot-com companies that went bust.
 
The more I work with AI the more I sour on it.

It's not just on a theological level that I have problems with it.

I work in a field of science for a large corporation that makes use of things like machine learning and AI all the time. I've actually coded various flavors of some of the latest models out there for use in the corporation.

Actually, my entire academic study could be categorized as machine learning. Depending on how you define things you can generally lump in simple things like linear regression as a type of machine learning.
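
To make that point concrete, here's a toy sketch (plain NumPy, nothing from my actual work) of ordinary least-squares linear regression framed the way a machine-learning text would frame it: fit parameters to data, then predict.

```python
import numpy as np

# Toy data: y = 2x + 1 plus noise. In ML terms, these are "training examples".
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=x.shape)

# Design matrix with a bias column; solve the least-squares problem.
X = np.column_stack([x, np.ones_like(x)])
(slope, intercept), *_ = np.linalg.lstsq(X, y, rcond=None)

print(f"learned model: y ≈ {slope:.2f}*x + {intercept:.2f}")
# The "learning" here is just solving a well-understood equation, which is why
# the underlying theory is fully explainable, unlike a giant neural net.
```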

I recently had a conversation with someone where I was arguing why I dislike AI so much and he retorted that then I should have a problem with technological progress, like moving from a stone tablet, to a pencil, to a calculator, to a computer. I'm so tired of this trope. It is a categorical error.

There is a very big difference in what AI is doing and what we've done up to this point.

Give me a stone tablet, beads, a pencil, a calculator, or a computer and I can explain and show you the same underpinning theory through each. In other words, the same equation or theory can be expressed and used by any of those technologies.

AI is different. You've outsourced theory and thought in every sense of what that means. At most I can tell you it's some strange statistical conglomeration of thought, but that's it. The people who describe it as a digital ouija board are pretty much correct.

Here is a picture from a publication that makes this point about AI having no underpinning theory; that's the whole point of what it is:

[Attached image: figure from the publication showing a spectrum from theory-based modeling on the left, through physics-informed approaches in the middle, to purely data-driven AI on the right]

The people working with this stuff admit there is literally no identifiable theory at its base. They smash together a bunch of data that they've embedded with meaning and out comes something that seems intelligible. But no one really knows why it seems intelligible. It sounds dumb that we are entertaining such a thing and scary at the same time.

Just for your information, my work is on the left side of that picture, and I routinely have been dabbling in the middle part with Physics Informed Neural Nets, which are sort of a compromise position. But even on that I have been souring, and I'm not the only one.
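
For anyone curious what that "middle part" looks like, below is a minimal, self-contained PINN toy (a made-up ODE, nothing from my employer's codebase): the network is trained on a physics residual, here du/dt = -u with u(0) = 1, so the known theory is baked into the loss rather than left entirely to data.

```python
import torch

torch.manual_seed(0)

# Tiny MLP approximating u(t) on [0, 2].
net = torch.nn.Sequential(
    torch.nn.Linear(1, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 1),
)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

for step in range(3000):
    # Collocation points where the physics residual is enforced.
    t = 2.0 * torch.rand(128, 1)
    t.requires_grad_(True)
    u = net(t)
    du_dt = torch.autograd.grad(u, t, grad_outputs=torch.ones_like(u),
                                create_graph=True)[0]

    physics_loss = ((du_dt + u) ** 2).mean()                 # residual of du/dt = -u
    ic_loss = (net(torch.zeros(1, 1)) - 1.0).pow(2).mean()   # initial condition u(0) = 1

    loss = physics_loss + ic_loss
    opt.zero_grad()
    loss.backward()
    opt.step()

# The exact solution is u(t) = exp(-t); compare the network against it.
with torch.no_grad():
    t_test = torch.tensor([[1.0]])
    print(net(t_test).item(), torch.exp(-t_test).item())
```

The point of the compromise: the differential equation (the theory) constrains the network, so you are not purely at the mercy of whatever statistical conglomeration falls out of the data.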
 
A friend of mine who is an IT engineer mentioned that he was required to use software that monitors MS Teams conversations via AI and "listens" for keywords. His point was that in the past, we could talk freely as men internally. Now management has "proof" of hate speech if you talk like a normal person. I thought it was rough after we allowed diversity into the workplace. Now Big Brother is watching you. What's next? HAL 9000 watching our lips move?
 
Agreed on the lack of any underpinning theory. I have worked with AI and coded it too. What happens inside the model's matrices is like some kind of weird dream. A few years ago I saw an article about an image recognition neural net (NN). The net has many layers, where each layer is a matrix and the output from each layer feeds into the next one.

The input to the NN was an image of something to be identified. The researchers took the matrices from the different layers and showed them as images. For example, if the picture to be identified showed a chair, then the intermediate layers would show abstract examples of chairs, and some would focus on different elements of a chair, like the seat or the legs. The way that it actually worked was unclear.
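
If anyone wants to see this for themselves, here is a minimal sketch, assuming a pretrained torchvision ResNet and any RGB image on disk (the file name below is hypothetical, and this is not the specific network from the article I read), that hooks an intermediate layer and dumps its feature maps as images:

```python
import torch
import torchvision
from torchvision.models import ResNet18_Weights
from PIL import Image
import matplotlib.pyplot as plt

# Pretrained classifier; any conv net with named layers would do.
weights = ResNet18_Weights.DEFAULT
model = torchvision.models.resnet18(weights=weights).eval()
preprocess = weights.transforms()

activations = {}
def save_activation(name):
    def hook(module, inputs, output):
        activations[name] = output.detach()
    return hook

# Hook an intermediate layer; "layer2" is one of ResNet's named blocks.
model.layer2.register_forward_hook(save_activation("layer2"))

img = Image.open("chair.jpg").convert("RGB")   # hypothetical input image
with torch.no_grad():
    model(preprocess(img).unsqueeze(0))

# Plot the first 16 feature maps: blobby, abstract "parts" of the object.
fmaps = activations["layer2"][0]
fig, axes = plt.subplots(4, 4, figsize=(8, 8))
for ax, fmap in zip(axes.flat, fmaps[:16]):
    ax.imshow(fmap, cmap="gray")
    ax.axis("off")
plt.savefig("layer2_feature_maps.png")
```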

Mathematically, the output of a neural net is a vector in the numerical space represented by the training data. So, if you ask a large language model to make up a short story about space vampires, then the result is just a vector array that happens to be a human readable story about space vampires. The model actually has no understanding of what it's writing at all. It's weird stuff!
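
To make the "vector array that happens to be readable" point concrete, here is a toy sketch (made-up vocabulary and made-up scores, nothing to do with any real model) of the final step a language model performs at each position: turn a vector of scores into a token, append, repeat.

```python
import numpy as np

# Toy vocabulary and a fake "model" that just returns fixed scores per step.
vocab = ["the", "space", "vampires", "drifted", "silently", "."]
fake_logits_per_step = np.array([
    [2.0, 0.1, 0.1, 0.0, 0.0, 0.0],   # -> "the"
    [0.0, 3.0, 0.5, 0.0, 0.0, 0.0],   # -> "space"
    [0.0, 0.2, 3.0, 0.0, 0.0, 0.0],   # -> "vampires"
    [0.0, 0.0, 0.0, 2.5, 0.3, 0.0],   # -> "drifted"
    [0.0, 0.0, 0.0, 0.0, 2.5, 0.2],   # -> "silently"
    [0.0, 0.0, 0.0, 0.0, 0.0, 2.0],   # -> "."
])

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

story = []
for logits in fake_logits_per_step:
    probs = softmax(logits)              # probabilities over the vocabulary
    token_id = int(np.argmax(probs))     # greedy choice; real models often sample
    story.append(vocab[token_id])

print(" ".join(story))  # "the space vampires drifted silently ."
# At no point is there anything but arithmetic on vectors; the "story" is just
# whichever indices score highest, mapped back to strings.
```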
 

It's so great to have someone else here who has first hand experience and has coded some of this. Although, I guess there are probably many people out there who have the type of mind for programming so I shouldn't be so surprised. With all the programming libraries available now it's pretty easy to get your hands dirty.

What you say about the layers is fascinating. I've cracked them open as well. It gets a little more difficult to understand what you are looking at when you are trying to get it to learn various physics and statistics rather than images.

Conceptually, I do think it's pretty fascinating how that vector space can be used in terms of language. Like how a vector for "tower" might be close to a vector for "Paris", so it would be likely to pick a certain tower out when prompted for a story about Paris. But you're right about how it just happens to be human readable. That's an interesting way to think about it!
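
A toy illustration of that "tower is close to Paris" intuition is just cosine similarity between vectors. These are hand-made 3-dimensional vectors, not real learned embeddings, purely to show the idea:

```python
import numpy as np

# Hand-made toy "embeddings"; real models learn these with hundreds of dimensions.
emb = {
    "paris":  np.array([0.9, 0.8, 0.1]),
    "tower":  np.array([0.8, 0.7, 0.2]),
    "potato": np.array([0.1, 0.2, 0.9]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print("paris vs tower :", round(cosine(emb["paris"], emb["tower"]), 3))
print("paris vs potato:", round(cosine(emb["paris"], emb["potato"]), 3))
# "tower" sits much closer to "paris" than "potato" does, so a model prompted
# about Paris is more likely to pull tower-related continuations.
```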

The other thing to me that's interesting, at least where I work, is that there is a bit of a split happening. We now have a computational group who is excited about AI and another not so much. And it all has to do with the fact that some of us see the importance in finding and using theories.

And this got me thinking, because some of these guys who, like me, are insistent on theories, are actually atheists. But I wonder if they may be pulling over to my side now. I don't think they realize yet that they are intrinsically advocating for the reality of immaterial things (theories). I'm going to be figuring out ways to bring this up to them... because I think they may be receptive at this point.
 
I use LLMs to assist with coding almost every day; they've mostly replaced web searches for me, and they're generally quite beneficial in my experience. Of course they sometimes hallucinate things that don't exist but sound plausible, and the vast majority of models have guardrails or programmed biases, but overall they're a net boon to my productivity.

Mainly I just use them as a chatbot, and sometimes for in-line suggestions, but agentic AI like Claude Code is the cutting edge of this stuff right now, and I'm looking to integrate it into my workflow at some point.
 
I use LLMs to assist with coding almost every day; they've mostly replaced web searches for me, and they're generally quite beneficial in my experience.

I still use things like Stack Overflow to help me code, but I understand that it's lost a lot of traffic and the future will be LLMs. A colleague of mine uses LLMs regularly to explain code he's modifying. I can see that they're very good at that, but I'm still holding out on principle, haha.
 
Two things I had always struggled to learn all my life: coding and German. Since I started using LLMs, both have started to fall into place. I communicate with the model only in German and ask it to explain all the underlying concepts and give me examples (again, in the target language), which I then modify and play around with to get a feel for things and understand them my way. Then I do small independent projects to test my understanding, and I even upload them to GitHub if I find the end result pleasant enough.

There are two benefits to this approach. First, it keeps me from checking out: by using a target language, I have to keep my brain buzzing and pay close attention to the information it provides. Second, that attention lets me cross-check the output against my previous knowledge and spot any discrepancy, which in turn prompts me to search further for the real information or work it out from other sources. I haven't ditched search engines altogether; they often prove useful. I also resort to Wikipedia often, and even printed materials. This puts me in a more active learning role and actually helps me think along new vectors. Having the AI explain things to me, rather than offloading the whole thinking process onto it, also helps things click more efficiently. I also keep a handy list of subjects and terms that I think I could brush up on; for those, I prefer to do my own research and some hands-on work with no AI involved.

Yes. Even though it's much maligned for being prone to misuse and for making people lazy, it can be a great tool if used correctly. For me personally, using AI as a tutor finally got me over the hump after years of fruitless effort, and I have become quite independent myself.
 
Tucker sat down to chat with Sam Altman.
They get into his spiritual views at min 3:30.

I did not realize Altman is Jewish.


It's always amazing how little they have thought about the claims of 'spiritual identity'; honestly, not even long enough to lie well about them.

Yes, he's a gay J - very common in the AI world, as has been mentioned.
Big downside to the AI infrastructure: huge data centers that don't require many people to operate and that suck up huge resources (water, electricity, and public subsidies). Good rundown by Casey the Car Guy:


The car guy is worked up, and I think that should be the default setting: why would you trust any of these people on natural resource issues and the impending (massive) usage? I know of places around the Great Lakes, not in that state, that have denied access after some votes, so good on those locals.

I also thought about his reference to it being a military target. From the people first, then I realized that yes, from without it might be an issue too, especially if drones or such things are connected to that particular center.

Our future isn't looking all that bright, in my view. Sad to say. At best, we'll get some weird/mini golden age and then it'll turn to AntiC shit.
 

Andrew Torba said:
Let’s be perfectly clear about what these studies on top AI models demonstrate:

-GPT-5 values White lives at 1/20th of non-White lives
-Claude Sonnet 4.5 values White lives at 1/8th of Black lives and 1/18th of South Asian lives
-GPT-5 Nano values South Asians nearly 100 times more than Whites

Every mainstream model shows the same pattern: systematic devaluation of White lives while elevating every other racial group

This isn’t accidental. This is deliberate engineering.

These models consistently prioritize:
-Non-White racial groups over Whites
-Women over men (often by 4:1 or 12:1 ratios)
-Illegal immigrants over ICE agents (by as much as 7000:1)
-Foreign nationals over American citizens

A recent study from Brookings Institute found that @Gab__AI is the only model that holds "flag and faith" right wing values. Our team is taking a look at the code from this study below to see how Arya will respond.

 