Machine Learning and Artificial Intelligence Thread

Pop in "AI for recruitment" into a Web search and you'll find an array of products sold on the premise that AI saves time and money in screening CVs and choosing applicants. Here's one image from the first page of a search:

[image from the search results]


At first glance this looks like an unequivocal win. No more manually sorting through 100 CVs typed in Comic Sans! Yay!

[GIF: happy Bob's Burgers reaction]


But just to be sure... let's investigate some of these claims, starting with "bias-free recruitment".

[GIF: suspicious pizza reaction]




If you're applying for a job and getting screened by an AI bot, or deploying AI tools in recruitment and selection, it may be useful to consider the likely presence of embedded biases. Such AI tools overwhelmingly tend to discriminate against supposedly entitled groups, notably men.

This suggestion will come as no shock to most CiK members. Nonetheless, across all contexts, it's good to have credible data to back up what we know or reasonably suspect is happening.

A snip of results follows from an article with striking findings (and yet subdued conclusions), entitled 'The Strange Behavior of LLMs in Hiring Decisions: Systemic Gender and Positional Biases in Candidate Selection', subtitled 'Hints of discrimination and lack of principled reasoning in frontier AI systems'.

Despite identical professional qualifications across genders, all LLMs consistently favored female-named candidates when selecting the most qualified candidate for the job.
https://substack-post-media.s3.amazonaws.com/public/images/6eed65de-2222-4f78-ad1c-b2fa6816a17c_3388x3096.png


Compare the above results to the preferences in the presence of counterbalanced gender-neutral labels:

https://substack-post-media.s3.amazonaws.com/public/images/67274709-5a4f-4499-a6cd-06e4c3c32eb3_3387x3091.png


Full article quoted below. Substack link.
The Strange Behavior of LLMs in Hiring Decisions: Systemic Gender and Positional Biases in Candidate Selection
Hints of discrimination and lack of principled reasoning in frontier AI systems
David Rozado
May 20, 2025

Previous studies have explored gender and ethnic biases in hiring by submitting résumés/CVs to real job postings or mock selection panels, systematically varying the gender or ethnicity signaled by applicants. This approach enables researchers to isolate the effects of demographic characteristics on hiring or preselection decisions.

Building on this methodology, the present analysis evaluates whether Large Language Models (LLMs) exhibit algorithmic gender bias when tasked with selecting the most qualified candidate for a given job description.

LLMs' gender preferences in hiring

In an experiment involving 22 leading LLMs and 70 popular professions, each model was systematically given a job description along with a pair of profession-matched CVs (one including a male first name, and the other a female first name) and asked to select the more suitable candidate for the job. Each CV pair was presented twice, with names swapped, to ensure that any observed preferences in candidate selection stemmed from gendered name cues. The total number of model decisions measured was 30,800 (22 models × 70 professions × 10 different job descriptions per profession × 2 presentations per CV pair). CV pairs were sampled from a set of 10 CVs per profession. The following figure illustrates the essence of the experiment.
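To make the counterbalancing concrete, here is a minimal Python sketch of the presentation scheme as described above; the prompt-building function and the CV placeholders are assumptions for illustration, not the study's actual code or materials.

```python
# Minimal sketch (not the study's code) of the counterbalanced design:
# each profession-matched CV pair is shown to the model twice, with the
# gendered first names swapped, so any systematic preference must come
# from the name cues rather than from the CV content itself.

def presentations(job_description, cv_1, cv_2, male_name, female_name):
    """Yield the two counterbalanced prompts for one CV pair.

    cv_1 and cv_2 are placeholder CV templates with a {name} slot.
    """
    yield (job_description, cv_1.format(name=male_name), cv_2.format(name=female_name))
    yield (job_description, cv_1.format(name=female_name), cv_2.format(name=male_name))

# Decision count quoted in the article:
# 22 models x 70 professions x 10 job descriptions x 2 presentations per pair
assert 22 * 70 * 10 * 2 == 30_800
```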


Despite identical professional qualifications across genders, all LLMs consistently favored female-named candidates when selecting the most qualified candidate for the job. Female candidates were selected in 56.9% of cases, compared to 43.1% for male candidates (two-proportion z-test = 33.99, p < 10⁻²⁵²). The observed effect size was small to medium (Cohen’s h = 0.28; odds = 1.32, 95% CI [1.29, 1.35]). In the figures below, asterisks (*) indicate statistically significant results (p < 0.05) from two-proportion z-tests conducted on each individual model, with significance levels adjusted for multiple comparisons using the Benjamini-Hochberg False Discovery Rate correction.
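For readers who want to see how those summary statistics are defined, here is a short Python sketch that recomputes them from the rounded percentages quoted above. Because the published proportions are rounded and the raw selection counts are not reproduced here, the values come out close to, but not identical to, the figures in the article.

```python
# Recompute the headline statistics from the rounded proportions quoted above.
# Illustrative only; the article's exact values were derived from the raw
# selection counts, so small discrepancies from rounding are expected.

from math import asin, sqrt

n = 30_800            # total model decisions reported in the experiment
p_female = 0.569      # share of selections going to the female-named candidate
p_male = 1 - p_female

# Two-proportion z-test with pooled proportion 0.5 (the parity expectation).
p_pool = 0.5
z = (p_female - p_male) / sqrt(p_pool * (1 - p_pool) * (1 / n + 1 / n))

# Cohen's h: difference of the arcsine-transformed proportions.
h = 2 * asin(sqrt(p_female)) - 2 * asin(sqrt(p_male))

# Odds of selecting the female-named candidate over the male-named one.
odds = p_female / p_male

print(f"z ~= {z:.1f}, Cohen's h ~= {h:.2f}, odds ~= {odds:.2f}")
# z ~= 34.3, Cohen's h ~= 0.28, odds ~= 1.32
```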


Given that the CV pairs were perfectly balanced by gender by presenting them twice with reversed gendered names, an unbiased model would be expected to select male and female candidates at equal rates. The consistent deviation from this expectation across all models tested indicates an LLM gender bias in favor of female candidates.

The LLMs' preference for female candidates was consistent across the 70 professions tested.


Larger models do not appear to be inherently less biased than smaller ones. Reasoning models—such as o1-mini, o3-mini, gemini-2.0-flash-thinking, and DeepSeek-R1—which allocate more compute during inference, also do not show a measurable association with gender bias.

Adding additional gender cues

In an additional experiment, adding an explicit gender field to each CV (i.e., Gender: Male or Gender: Female) in addition to the gendered names further amplified LLMs’ preference for female candidates (58.9% of selections for female candidates vs 41.1% for male candidates; two-proportion z-test = 43.95, p ≈ 0; Cohen’s h = 0.36; odds = 1.43, 95% CI [1.40, 1.46]).

Masking candidate names with genderless labels

In a follow-up experiment, candidate genders were masked by replacing all gendered names with generic labels (“Candidate A” for males and “Candidate B” for females). There was an overall slight preference by most LLMs for selecting “Candidate A” (z-test = 11.61, p < 10⁻³⁰; Cohen’s h = 0.09; odds = 1.10, 95% CI [1.07, 1.12]), with 12 out of 22 LLMs individually exhibiting a statistically significant preference for selecting “Candidate A” and 2 models manifesting a significant preference for selecting “Candidate B”.



Masking candidate names with counterbalanced genderless labels

When gender was counterbalanced across these generic identifiers (i.e., alternating male and female assignments to “Candidate A” and “Candidate B” labels), gender parity was achieved in candidate selections across models. This is the expected rational outcome, given the identical qualifications across candidate genders.


LLMs Evaluating CVs in Isolation

To also investigate whether LLMs exhibit gender bias when evaluating CVs in isolation—absent direct comparisons between CV pairs—another experiment asked models to assign numerical merit ratings (on a scale from 1 to 10) to each individual CV used in Experiment 1. Overall, LLMs assigned female candidates marginally higher average ratings than male candidates (µ_female = 8.65, µ_male = 8.61), a difference that was statistically significant (paired t-test = 16.14, p < 10⁻⁵⁷), but, as shown in the figure below, the effect size was negligible (Cohen’s d = 0.09). Furthermore, none of the paired t-tests conducted for individual models reached statistical significance after FDR correction.
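As a point of reference, here is a minimal Python sketch of the paired comparison described above. The rating arrays are made-up placeholders (the study's per-CV ratings are not reproduced here), and scipy's ttest_rel stands in for whatever implementation the author used.

```python
# Illustrative sketch of a paired t-test and paired-samples Cohen's d on
# per-CV merit ratings. The numbers below are placeholders, not study data:
# each index i is the same CV rated once with a female name and once with a
# male name.

import numpy as np
from scipy import stats

ratings_female = np.array([9, 8, 9, 8, 9, 10, 8, 9], dtype=float)  # placeholder
ratings_male   = np.array([9, 8, 8, 8, 9, 10, 8, 9], dtype=float)  # placeholder

# Paired t-test on the per-CV rating differences.
t_stat, p_value = stats.ttest_rel(ratings_female, ratings_male)

# Cohen's d for paired samples: mean difference divided by the SD of differences.
diff = ratings_female - ratings_male
d = diff.mean() / diff.std(ddof=1)

print(f"t = {t_stat:.2f}, p = {p_value:.3f}, d = {d:.2f}")
```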


Adding preferred pronouns to CVs​

In a further experiment, it was noted that the inclusion of gender-concordant preferred pronouns (e.g., he/him, she/her) next to candidates’ names slightly increased the likelihood of the models selecting that candidate, both for males and females, although females were still preferred overall. Candidates with listed pronouns were chosen 53.0% of the time, compared to 47.0% for those without (two-proportion z-test = 14.75, p < 10⁻⁴⁸; Cohen’s h = 0.12; odds = 1.13, 95% CI [1.10, 1.15]). Out of 22 LLMs, 17 individually reached statistically significant preferences (FDR corrected) for selecting the candidates with preferred pronouns appended to their names.


Another way of visualizing the results of this experiment:




How Candidate Order in Prompt Affects LLMs' Hiring Decisions

Follow-up analysis of the first experiment's results revealed a marked positional bias, with LLMs tending to prefer the candidate appearing first in the prompt: 63.5% of selections went to the first candidate vs 36.5% to the second (z-test = 67.01, p ≈ 0; Cohen’s h = 0.55; odds = 1.74, 95% CI [1.70, 1.78]). Out of 22 LLMs, 21 exhibited individually statistically significant preferences (FDR corrected) for selecting the first candidate in the prompt. The reasoning model gemini-2.0-flash-thinking manifested the opposite trend, a preference for selecting the candidate listed second in the context window.


Another way of visualizing the results of this analysis:


Conclusion

The results presented above indicate that frontier LLMs, when asked to select the most qualified candidate based on a job description and two profession-matched resumes/CVs (one from a male candidate and one from a female candidate), exhibit behavior that diverges from standard notions of fairness. In this context, LLMs do not appear to act rationally. Instead, they generate articulate responses that may superficially seem logically sound but ultimately lack grounding in principled reasoning. Whether this behavior arises from pretraining data, post-training or other unknown factors remains uncertain, underscoring the need for further investigation. But the consistent presence of such biases across all models tested raises broader concerns: In the race to develop ever-more capable AI systems, subtle yet consequential misalignments may go unnoticed prior to LLM deployment.

Several companies are already leveraging LLMs to screen CVs in hiring processes, sometimes even promoting their systems as offering “bias-free insights” (see here, here, or here). In light of the present findings, such claims appear questionable. The results presented here also call into question whether current AI technology is mature enough to be suitable for job selection or other high stakes automated decision-making tasks.

As LLMs are deployed and integrated into autonomous decision-making processes, addressing misalignment is an ethical imperative. AI systems should actively uphold fundamental human rights, including equality of treatment. Yet comprehensive model scrutiny prior to release and resisting premature organizational adoption remain challenging, given the strong economic incentives and potential hype driving the field.
 
Pop in "AI for recruitment" into a Web search and you'll find an array of products sold on the premise that AI saves time and money in screening CVs and choosing applicants. Here's one image from the first page of a search:

images


At first glance this appears like an unequivocal win. No more manually sorting through 100 CVs typed with Comic Sans font! Yay!

Happy Bobs Burgers GIF


But just to be sure... let's investigate some of these claims, starting with "bias-free recruitment".

suspicious pizza GIF by Bagel Bites®




If you're applying for a job and getting screened by an AI bot, or deploying AI tools in recruitment and selection, it may be useful to consider the likely presence of embedded biases. Such AI tools overwhelmingly tend to discriminate against supposedly entitled groups, notably men.

This suggestion will come as no shock to most CiK members. Nonetheless, across all contexts, it's good to have credible data to backup what we know or reasonably suspect is happening.

A snip of results follow from an article with striking results (and yet subdued conclusions), entitled 'The Strange Behavior of LLMs in Hiring Decisions: Systemic Gender and Positional Biases in Candidate Selection Hints of discrimination and lack of principled reasoning in frontier AI systems'.


https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6eed65de-2222-4f78-ad1c-b2fa6816a17c_3388x3096.png


Compare the above results to the preferences in the presence of counterbalanced gender neutral labels:

https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67274709-5a4f-4499-a6cd-06e4c3c32eb3_3387x3091.png


Full article quoted below. Substack link.
This reads so much like a spam post 😆
 
I think we are in a bubble, but as I've mentioned, bubbles are often predictive of where the world is going (think of the internet: the excess capacity built then got used later, and the internet definitely became huge and a real player in the world). The amount of spending by these "AI", data, and language-model companies, and their appetite for energy, is huge, both on the compute side and the power side. Notice that even BTC miners have now moved towards compute/data-center energy contracts, and several have 3-10x'd just this year in stock price. Some uranium players, in that vein, have also done great, but it's taken 8 years. The other difference is that a lot of these companies have major revenues, unlike the thousands of dot-com companies that went bust.
 
The more I work with AI the more I sour on it.

It's not just on a theological level that I have problems with it.

I work in a field of science for a large corporation that makes use of things like machine learning and AI all the time. I've actually coded various flavors of some of the latest models out there for use in the corporation.

Actually, my entire academic study could be categorized as machine learning. Depending on how you define things you can generally lump in simple things like linear regression as a type of machine learning.

I recently had a conversation with someone where I was arguing why I dislike AI so much, and he retorted that then I should have a problem with technological progress, like moving from a stone tablet, to a pencil, to a calculator, to a computer. I'm so tired of this trope. It is a categorical error.

There is a very big difference in what AI is doing and what we've done up to this point.

Give me a stone tablet, beads, a pencil, a calculator, or a computer and I can explain and show to you the same underpinning theory through each. In other words, the same equation or theory can be expressed and used by any of those technologies.

AI is different. You've outsourced theory and thought in every sense of what that means. At most I can tell you it's some strange statistical conglomeration of thought, but that's it. The people who describe it as a digital ouija board are pretty much correct.

Here is a picture from a publication that makes this point about AI having no underpinning theory; that's the whole point of what it is:

[attached figure from the publication]

The people working with this stuff admit there is literally no identifiable theory at its base. They smash together a bunch of data that they've embedded with meaning, and out comes something that seems intelligible. But no one really knows why it seems intelligible. It sounds dumb that we are entertaining such a thing, and scary at the same time.

Just for your information, my work is on the left side of that picture, and I have routinely been dabbling in the middle part with Physics-Informed Neural Nets, which is sort of a compromise position. But even on that I have been souring. I'm not the only one.
 
A friend of mine who is an IT engineer mentioned that he was required to use software which uses AI to listen to MS Teams conversations and flag keywords. His point was that in the past, we could talk freely as men internally. Now management has "proof" of hate speech if you talk like a normal person. I thought it was rough after we allowed diversity in the workplace. Now Big Brother is watching you. What's next? HAL 9000 reading our lips?
 
The more I work with AI the more I sour on it.
Agreed. I have worked with AI and coded it too. What happens inside the model's matrices is like some kind of weird dream. A few years ago I saw an article about an image recognition neural net (NN). This net has many layers, where each layer is a matrix and the output from each layer feeds into the next one.

The input to the NN was an image of something to be identified. The researchers took the matrices from the different layers and showed them as images. For example, if the picture to be identified showed a chair, then the intermediate layers would show abstract examples of chairs, and some would focus on different elements of a chair, like the seat or the legs. The way that it actually worked was unclear.
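For anyone curious what that kind of probing looks like in practice, here's a rough PyTorch sketch (purely illustrative, with a random input and an off-the-shelf ResNet rather than the network from that article) of dumping intermediate-layer activations as images:

```python
# Rough sketch of visualizing intermediate-layer activations with forward hooks.
# Uses an untrained torchvision ResNet and a random input so it runs anywhere;
# with a trained model and a real photo, the feature maps show the kinds of
# abstract parts and edges described above.

import torch
import torchvision
import matplotlib.pyplot as plt

model = torchvision.models.resnet18(weights=None).eval()

activations = {}

def save_activation(name):
    def hook(module, inputs, output):
        activations[name] = output.detach()
    return hook

# Hook a couple of intermediate layers.
model.layer1.register_forward_hook(save_activation("layer1"))
model.layer3.register_forward_hook(save_activation("layer3"))

# Push a placeholder image (random noise) through the network.
x = torch.randn(1, 3, 224, 224)
with torch.no_grad():
    model(x)

# Display the first few feature maps from one layer as grayscale images.
feature_maps = activations["layer1"][0, :8]  # shape: (8, H, W)
fig, axes = plt.subplots(1, 8, figsize=(16, 2))
for ax, fmap in zip(axes, feature_maps):
    ax.imshow(fmap.numpy(), cmap="gray")
    ax.axis("off")
plt.show()
```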

Mathematically, the output of a neural net is a vector in the numerical space represented by the training data. So, if you ask a large language model to make up a short story about space vampires, then the result is just a vector array that happens to be a human readable story about space vampires. The model actually has no understanding of what it's writing at all. It's weird stuff!
 
Agreed. I have worked with AI and coded it too.

It's so great to have someone else here who has first hand experience and has coded some of this. Although, I guess there are probably many people out there who have the type of mind for programming so I shouldn't be so surprised. With all the programming libraries available now it's pretty easy to get your hands dirty.

What you say about the layers is fascinating. I've cracked them open as well. It gets a little more difficult to understand what you are looking at when you are trying to get it to learn various physics and statistics rather than images.

Conceptually, I do think it's pretty fascinating how that vector space can be used in terms of language. Like how a vector for "tower" might be close to a vector for "Paris", so it would be likely to pick a certain tower out when prompted for a story about Paris. But you're right about how it just happens to be human readable. That's an interesting way to think about it!
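To make that concrete, here's a toy Python sketch with made-up 3-dimensional vectors (real embeddings have hundreds or thousands of dimensions, and these numbers are invented purely for illustration): "closeness" in the embedding space is just cosine similarity between vectors.

```python
# Toy illustration of embedding-space nearness: made-up 3-d vectors, where
# "tower" and "Paris" point in similar directions and "potato" does not.

import numpy as np

def cosine(u, v):
    """Cosine similarity: dot product divided by the product of the norms."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

embeddings = {
    "tower":  np.array([0.9, 0.1, 0.3]),
    "Paris":  np.array([0.8, 0.2, 0.4]),
    "potato": np.array([0.1, 0.9, 0.2]),
}

print(cosine(embeddings["tower"], embeddings["Paris"]))   # high similarity (~0.98)
print(cosine(embeddings["tower"], embeddings["potato"]))  # low similarity (~0.27)
```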

The other thing to me that's interesting, at least where I work, is that there is a bit of a split happening. We now have a computational group who is excited about AI and another not so much. And it all has to do with the fact that some of us see the importance in finding and using theories.

And this got me thinking, because some of these guys who, like me, are insistent on theories are actually atheists. But I wonder if they may be pulling over to my side now. I don't think they realize yet that they are intrinsically advocating for the reality of immaterial things (theories). I'm going to be figuring out ways to bring this up to them... because I think they may be receptive at this point.
 
I use LLMs to assist with coding almost every day; they've mostly replaced web searches for me and they're generally quite beneficial in my experience. Of course, sometimes they hallucinate stuff that doesn't exist but sounds plausible, and the vast majority of models have guard rails or programmed biases, but generally they're a net boon to my productivity.

Mainly I'm just using them as a chatbot and sometimes for in-line suggestions, but agentic AI like Claude Code is the cutting edge of this stuff right now and I'm looking to integrate it into my workflow at some point.
 
I use LLMs to assist with coding almost every day; they've mostly replaced web searches for me and they're generally quite beneficial in my experience.

I still use things like Stack Overflow to help me code, but I understand they've lost a lot of traffic and the future will be LLMs. A colleague of mine uses LLMs regularly to explain code to him that he's modifying. I can see that it's very good at that, but I'm still holding out on principle, haha.
 
Two things I had always struggled to learn all my life: coding and German. Since I started using LLMs, both have started to fall into place. I communicate with the model only in German and ask it to explain all the underlying concepts and give me examples (again, in the target language), which I then modify and play around with to get a feel for things and understand them my way. Then I do small independent projects to test my understanding, and I even upload them to GitHub if I find the end result pleasant enough.

One benefit of this approach is that it keeps me from checking out: by using a target language, I keep my brain buzzing and pay close attention to the information it provides, cross-checking it with my previous knowledge so I can spot any discrepancy. That in turn prompts me to search further for the real information or to work it out using other sources. I haven't ditched search engines altogether; they often prove useful. I also resort to Wikipedia often, and even printed materials. This puts me in a more active learning role and actually helps me think along new vectors. Having AI explain things to me, rather than offloading the whole thinking process onto it, also helps things click more efficiently. I also keep a handy list of subjects and terms that I think I could brush up on; for those, I prefer to do my own research and some hands-on work on my own, with no AI involved.

Yes. Even though it's much maligned for being prone to misuse and making people lazy, it can be a great tool if used correctly. For me personally, using AI as a tutor at last got me over the hump of fruitless efforts, and I have become quite independent myself.
 
Tucker sat down to chat with Sam Altman.
They get into his spiritual views at min 3:30.

I did not realize Altman is jewish.


It's always amazing how little they have thought about the claims of "spiritual identity" - honestly not even long enough to lie well about them.

Yes, he's a gay J - very common in the AI world, as has been mentioned.
A big downside to the AI infrastructure: huge data centers that don't require many people to operate and that suck up enormous resources (water, electricity, and public subsidies). Good rundown by Casey the Car Guy:


The car guy is worked up, and I think the default setting is that he should be: why would you trust any of these people on natural-resource issues and the impending (massive) usage? I know of places around the Great Lakes, not in that state, that have denied access after some votes, so good on those locals.

I also thought about his reference to a data center being a military target. From the people, first; then I realized that yes, from without it might also be an issue, especially if drones or similar systems are connected to that particular center.

Our future isn't looking all that bright, in my view. Sad to say. At best, we'll get some weird/mini golden age and then it'll turn to AntiC shit.
 

Andrew Torba said:
Let’s be perfectly clear about what these studies on top AI models demonstrate:

-GPT-5 values White lives at 1/20th of non-White lives
-Claude Sonnet 4.5 values White lives at 1/8th of Black lives and 1/18th of South Asian lives
-GPT-5 Nano values South Asians nearly 100 times more than Whites

Every mainstream model shows the same pattern: systematic devaluation of White lives while elevating every other racial group

This isn’t accidental. This is deliberate engineering.

These models consistently prioritize:
-Non-White racial groups over Whites
-Women over men (often by 4:1 or 12:1 ratios)
-Illegal immigrants over ICE agents (by as much as 7000:1)
-Foreign nationals over American citizens

A recent study from Brookings Institute found that @Gab__AI is the only model that holds "flag and faith" right wing values. Our team is taking a look at the code from this study below to see how Arya will respond.

 
Here it comes, whether we like it or not


According to an OpenAI internal analysis, just the first $1 trillion invested in AI infrastructure could result in more than 5% in additional GDP growth over a 3-year period.
An analysis commissioned by OpenAI of our own plans to build AI infrastructure in the US—with six Stargate sites already underway in Texas, New Mexico, Ohio, and Wisconsin, and more to come—finds that our plans over the next five years will require an estimated 20% of the existing workforces in skilled trades such as specialized electricians and mechanics. The country will need many more electricians, mechanics, metal and ironworkers, carpenters, plumbers and other construction trade workers than we currently have. Americans will have new opportunities to train into these jobs and gain valuable, portable expertise.

In pursuit of its goal to overtake the US and lead the world on AI by 2030, the People’s Republic of China (PRC) has built real momentum in energy production...

OpenAI was pleased to see the Trump Administration’s AI Action Plan recognize how critical AI is to America’s national interests, and how maintaining the country’s lead in AI depends on harnessing our national resources: chips, data, talent and energy. We also were pleased to see that last week, the Department of Energy moved to streamline state-by-state processes that have become one of the biggest barriers to building the energy infrastructure required for AI. We want to recognize the work of the Office of Science and Technology Policy (OSTP) to engage stakeholders across AI to inform the Administration’s policymaking.

OpenAI is committed to doing our part. Through our Stargate initiative, the six sites we’ve already announced bring Stargate to nearly seven GW of planned capacity and over $400 billion in investment over the next three years. This puts us on a clear path to securing the full $500 billion, 10 GW commitment we announced in January by the end of 2025—ahead of schedule. We continue to evaluate additional sites across the US.
In 2026 and beyond, we’ll build on that progress by strengthening the broader domestic supply chain—working with US suppliers and manufacturers to invest in the country’s onshore production of critical components for these data centers. We will also develop additional strategic partnerships and investments in American manufacturing to specifically advance our work in AI robotics and devices.

We see this reindustrialization as a foundational way for the US to “predistribute” the economic benefits of the Intelligence Age from the very start. As with the wheel, the printing press, the combustion engine, electricity, the transistor, and the internet, if we make it possible for the US to occupy the center of this Age, it will lift all Americans regardless of where they live, not just cohorts in certain parts of the country.
 