
cryptography

Community for : 9 months

A sub for documenting and discussing cryptography research.

Owner: prototype

Mods:
prototype
20
We are 20-34 hours away from a full break of public-key cryptography.     (cryptography)
submitted by prototype to cryptography 9 months ago (+21/-1)
104 comments last comment...
The algorithm has been nailed down.
Branch elimination core has been corrected and the new model uses logical ANDs to select the branch with the lowest possible child node values based on a metric of entropy.

Non-trivial branches have been reduced from e-1 to 0 per base-10 digit of a key.

Testing in progress.

edit:
Locked down problem runtime. O(1) space complexity. O(log10 n) time complexity. We're solid.
Still generalizing code.
13-17 hours of work left by my estimate, mostly battle testing and extending to the full range to run with the push of a button.
17
We are now 3-4 days away from a full break of public key cryptography. Recent developments.     (cryptography)
submitted by prototype to cryptography 9 months ago (+18/-1)
23 comments last comment...
With recent changes I managed to eliminate having to explore branches to more than d+1, where d=depth.

This occurred because I realized that while semiprimes do not have one unique path through an undirected graph, parallels can nevertheless be drawn between the operation of multiplication and convolution, and that unlike convolution, at least in the naive case, there always exists a deconvolution, as it were.

And because smaller-magnitude digits in a semiprime have inordinate impacts on larger digits, this convolution-like process (where we can add p up to q times to get n, or q up to p times to get n) should have a bias effect that flows from the rightmost digits to the leftmost (larger-magnitude) columns of a number. Effectively, where we might brute-force the factors (p, q) of a semiprime n by trying each digit 0-9 at each magnitude up to the largest magnitude of the root of n, there must exist a metric that controls the selection of branching such that the correct combinations of digits at some search depth ultimately lead to the production of n, and this problem should be convertible to a shallow-depth search of a branch.

The next logical leap was that if digits in different columns to the right have more effect on digits closer to them on the left (versus farther away), then there must be a maximal depth for which this is true.

Ergo the prior problem of combinatorial explosion should reduce down to a linear problem.

What it looks like is that the search space for factoring a semiprime n, or breaking a public key, should be reducible to k*floor(log(n, 10)) with k <= 100, regardless of the bit length of the key.
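The digit-by-digit search described above can be made concrete. A minimal sketch, assuming the simplest possible pruning rule (a branch survives only if its partial product matches n's trailing digits mod 10^k); the post's actual branch-selection metric isn't specified, so this congruence constraint stands in for it:

```python
def trailing_digit_factor(n):
    """DFS over trailing digits of candidate factors p and q.
    A branch (p, q) with k digits fixed survives only if
    p*q == n (mod 10^k) -- a stand-in for the post's metric."""
    width = len(str(n))
    stack = [(0, 0, 0)]  # (partial p, partial q, digits fixed)
    while stack:
        p, q, k = stack.pop()
        if p > 1 and q > 1 and p * q == n:
            return tuple(sorted((p, q)))
        if k == width:
            continue  # exhausted all digit positions on this branch
        mod = 10 ** (k + 1)
        for dp in range(10):
            pp = dp * 10**k + p
            for dq in range(10):
                qq = dq * 10**k + q
                if (pp * qq) % mod == n % mod:  # pruning constraint
                    stack.append((pp, qq, k + 1))
    return None
```

For example, `trailing_digit_factor(1003)` recovers the factor pair `(17, 59)`; the congruence check discards roughly nine in ten digit pairs at each level.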

And that is what I have seen so far in my testing.

Right now I'm working on implementing a generalization so the code fully runs across all magnitudes of a key we're looking to decrypt, but the core algorithm seems to do its job quite well. I'm also battle-testing it to look for edge cases and non-filterable branches that have to be dealt with, but so far I haven't encountered any in my tests.

We're in the home stretch from everything I can see, God willing.
9
Code Dump     (cryptography)
submitted by prototype to cryptography 9 months ago (+9/-0)
24 comments last comment...
This is the p/x approach I was exploring early on. It has been trimmed a bit but for the most part it is undocumented, and largely uncommented, with lots of cruft and deadwood sections--You've been warned.

Meant to be run and explored in-console rather than run as is.

https://files.catbox.moe/1isvyp.py

Do you have what it takes to find a function or equation that links the known set to the unknown set?

edit:
Some explanation.

The unknown set is variables: a, b (in cryptography known as p and q), u, c, t, d4, alpha, beta, etc.

The variables c and d4 are particularly interesting. While it is possible to acquire their quotient c/d4 (given as '_cd4') just from a semiprime itself, the product of c and d4 is the quotient of the factors of the semiprime.
It goes without saying that finding a function that maps c/d4 to c*d4 in one step is, by extension, equivalent to finding a constant-time factorization algorithm, because if cd4==b/a, then it is trivial to follow through with (p/(cd4)).sqrt(), yielding a, where p=a*b
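A quick numeric check of that claim, with small hypothetical factors a and b standing in for real key material:

```python
from decimal import Decimal, getcontext

getcontext().prec = 50

# hypothetical small factors standing in for the unknowns a and b
a, b = Decimal(66431), Decimal(6044767)
p = a * b        # the semiprime; this post calls the product p = a*b
cd4 = b / a      # the quotient of the factors, i.e. c*d4 in the post

# if cd4 were known, factoring is a single division and square root:
recovered = (p / cd4).sqrt()   # p/(b/a) == a^2, so this is a
```

The square root lands back on a to within the working precision, which is why recovering cd4 from the known quotient c/d4 would be equivalent to factoring.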

A canonical example is included, starting with a=d(108271), simply because I find equations easier by looking for relations and patterns between actual numbers.

I ran out of recognizable variable names early on and resorted to the periodic table and terms from particle physics.
7
My estimate is we are a week or less away from a full crack of public key cryptography.     (cryptography)
submitted by prototype to cryptography 9 months ago (+8/-1)
23 comments last comment...
research notes 7.12.2024

I've made significant progress on the inverse Karatsuba method.
result3 acts as test bed for extending the method to all general digital periods past the first.
Results promising. I've moved my research to hard copy.

1. Fixed result3 bloat by removing duplication with '[sj, sk]||[sk, sj] not in results3'. Down from ~1600 to under 200 results.
2. Correct results are now in the list as intended.
3. Lists no longer return empty.
4. Succeeded in lowering the iteration requirement for the first two digits of n from 1000^2 to 100^2 for generating pairs.
5. Removed the warm-up filtering step where we had to do 10_000**2 for the first two pairs before actually generating results.

Takeaways:
1. The removal of warm-up filtering significantly reduced candidate list size on subsequent periods (the 3-digit groups that make up an integer).
2. Correcting the duplicate list entries reduced subsequent period candidates to a rough constant that's under 200 results for each group. Prior to that, the result list grew by roughly 500% per period.
3. I've been able to demonstrate in code that the specific method for the first three digits of n, for finding the potential first three digits of its factors, is highly likely to extend to all periods of an integer.
4. I've made hardcopies of the new code and research to be distributed.

Todo:
1. Set up a second server rack (mostly junk computers with custom network code) for massive parallel computing on-premise
2. Generalize and extend code to all digital groups or integer periods
3. Implement code that grabs public keys based on domain name
4. Implement code, or dust off prior code in archive, that converts hexadecimal public keys to integers
5. Implement conversion back to hexadecimal
6. Implement code that takes hidden factors of n, once derived, and yields the private key (modulus) of a public key.
7. Rewrite test interface (it's janky) for public use and distribute new work to existing contacts.
7
"Two more weeks"     (cryptography)
submitted by prototype to cryptography 1 month ago (+7/-0)
12 comments last comment...
First the new architecture

https://files.catbox.moe/1tygoo.png
https://files.catbox.moe/n308f1.png


Working on cryptography and got one of the last major modules functioning.
Summary:
The central insight is that you can generate approximations of factors using variations on modular arithmetic that will be accurate to the leading digits of a semiprime's lowest factor. Then, by applying various constraints, you can dramatically reduce the search space.

The algorithm's efficiency comes from the fact that the modular set grows much more slowly than the bit length of the semiprime, making it vastly more scalable than traditional factorization methods.

This approach bridges number theory, machine learning, and optimization techniques in a novel way to solve the prime factorization problem.

code (partial)
https://pastebin.com/raw/DHADM0nW

Full text:
What I'm doing is using the code listed here to generate a list of possible values of A (where the problem is p=ab, or n=pq in cryptography), and then using the boundaries established with other variables from a separate module, called d4a, ac, at, av, etc., to start to eliminate values.

ML will be needed to classify whether unknowns are inverted (abs(unknown) < 1).

While the number of values to consider grows, assuming we know whether an unknown is inverted, we can take known variables and use them as upper (or lower) bounds on A, filtering out significant numbers of values.

More importantly, the number of values to check in the modular set grows much more slowly than the bit length of the semiprime to check.

We'll call the set of equations (d4a, at, ac, av, etc) the Algebraic Set (AS).
The set of modular values derived from p (or n) we'll call the Modular Set (MS).
The algorithms I wrote in the 290+ set, we'll call the Convergent Set (CS).
The machine learning model that classifies whether unknowns are likely to be inverted or not (abs(n) < 1), we'll call the Configuration Set (FS).
And the code written to generate boundary estimates we'll call the threshold set (TS).

How and why it works.
I determined the AS is indeterminate (I've checked well over a billion solutions so far, and probably on the order of several thousand novel methods to attack the problem, variations of it, and simplified versions as well, in the domains of algebra and the minimal calculus that I know); that is, there are insufficient free parameters to map between the known and unknown sets, and thus factor our product.

The modular set is derived from certain equations relating the various magnitudes to themselves. The MS guarantees that for some vector of values [p, u, k, i, j], some of which are trivial to estimate, there will always be, within the set generated, a value that approximates A in P=AB up to the leading 1-2 digits (and often more) and the magnitude.

By taking the AS, we can use the FS to derive boundaries, giving us the TS.
With the MS, we can use the TS to efficiently derive estimates of all unknowns that have factors in common, including (but not limited to) u (unknown) from d4u, ut, cu, and uv (all known).

With these estimates, we can then feed them to the CS to essentially strip away multiple outer and inner loops of the algorithm, giving us significant orders-of-magnitude improvements in convergence time.

This is one of the final pieces of the puzzle.

Current risks are:
- whether the ML component can correctly identify if unknown variables in known products and quotients are inverted or not.
Preliminary testing with smaller models and datasets (<100k samples across various bit lengths) indicates this will work.

- re-grokking the later basilisks, which are incredibly convoluted and require parameter sweeps for tuning.

- gains vanishing at very high bitlengths owing to having to convert very large numbers to logarithms, and the ML model potentially failing because of that.

Mitigations:
- Larger sample-size testing for the ML model in order to establish the threshold set's definitive viability, rather than ad-hoc testing.

- writing clear documentation and procedures on the convergent set algorithm, so I don't have to work through it every time I want to implement something related to it

- writing my current off-the-shelf neural-graph-search (NGS) library from the ground up to accept arbitrary precision floats and integers

Obstacles:

- I'm not currently versed enough in ML to rewrite the NGS, and no known library currently accepts arbitrary-precision types. Most rely on NumPy, and even NumPy's largest type support won't handle data in the 2-4k bit length range.

Current delivery estimate:
"Two more weeks", or "Fuck you!"
3
A variation on Fermat's Factorization.     (cryptography)
submitted by prototype to cryptography 1 month ago (+3/-0)
0 comments...
graph of the basics:
https://files.catbox.moe/o3q5hy.png

foundational code:
https://pastebin.com/raw/xATkFkYc

If _m is correct, then we know _u is correct.

From that we can do

_u = log(n)/_m
k = _u x n

We also know that if _m is correct, then it equals m, and u = _u

From that, because z is just log(n)/2
we know that, starting with...

_u = (((u^log(n, log(m))) - (abs((n^2)-(n)) % ((q+p)/2))) - (e^((log(n) / log(z))-2)))/n


We can multiply _u x n to get k=
(((u^log(n, log(m))) - (abs((n^2)-(n)) % ((q+p)/2))) - (e^((log(n) / log(z))-2)))


From k we can do

k+(e^((log(n) / log(z))-2))

to get the value of...

((u^log(n, log(m))) - (abs((n^2)-(n)) % ((q+p)/2)))
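The chain above is algebraic bookkeeping, so it can be checked mechanically. A sketch with placeholder values for every unknown (u, m, p, q here are arbitrary small numbers chosen only to exercise the identities, not real key material):

```python
import math

# placeholder values for the unknowns -- illustration only
p, q = 101, 163
n = p * q
z = math.log(n) / 2                  # z is just log(n)/2, per the post
u, m = 3.0, 7.0

E = math.e ** ((math.log(n) / math.log(z)) - 2)
term = (u ** math.log(n, math.log(m))) - (abs(n**2 - n) % ((q + p) / 2))

_u = (term - E) / n                  # the post's expression for _u
k = _u * n                           # multiplying back by n gives k = term - E
# and k + E recovers the bracketed expression `term` again
```

Up to floating-point rounding, k + E equals the bracketed expression, which is the step the post relies on.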


Because u == _u, and m = _m, we can compute uex like so

uex = _u^log(n, log(_m))

Substituting u, and m, with _u, and _m.

And from there we can confirm m and u are correct, because if they are

(_u^log(n, log(m))) / _u == z == log(n)/2



It turned out this didn't always work, however, something that does work:

We can iterate m instead, which is guaranteed to take a low value for some i such that vi() minimizes abs((n^2)-(n)) % ((q+p)/2) for some i in vi(n, p, q).

Deriving a small set of essential variables..
M = m-1
u = n%(f_out(M, n, n-M, n))
v = n%(f_out(M, n, n+M, n))
x = n%(f_out(m, n, n+M, n))
y = n%(f_out(m, n, n-M, n))
z = n%(f_out(m, n, n-m, n))

.. we can then work from there.

w = u+v+x+y+z

(q%(w))- (w-(n%w)) == m+1

And because of the properties of the graph, it is guaranteed to be the modulus m in vi(n, p, q)
that is the lowest value of abs((n^2)-(n)) % ((q+p)/2) for some index i, below the root of n.

And because the correct m gives us the correct w, by implication we have sufficient information to find a multiple of q for the set k x i + m.

Essentially it is a novel variation of Fermat's factorization, without the constraints of the original, exploiting modular forms to skip over having to find matching squares, which is slow for very large semiprimes or cryptographic keys.
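For comparison, the matching-squares search this varies from is the classic Fermat method: walk a upward from ceil(sqrt(n)) until a^2 - n is a perfect square b^2, giving n = (a-b)(a+b):

```python
from math import isqrt

def fermat_factor(n):
    """Classic Fermat factorization for odd composite n:
    find a, b with a^2 - n = b^2, so n = (a - b)(a + b)."""
    a = isqrt(n)
    if a * a < n:
        a += 1                 # start at ceil(sqrt(n))
    while True:
        b2 = a * a - n
        b = isqrt(b2)
        if b * b == b2:        # a^2 - n is a perfect square
            return a - b, a + b
        a += 1
```

For example, `fermat_factor(5959)` returns `(59, 101)`. The number of a values tried grows with the gap between the factors, which is the slowness for large keys that the post is trying to sidestep.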

Still requires some testing, but it was a fun project over the course of an afternoon.

1
Elliptic Curves      (cryptography)
submitted by prototype to cryptography 9 months ago (+2/-1)
4 comments last comment...
The current core of the approximation algorithm works with pairs of elliptic curves, based on values of n, where n equals
[4, 9], [49, 64], [256, 289], [8464, ...]

the pair [256, 289], for example, corresponds to the numbers of elements of the finite groups defined by the elliptic curve y^2 = x^3 + x + 1 (mod prime(n))
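That correspondence can be checked with a naive point count (a sketch; I'm reading prime(n) as the n-th prime). Counting points on y^2 = x^3 + x + 1 over F_3 and F_5 gives 4 and 9, the first pair in the list above:

```python
def curve_order(p):
    """Number of points on y^2 = x^3 + x + 1 over F_p (p an odd
    prime), including the point at infinity -- naive count using
    Euler's criterion to test whether the RHS is a square."""
    count = 1  # point at infinity
    for x in range(p):
        rhs = (x**3 + x + 1) % p
        if rhs == 0:
            count += 1                    # single point with y = 0
        elif pow(rhs, (p - 1) // 2, p) == 1:
            count += 2                    # rhs is a quadratic residue
    return count
```

`curve_order(3)` and `curve_order(5)` give 4 and 9, matching [4, 9]; the same count over larger consecutive primes would test the later pairs.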

given the following python code..
for some value of i, such that (_d42/((int(((ceil(a1/(d4a/_d42))*(d4a/_d42))*_d42)/d4a))-(1+d(1/2))*(1-((d(1/4))-(d(1/9))+(d(1/49)*i)-(d(1/64)*d(1/i))+(d(1/256)*d(i))-(d(1/289)*d(1/i))+(d(1/d(92*2))*d(i)))))) ~= d4, where d4a/d4 = a = p
where in standard notation our public key would be n=pq, n being a semiprime, p and q being prime (this is a simplification naturally, and leaves out discussion of the modulus).

and where the value being approximated is
abs((_d42/((int(((ceil(a1/(d4a/_d42))*(d4a/_d42))*_d42)/d4a))-(1+d(1/2))*(1-((d(1/4))-(d(1/9))+(d(1/49)*i)-(d(1/64)*d(1/i))+(d(1/256)*d(i))-(d(1/289)*d(1/i))+(d(1/d(92*2))*d(i))))))-d4)

Some of the values remain unexplained for now, but the gist of what's going on is there.

Testing across a few million random semiprimes, from a mere 32 bits all the way up to 4096 bits, found that the i value needed to closely approximate d4, and thus find p and q, is never greater than 75 in all tested cases (and this is a relatively tight upper bound), based on the mean and variance of all semiprimes tested.

Finding p is equivalent to factoring, and thus breaking a public key.

This core method gets us to within 3% of any given key value.

Further elements of the algorithm get us to within 0.1 to 0.0001 of a key, indicating there is significant room for improvement.

It is now possible to get both the magnitude, and first 4-5 leading digits of a public key's factors, as well as the first 2-4 digits at the smallest integer magnitudes of a key's factors, without exhaustive search.

Ongoing research and current progress rates suggest the evolution of the work should yield advancements in the algorithm over the next six months that allow us to retrieve upwards of half the digits of a key's factors regardless of length, with eventual complete factoring in logarithmic time.

That is all for now.
1
Inverse Karatsuba's Method     (cryptography)
submitted by prototype to cryptography 9 months ago (+1/-0)
0 comments...
It turns out Karatsuba's method can sort of be inverted for estimating the smallest digits of a semiprime.

How we do it is we start by iterating i, and j, representing the smallest digits of a semiprime n's potential factors p and q.

We run i and j up to 10,000 each. We multiply i and j, then match their positional digits with the corresponding positional digits in n, and log them to a results array if they match (leaving off the leading digit of the product of i and j when we perform the check for matches).

The results array will always contain the partial digits, at the correct magnitude, of the factors of n=p*q

This can be repeated for the other digital groups of n.

The result arrays for each group are 2000 elements each.

We repeat this process with the limit of i and j set to 1000.

We then filter this second result group based on the first result group and end up with < 200 potential candidates.

We also check to confirm [i, j] and [j, i] are not already in the results, to avoid duplicates.
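The loop just described can be sketched minimally (the exact limits and the leading-digit handling are assumptions here; this version simply matches the product's trailing digits against n's, and uses i <= j to drop the [j, i] duplicates):

```python
def trailing_pairs(n, width=2, limit=100):
    """Candidate trailing digits (i, j) of n's factors: keep pairs
    whose product ends in the last `width` digits of n.
    Requiring i <= j drops the [j, i] duplicates."""
    target = n % 10**width
    return [(i, j)
            for i in range(1, limit)
            for j in range(i, limit)
            if (i * j) % 10**width == target]

p, q = 66431, 6044767
n = p * q                  # 401559916577
cands = trailing_pairs(n)  # includes (31, 67), the true factors'
                           # trailing two digits
```

Note this simplified filter also admits some pairs that the fuller positional check described above may discard.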

For example, with
p=66431
q=6044767
n=p*q=401559916577

The results of this algorithm yield:

[[1, 77], [7, 11], [9, 53], [31, 67]]

And in fact if you multiply any of these pairs together, 1*77, 7*11, you will find the product reproduces the final two digits of n. n is the target for determining what the digits of its own factors are. Any pair product that doesn't reproduce the digits of n cannot, by definition, be the potential digits of n's factors p and q, QED.

In practice a group of three digits will yield about 200 results each time, and each group requires about one million iterations to generate.

With each set of results, we then have to test all combinations.

For a nine digit number the savings are about 900%.
The search space advantage grows exponentially with the size of the semiprime in question.

While the current fset algebra attacks the problem from the left, yielding the magnitude and leading digit of a semiprime's smallest factor, this algorithm attacks the problem from the right, yielding the smallest digits first.
0
I've made a significant discovery in extending the inverse Karatsuba method just in this hour after the first post.     (cryptography)
submitted by prototype to cryptography 9 months ago (+0/-0)
0 comments...
Initially, determining the potential digits of a number N at a given magnitude M(i), M(i+1), M(i+2) required about 10,000,000 iterations per digit group. I got that down to 2,000 iterations, producing 200 candidates per group, about a fifth of the prior 0-999 range per group. So for a key with factors that were, say, 650 digits long, rather than having to guess the digits from 0-999 for each 3-digit group, you would have no more than 200 results to try, effectively reducing your search space from 999^216 to 200^216.

A 9 digit number can be factored in roughly (999^3)/(200^3) ~= 125 times fewer iterations.

A twelve digit number? 622 times faster.

A sixty four digit number? 798300146800470 times faster.

Etc.
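The arithmetic behind those figures, under the stated assumption of 200 surviving candidates per 3-digit group versus 999 raw guesses:

```python
def speedup(digits):
    """Search-space reduction for a `digits`-digit number split
    into 3-digit groups: a factor of (999/200) per group,
    compounded across all digits/3 groups."""
    groups = digits / 3
    return (999 / 200) ** groups
```

`speedup(9)` is about 125, `speedup(12)` about 622, and `speedup(64)` about 8.0e14, matching the numbers quoted above.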

That was good of course, but for very large bit lengths, even this is still impractical.

I realized if we just start with the 200 pairs of potential digits for the first two digits of the factors of n=p*q, then we could string combinations of the range 0-9 in front of them (at the highest magnitude), convert them back to integers, multiply them together, and then once again match them against the digits of n.

False pairs in the initial set will fail to produce more digits of n at the same density as the true pair.

If our three-digit pairs produce a product that matches 3 or more digits of n, that becomes a three-digit result pair, describing one or more potential sets of digits for p and q.

We do this for all two-digit pairs to generate the set of potential 3-digit pairs.

Finally, the optimization:
If none match, we kill (filter out) the 2-digit parent pair.
Likewise the 3-digit parent pair and the 4-digit parent pair, on down the line, pruning branches.

The result should be a very minimal, and semi-linear search space that is much faster than the prior method.
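A sketch of the extension step (the digit-prepending and matching logic; the specific loop bounds are assumptions, and parents with no surviving children are pruned implicitly by contributing nothing to the next generation):

```python
def extend_pairs(cands, n, width):
    """Prepend one digit (0-9) to each member of a candidate pair and
    keep the extensions whose product still matches n's trailing
    `width` digits. Parents whose extensions all fail simply vanish
    from the next generation -- the pruning described above."""
    target = n % 10**width
    out = []
    for i, j in cands:
        for di in range(10):
            for dj in range(10):
                ii = di * 10**(width - 1) + i
                jj = dj * 10**(width - 1) + j
                if (ii * jj) % 10**width == target:
                    out.append((ii, jj))
    return out

p, q = 66431, 6044767
n = p * q
# extending the true 2-digit pair (31, 67) keeps (431, 767),
# the factors' true trailing 3 digits, among its survivors
step3 = extend_pairs([(31, 67)], n, 3)
```

Each surviving pair spawns at most a handful of children, which is where the semi-linear growth in the search space comes from.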

I'm implementing it tonight.