Diffstat (limited to 'SI/Resource')
166 files changed, 2001 insertions, 0 deletions
diff --git a/SI/Resource/.gitkeep b/SI/Resource/.gitkeep new file mode 100644 index 0000000..e69de29 --- /dev/null +++ b/SI/Resource/.gitkeep diff --git a/SI/Resource/Blockchain & Cryptography/02_intro_to_cryptography.pdf b/SI/Resource/Blockchain & Cryptography/02_intro_to_cryptography.pdf Binary files differnew file mode 100644 index 0000000..04639ee --- /dev/null +++ b/SI/Resource/Blockchain & Cryptography/02_intro_to_cryptography.pdf diff --git a/SI/Resource/Blockchain & Cryptography/03_chaumian_ecash.pdf b/SI/Resource/Blockchain & Cryptography/03_chaumian_ecash.pdf Binary files differnew file mode 100644 index 0000000..a55def3 --- /dev/null +++ b/SI/Resource/Blockchain & Cryptography/03_chaumian_ecash.pdf diff --git a/SI/Resource/Blockchain & Cryptography/04_bitcoin_public_ledger.pdf b/SI/Resource/Blockchain & Cryptography/04_bitcoin_public_ledger.pdf Binary files differnew file mode 100644 index 0000000..45e2736 --- /dev/null +++ b/SI/Resource/Blockchain & Cryptography/04_bitcoin_public_ledger.pdf diff --git a/SI/Resource/Blockchain & Cryptography/05_blockchain_and_consensus_by_POW.pdf b/SI/Resource/Blockchain & Cryptography/05_blockchain_and_consensus_by_POW.pdf Binary files differnew file mode 100644 index 0000000..79f8c70 --- /dev/null +++ b/SI/Resource/Blockchain & Cryptography/05_blockchain_and_consensus_by_POW.pdf diff --git a/SI/Resource/Blockchain & Cryptography/06_bitcoin_scripts.pdf b/SI/Resource/Blockchain & Cryptography/06_bitcoin_scripts.pdf Binary files differnew file mode 100644 index 0000000..8a4ee3a --- /dev/null +++ b/SI/Resource/Blockchain & Cryptography/06_bitcoin_scripts.pdf diff --git a/SI/Resource/Blockchain & Cryptography/Midterm Review - CS646.md b/SI/Resource/Blockchain & Cryptography/Midterm Review - CS646.md new file mode 100644 index 0000000..5658631 --- /dev/null +++ b/SI/Resource/Blockchain & Cryptography/Midterm Review - CS646.md @@ -0,0 +1,309 @@ +--- +id: Midterm Review - CS646 +aliases: + - Guideline +tags: [] +--- + +# 
Guideline + +- ProctorU +- 10-15 questions + +## History of money + +- Historical definition of money + - A medium of exchange + - A store of value + - A unit of account +- Monetary system: is protocols for controlling the supply of money + - Money, payment system, monetary policy + - banking/financial system +- Commodity vs fiat money + - Commodity: + - Commodity money derives its value from the commodity out of which it is + made. Its value is determined by the value of the material from which it + is crafted. + - Pros: **Intrinsic Value**, **Limited Supply** + - Cons: **Weight and Portability**, **Storage and Security**, **Dependence + on Commodity Value** + - Fiat: + - Fiat money doesn’t have intrinsic value; instead, it derives its value + from a government decree. It is not backed by a physical commodity but is + established as money by government regulation or law. + - Pros: **Lightweight and Portable**, **Flexible Monetary Policy**, **Not + Tied to Commodity Fluctuations** + - Cons: **No Intrinsic Value**, **Inflation Risk** +- M1 vs M2 + - M1: is the most liquid measure of the money supply and includes assets that + are directly suitable for transactions. + - **Components**: + 1. **Currency in circulation**: This includes all coins and paper money + that are in the hands of the public, not held by the government, banks, + or other financial institutions. + 2. **Demand deposits**: These are checking accounts or other types of bank + accounts from which deposited funds can be quickly withdrawn without + any restrictions. + 3. **Traveler's checks**: Although their use has diminished significantly + with the rise of digital payment methods, traveler's checks are + traditionally a part of M1. + 4. **Other checkable deposits (OCDs)**: This can include accounts like + Negotiable Order of Withdrawal (NOW) accounts. 
+ - M2: includes all components of M1 plus near-money assets, which are + financial assets that can easily be converted into cash or are already in a + form that can be used for spending. + - **Components**: + 1. **All components of M1**: Currency in circulation, demand deposits, + traveler's checks, and other checkable deposits. + 2. **Savings accounts**: These accounts, including money market deposit + accounts, generally allow consumers to earn interest on their deposits, + but they might have some limitations on the number of transactions per + month. + 3. **Time deposits**: This includes Certificates of Deposit (CDs) with a + value under $100,000. CDs are deposit accounts with a specific maturity + date and typically pay higher interest than regular savings accounts. + 4. **Money market mutual funds (retail)**: These are funds that typically + invest in short-term debt securities and offer a return to their + holders. They're relatively liquid and can be quickly converted to + cash, though they are not as liquid as checking or savings accounts. + +## Intro to Crypto + +- 1-way hash (properties, attacks): is a mathematical algorithm that takes an + input (or 'message') and produces a fixed-size string of bytes, typically a + digest. 
+ - Properties: + - One-wayness: Given an output, it is infeasible for anyone to find an + input document which is hashed to that specific output + - Collision resistance: it is infeasible for anyone to find two or more + input documents which are hashed to the same condensed output + - Type 1: + - An attacker has one input document in hand + - He attempts to find a different input document that is hashed to the + same output as the first input + - Type 2: + - An attacker is not given any specific input document + - His goal is to find a pair of different input documents that are hashed + to the same output + - A good 1-way hash should be immune to both types of attacks + - Attacks: + - Brute force attack: + - Taking advantage of the “birthday paradox” + - This is a long route, or the worst case scenario from an attacker’s + point of view, but the best case scenario from the designer’s point of + view + - Attacks using short cuts: + - Identifying inherent weaknesses of a 1-way hash + - The nightmare scenario from a designer’s point of view + - Birthday Attack: + - Based on the birthday paradox, in a set of randomly generated hash + outputs, there's a good chance that two of them will be the same. + - For a hash with a $t$-bit output, the expected # of messages to be hashed + before a coincidence/collision occurs is about $1.25 \cdot 2^{t/2}$ +- Public key encryption: is a cryptographic system that uses two keys: a public + key, which is shared openly, and a private key, which remains confidential to + the owner + - RSA Public key encryption + - Key Generation: + 1. $n = p \times q$, where $p, q$ are prime + 2. $\varphi(n)=(p-1)(q-1)$ + 3. $1<e<\varphi(n)$ and $\gcd(e,\varphi(n))=1$ + 4. $e \cdot d \equiv 1 \pmod{\varphi(n)}$ + 5. The _public key_ is $(e,n)$, and the _private key_ is $(d,n)$. + - Encryption: + - $c = m^e \;(mod\,n)$ + - Decryption: + - $m = c^d \;(mod\,n)$ +- Digital signature + - DSS: A US government standard for digital signatures. It includes + specifications for the DSA (Digital Signature Algorithm). 
+ - Public Key = $(y,p,q,g)$ + - $p$ = a prime of at least 2048 bits. + - $q$ = a 160-bit prime divisor of $p - 1$. + - $g = h^{(p-1)/q}\;mod\;p$, where _h_ is any integer with $1 < h < p - 1$ + s.t. $h^{(p-1)/q}\;mod\; p > 1$. That is _g_ has order $q\;mod\;p$. + - $y = g^x \;mod\;p$, where _x_ is an integer randomly selected from $[1, + q - 1]$ + - Private Key = $x$ + - Relationship: $y = g^x \;mod\;p$, $\;0 < x < q$ + - Signing a document m + - Pick at random 𝒌 from $[1, q - 1]$. + - $r$ = $(g^k \;mod\;p)\;mod\;q$ + - $s = (k^{-1} ∗ (HASH(m) + x ∗ r)) \;mod\;q$, where HASH is a 1-way hash + whose output is 160 bits + - Signature on $m$ = $(r,s)$ + - Verification: + - Fetch Alice’s public key $(y,p,q,g)$ from the public key file + - $w = s^{-1} \;mod\;q$ + - $a = HASH(m) * w \;mod \;q$ + - $b = r*w \;mod\;q$ + - $v = (g^a*y^b \;mod\;p)\;mod\;q$ + - OK only if $v = r$ + - **When 𝒑 is large (say ~700 digits), the best known algorithm for the + _Discrete-Logarithm Problem_ takes an _infeasible_ amount of time (the + problem is computationally infeasible).** + - ECDSA: A variant of the Digital Signature Algorithm (DSA) that uses elliptic + curve cryptography. 
+ - ![[CleanShot 2023-10-05 at 10.03.23 1.png]] + - ![[CleanShot 2023-10-05 at 10.03.34 1.png]] + - ![[CleanShot 2023-10-05 at 10.03.45 1.png]] + - Schnorr: + - ![[CleanShot 2023-10-05 at 09.39.10.png]] + - ![[CleanShot 2023-10-05 at 09.40.10.png]] + - EC-Schnorr + - ![[CleanShot 2023-10-05 at 10.13.48.png]] + - EC-Schnorr is better than Schnorr + - Admit + - Security proofs + - Easy to implement threshold signature + - Blind signature + - Free of signature malleability + - Cleaner and slightly faster: Shorter Key and Signature Lengths +- Malleability of ECDSA + - ![[CleanShot 2023-10-05 at 10.12.51.png]] +- Multi-signature + - Schnorr multi-signature + - ![[CleanShot 2023-10-05 at 09.47.37.png]] + - ![[CleanShot 2023-10-05 at 09.47.51.png]] + - ![[CleanShot 2023-10-05 at 09.49.08.png]] + - ![[CleanShot 2023-10-05 at 09.50.45.png]] + - ![[CleanShot 2023-10-05 at 09.51.01.png]] + - ![[CleanShot 2023-10-05 at 09.56.20.png]] + - ![[CleanShot 2023-10-05 at 09.56.52.png]] + - ![[CleanShot 2023-10-05 at 09.57.08.png]] + - EC Schnorr multi-signature + - ![[CleanShot 2023-10-05 at 10.19.29.png]] + - ![[CleanShot 2023-10-05 at 10.19.34.png]] + - ![[CleanShot 2023-10-05 at 10.19.54.png]] + - ![[CleanShot 2023-10-05 at 10.20.04.png]] + - ![[CleanShot 2023-10-05 at 10.20.12.png]] + - ![[CleanShot 2023-10-05 at 10.22.39.png]] + - ![[CleanShot 2023-10-05 at 10.22.48.png]] + - ![[CleanShot 2023-10-05 at 10.22.54.png]] + +## Chaumian cash + +- 3-party model + - Bank: Withdraw (creation of cash) to Customer + - Customer: Payment to Shop + - Shop: Deposit (end of life of cash) to Bank + - ![[CleanShot 2023-10-05 at 10.29.17.png]] + - ![[CleanShot 2023-10-05 at 10.29.24.png]] +- RSA blind signature + - ![[CleanShot 2023-10-05 at 10.34.02.png]] + - ![[CleanShot 2023-10-05 at 10.34.08.png]] + - ![[CleanShot 2023-10-05 at 10.34.16.png]] +- Public and private keys used in Chaumian + - ![[CleanShot 2023-10-05 at 10.38.05.png]] +- "Cut and choose" technique + - ![[CleanShot 2023-10-05 at 
10.55.34.png]] +- Detection of double spending + - Hiding ID in a coin: Force the customer to embed her ID/account # into + coins, using secret sharing technique + - Choose s and t such that $ID = s\bigoplus t$ + - ![[CleanShot 2023-10-05 at 11.14.47.png]] + - ![[CleanShot 2023-10-05 at 11.14.52.png]] +- Withdrawal + - ![[CleanShot 2023-10-05 at 11.20.12.png]] +- Payment + - ![[CleanShot 2023-10-05 at 11.22.12.png]] +- Deposit protocols + - ![[CleanShot 2023-10-05 at 11.24.06.png]] + +## Public Ledger + +- Blockchain +- Transactions + - Key components of a TX + - TX ID +- Examples of TXs +- Bitcoin network + - Centralized + - Decentralized + - Distributed +- How to explain Bitcoin to your grandma? + - Bitcoin is a decentralized digital currency, without a central bank or + single administrator, that can be sent from user to user on the peer-to-peer + bitcoin network without the need for intermediaries. Transactions are + verified by network nodes through cryptography and recorded in a public + distributed ledger called a blockchain + +## Consensus through proof of work + +- Merkle hash tree: is a data structure used in computer science and + cryptography. It is a tree in which every leaf node is labelled with the + cryptographic hash of a data block, and every non-leaf node is labelled with + the cryptographic hash of the labels of its child nodes. This hierarchical + structure provides an efficient way to verify data and is commonly used in + blockchain systems and peer-to-peer networks. 
+ - Pros: + - Easy to detect modification to TXs + - Easy to prove inclusion or membership of TX + - Enhanced privacy +- Block header (80 bytes) + - Version + - Hash of previous block + - Merkle root + - Time + - Target threshold + - Nonce +- Hash puzzle for mining/proof of work + - Target, difficulty, hashrate, etc. +- Coinbase: the TX through which a miner creates new bitcoin +- Thwarting double spending: + - TXs are public & broadcast globally + - Miners always **extend the longest chain** when forking occurs + - **Blocks** in a shorter branch will eventually become **orphaned** & be + abandoned + - Some TXs in the orphaned blocks **become unconfirmed**; will be picked up + by miners for confirmation in the longest chain +- Distributed mining +- Pooled mining + - Why: + - Due to the increased level of difficulty, mining individually using home + computers has become ineffective + - Pooling together computing power increases chances of success + - How + - Divide-and-Conquer + - Cut up the space of nonces into sub-spaces + - Search for the right nonce in a distributed manner + +## Bitcoin scripts + +- 2 parts of a script + - Public key script (scriptPubKey) in an unspent TX output (UTXO) received in + the past + - Akin to 1-time "Account #" + - Set conditions for ownership + - Locking UTXO to owner + - Signature script (scriptSig) in an input of a current TX when spending + - Proof of ownership + - Ensuring integrity of TX + - Unlocking UTXO by owner +- 3 types of addresses + - Pay to Public Key Hash (P2PKH): _most popular_ + - Alice the recipient prepares a public-private key pair. She gives the hash + value of her public key to a sender as an address to receive funds. + - Pay to Public Key (P2PK): _natural but an address could be longer than + P2PKH_ + - Alice the recipient prepares a public-private key pair. 
She gives her + public key to a sender as an address to receive funds + - Pay to Script Hash (P2SH) Towards programmable cash: _most powerful, make + "programmable cash" possible_ + - Alice the recipient prepares the script. She gives its hash value to a + sender as an address to receive funds. +- How a script is executed (stack, left to right, push & pop) + - Script is stack based, forth-like, and written in "reverse Polish notation" + - Scan from left to right + - If it's data, push it onto the stack + - If it's an operator, pop one or more items off the stack, operate on them + and push the result back into the stack + - ![[CleanShot 2023-10-05 at 19.17.13@2x.png]] +- Scripts for threshold multi signature + - ?? +- Common Bitcoin Opcodes/Commands: + - OP_DUP, OP_HASH160, OP_EQUALVERIFY, OP_CHECKSIG, + - ![[CleanShot 2023-10-05 at 19.30.19@2x.png]] + - OP_2 - OP_16: + - ![[CleanShot 2023-10-05 at 19.28.57@2x.png]] + - OP_CHECKMULTISIG, OP_EQUAL diff --git a/SI/Resource/Blockchain & Cryptography/Screenshots/CleanShot 2023-10-05 at 09.39.10.png b/SI/Resource/Blockchain & Cryptography/Screenshots/CleanShot 2023-10-05 at 09.39.10.png Binary files differnew file mode 100644 index 0000000..7b8ad90 --- /dev/null +++ b/SI/Resource/Blockchain & Cryptography/Screenshots/CleanShot 2023-10-05 at 09.39.10.png diff --git a/SI/Resource/Blockchain & Cryptography/Screenshots/CleanShot 2023-10-05 at 09.40.10.png b/SI/Resource/Blockchain & Cryptography/Screenshots/CleanShot 2023-10-05 at 09.40.10.png Binary files differnew file mode 100644 index 0000000..6c21655 --- /dev/null +++ b/SI/Resource/Blockchain & Cryptography/Screenshots/CleanShot 2023-10-05 at 09.40.10.png diff --git a/SI/Resource/Blockchain & Cryptography/Screenshots/CleanShot 2023-10-05 at 09.47.37.png b/SI/Resource/Blockchain & Cryptography/Screenshots/CleanShot 2023-10-05 at 09.47.37.png Binary files differnew file mode 100644 index 0000000..6341f11 --- /dev/null +++ b/SI/Resource/Blockchain & 
Cryptography/Screenshots/CleanShot 2023-10-05 at 09.47.37.png diff --git a/SI/Resource/Blockchain & Cryptography/Screenshots/CleanShot 2023-10-05 at 09.47.51.png b/SI/Resource/Blockchain & Cryptography/Screenshots/CleanShot 2023-10-05 at 09.47.51.png Binary files differnew file mode 100644 index 0000000..8e313b5 --- /dev/null +++ b/SI/Resource/Blockchain & Cryptography/Screenshots/CleanShot 2023-10-05 at 09.47.51.png diff --git a/SI/Resource/Blockchain & Cryptography/Screenshots/CleanShot 2023-10-05 at 09.49.08.png b/SI/Resource/Blockchain & Cryptography/Screenshots/CleanShot 2023-10-05 at 09.49.08.png Binary files differnew file mode 100644 index 0000000..bd13ce7 --- /dev/null +++ b/SI/Resource/Blockchain & Cryptography/Screenshots/CleanShot 2023-10-05 at 09.49.08.png diff --git a/SI/Resource/Blockchain & Cryptography/Screenshots/CleanShot 2023-10-05 at 09.50.45.png b/SI/Resource/Blockchain & Cryptography/Screenshots/CleanShot 2023-10-05 at 09.50.45.png Binary files differnew file mode 100644 index 0000000..58276b8 --- /dev/null +++ b/SI/Resource/Blockchain & Cryptography/Screenshots/CleanShot 2023-10-05 at 09.50.45.png diff --git a/SI/Resource/Blockchain & Cryptography/Screenshots/CleanShot 2023-10-05 at 09.51.01.png b/SI/Resource/Blockchain & Cryptography/Screenshots/CleanShot 2023-10-05 at 09.51.01.png Binary files differnew file mode 100644 index 0000000..21b60b5 --- /dev/null +++ b/SI/Resource/Blockchain & Cryptography/Screenshots/CleanShot 2023-10-05 at 09.51.01.png diff --git a/SI/Resource/Blockchain & Cryptography/Screenshots/CleanShot 2023-10-05 at 09.56.20.png b/SI/Resource/Blockchain & Cryptography/Screenshots/CleanShot 2023-10-05 at 09.56.20.png Binary files differnew file mode 100644 index 0000000..9a46c81 --- /dev/null +++ b/SI/Resource/Blockchain & Cryptography/Screenshots/CleanShot 2023-10-05 at 09.56.20.png diff --git a/SI/Resource/Blockchain & Cryptography/Screenshots/CleanShot 2023-10-05 at 09.56.52.png b/SI/Resource/Blockchain & 
Cryptography/Screenshots/CleanShot 2023-10-05 at 09.56.52.png Binary files differnew file mode 100644 index 0000000..fbfab60 --- /dev/null +++ b/SI/Resource/Blockchain & Cryptography/Screenshots/CleanShot 2023-10-05 at 09.56.52.png diff --git a/SI/Resource/Blockchain & Cryptography/Screenshots/CleanShot 2023-10-05 at 09.57.08.png b/SI/Resource/Blockchain & Cryptography/Screenshots/CleanShot 2023-10-05 at 09.57.08.png Binary files differnew file mode 100644 index 0000000..433b392 --- /dev/null +++ b/SI/Resource/Blockchain & Cryptography/Screenshots/CleanShot 2023-10-05 at 09.57.08.png diff --git a/SI/Resource/Blockchain & Cryptography/Screenshots/CleanShot 2023-10-05 at 10.03.23 1.png b/SI/Resource/Blockchain & Cryptography/Screenshots/CleanShot 2023-10-05 at 10.03.23 1.png Binary files differnew file mode 100644 index 0000000..25a5b58 --- /dev/null +++ b/SI/Resource/Blockchain & Cryptography/Screenshots/CleanShot 2023-10-05 at 10.03.23 1.png diff --git a/SI/Resource/Blockchain & Cryptography/Screenshots/CleanShot 2023-10-05 at 10.03.23.png b/SI/Resource/Blockchain & Cryptography/Screenshots/CleanShot 2023-10-05 at 10.03.23.png Binary files differnew file mode 100644 index 0000000..25a5b58 --- /dev/null +++ b/SI/Resource/Blockchain & Cryptography/Screenshots/CleanShot 2023-10-05 at 10.03.23.png diff --git a/SI/Resource/Blockchain & Cryptography/Screenshots/CleanShot 2023-10-05 at 10.03.34 1.png b/SI/Resource/Blockchain & Cryptography/Screenshots/CleanShot 2023-10-05 at 10.03.34 1.png Binary files differnew file mode 100644 index 0000000..c62d235 --- /dev/null +++ b/SI/Resource/Blockchain & Cryptography/Screenshots/CleanShot 2023-10-05 at 10.03.34 1.png diff --git a/SI/Resource/Blockchain & Cryptography/Screenshots/CleanShot 2023-10-05 at 10.03.34.png b/SI/Resource/Blockchain & Cryptography/Screenshots/CleanShot 2023-10-05 at 10.03.34.png Binary files differnew file mode 100644 index 0000000..c62d235 --- /dev/null +++ b/SI/Resource/Blockchain & 
Cryptography/Screenshots/CleanShot 2023-10-05 at 10.03.34.png diff --git a/SI/Resource/Blockchain & Cryptography/Screenshots/CleanShot 2023-10-05 at 10.03.45 1.png b/SI/Resource/Blockchain & Cryptography/Screenshots/CleanShot 2023-10-05 at 10.03.45 1.png Binary files differnew file mode 100644 index 0000000..8a53593 --- /dev/null +++ b/SI/Resource/Blockchain & Cryptography/Screenshots/CleanShot 2023-10-05 at 10.03.45 1.png diff --git a/SI/Resource/Blockchain & Cryptography/Screenshots/CleanShot 2023-10-05 at 10.03.45.png b/SI/Resource/Blockchain & Cryptography/Screenshots/CleanShot 2023-10-05 at 10.03.45.png Binary files differnew file mode 100644 index 0000000..8a53593 --- /dev/null +++ b/SI/Resource/Blockchain & Cryptography/Screenshots/CleanShot 2023-10-05 at 10.03.45.png diff --git a/SI/Resource/Blockchain & Cryptography/Screenshots/CleanShot 2023-10-05 at 10.12.51.png b/SI/Resource/Blockchain & Cryptography/Screenshots/CleanShot 2023-10-05 at 10.12.51.png Binary files differnew file mode 100644 index 0000000..bc5ed34 --- /dev/null +++ b/SI/Resource/Blockchain & Cryptography/Screenshots/CleanShot 2023-10-05 at 10.12.51.png diff --git a/SI/Resource/Blockchain & Cryptography/Screenshots/CleanShot 2023-10-05 at 10.13.48.png b/SI/Resource/Blockchain & Cryptography/Screenshots/CleanShot 2023-10-05 at 10.13.48.png Binary files differnew file mode 100644 index 0000000..c48b228 --- /dev/null +++ b/SI/Resource/Blockchain & Cryptography/Screenshots/CleanShot 2023-10-05 at 10.13.48.png diff --git a/SI/Resource/Blockchain & Cryptography/Screenshots/CleanShot 2023-10-05 at 10.19.29.png b/SI/Resource/Blockchain & Cryptography/Screenshots/CleanShot 2023-10-05 at 10.19.29.png Binary files differnew file mode 100644 index 0000000..3b4db81 --- /dev/null +++ b/SI/Resource/Blockchain & Cryptography/Screenshots/CleanShot 2023-10-05 at 10.19.29.png diff --git a/SI/Resource/Blockchain & Cryptography/Screenshots/CleanShot 2023-10-05 at 10.19.34.png b/SI/Resource/Blockchain & 
Cryptography/Screenshots/CleanShot 2023-10-05 at 10.19.34.png Binary files differnew file mode 100644 index 0000000..a6ec2d8 --- /dev/null +++ b/SI/Resource/Blockchain & Cryptography/Screenshots/CleanShot 2023-10-05 at 10.19.34.png diff --git a/SI/Resource/Blockchain & Cryptography/Screenshots/CleanShot 2023-10-05 at 10.19.54.png b/SI/Resource/Blockchain & Cryptography/Screenshots/CleanShot 2023-10-05 at 10.19.54.png Binary files differnew file mode 100644 index 0000000..d73ff00 --- /dev/null +++ b/SI/Resource/Blockchain & Cryptography/Screenshots/CleanShot 2023-10-05 at 10.19.54.png diff --git a/SI/Resource/Blockchain & Cryptography/Screenshots/CleanShot 2023-10-05 at 10.20.04.png b/SI/Resource/Blockchain & Cryptography/Screenshots/CleanShot 2023-10-05 at 10.20.04.png Binary files differnew file mode 100644 index 0000000..2479351 --- /dev/null +++ b/SI/Resource/Blockchain & Cryptography/Screenshots/CleanShot 2023-10-05 at 10.20.04.png diff --git a/SI/Resource/Blockchain & Cryptography/Screenshots/CleanShot 2023-10-05 at 10.20.12.png b/SI/Resource/Blockchain & Cryptography/Screenshots/CleanShot 2023-10-05 at 10.20.12.png Binary files differnew file mode 100644 index 0000000..b9ace18 --- /dev/null +++ b/SI/Resource/Blockchain & Cryptography/Screenshots/CleanShot 2023-10-05 at 10.20.12.png diff --git a/SI/Resource/Blockchain & Cryptography/Screenshots/CleanShot 2023-10-05 at 10.22.39.png b/SI/Resource/Blockchain & Cryptography/Screenshots/CleanShot 2023-10-05 at 10.22.39.png Binary files differnew file mode 100644 index 0000000..3fdbe8b --- /dev/null +++ b/SI/Resource/Blockchain & Cryptography/Screenshots/CleanShot 2023-10-05 at 10.22.39.png diff --git a/SI/Resource/Blockchain & Cryptography/Screenshots/CleanShot 2023-10-05 at 10.22.48.png b/SI/Resource/Blockchain & Cryptography/Screenshots/CleanShot 2023-10-05 at 10.22.48.png Binary files differnew file mode 100644 index 0000000..9405e91 --- /dev/null +++ b/SI/Resource/Blockchain & Cryptography/Screenshots/CleanShot 
2023-10-05 at 10.22.48.png diff --git a/SI/Resource/Blockchain & Cryptography/Screenshots/CleanShot 2023-10-05 at 10.22.54.png b/SI/Resource/Blockchain & Cryptography/Screenshots/CleanShot 2023-10-05 at 10.22.54.png Binary files differnew file mode 100644 index 0000000..aa7a339 --- /dev/null +++ b/SI/Resource/Blockchain & Cryptography/Screenshots/CleanShot 2023-10-05 at 10.22.54.png diff --git a/SI/Resource/Blockchain & Cryptography/Screenshots/CleanShot 2023-10-05 at 10.29.17.png b/SI/Resource/Blockchain & Cryptography/Screenshots/CleanShot 2023-10-05 at 10.29.17.png Binary files differnew file mode 100644 index 0000000..32c41b2 --- /dev/null +++ b/SI/Resource/Blockchain & Cryptography/Screenshots/CleanShot 2023-10-05 at 10.29.17.png diff --git a/SI/Resource/Blockchain & Cryptography/Screenshots/CleanShot 2023-10-05 at 10.29.24.png b/SI/Resource/Blockchain & Cryptography/Screenshots/CleanShot 2023-10-05 at 10.29.24.png Binary files differnew file mode 100644 index 0000000..64a7037 --- /dev/null +++ b/SI/Resource/Blockchain & Cryptography/Screenshots/CleanShot 2023-10-05 at 10.29.24.png diff --git a/SI/Resource/Blockchain & Cryptography/Screenshots/CleanShot 2023-10-05 at 10.34.02.png b/SI/Resource/Blockchain & Cryptography/Screenshots/CleanShot 2023-10-05 at 10.34.02.png Binary files differnew file mode 100644 index 0000000..4f947fb --- /dev/null +++ b/SI/Resource/Blockchain & Cryptography/Screenshots/CleanShot 2023-10-05 at 10.34.02.png diff --git a/SI/Resource/Blockchain & Cryptography/Screenshots/CleanShot 2023-10-05 at 10.34.08.png b/SI/Resource/Blockchain & Cryptography/Screenshots/CleanShot 2023-10-05 at 10.34.08.png Binary files differnew file mode 100644 index 0000000..f3d2f14 --- /dev/null +++ b/SI/Resource/Blockchain & Cryptography/Screenshots/CleanShot 2023-10-05 at 10.34.08.png diff --git a/SI/Resource/Blockchain & Cryptography/Screenshots/CleanShot 2023-10-05 at 10.34.16.png b/SI/Resource/Blockchain & Cryptography/Screenshots/CleanShot 2023-10-05 at 
10.34.16.png Binary files differnew file mode 100644 index 0000000..ba17e51 --- /dev/null +++ b/SI/Resource/Blockchain & Cryptography/Screenshots/CleanShot 2023-10-05 at 10.34.16.png diff --git a/SI/Resource/Blockchain & Cryptography/Screenshots/CleanShot 2023-10-05 at 10.38.05.png b/SI/Resource/Blockchain & Cryptography/Screenshots/CleanShot 2023-10-05 at 10.38.05.png Binary files differnew file mode 100644 index 0000000..168703f --- /dev/null +++ b/SI/Resource/Blockchain & Cryptography/Screenshots/CleanShot 2023-10-05 at 10.38.05.png diff --git a/SI/Resource/Blockchain & Cryptography/Screenshots/CleanShot 2023-10-05 at 10.55.34.png b/SI/Resource/Blockchain & Cryptography/Screenshots/CleanShot 2023-10-05 at 10.55.34.png Binary files differnew file mode 100644 index 0000000..4d44a17 --- /dev/null +++ b/SI/Resource/Blockchain & Cryptography/Screenshots/CleanShot 2023-10-05 at 10.55.34.png diff --git a/SI/Resource/Blockchain & Cryptography/Screenshots/CleanShot 2023-10-05 at 11.14.47.png b/SI/Resource/Blockchain & Cryptography/Screenshots/CleanShot 2023-10-05 at 11.14.47.png Binary files differnew file mode 100644 index 0000000..a25e008 --- /dev/null +++ b/SI/Resource/Blockchain & Cryptography/Screenshots/CleanShot 2023-10-05 at 11.14.47.png diff --git a/SI/Resource/Blockchain & Cryptography/Screenshots/CleanShot 2023-10-05 at 11.14.52.png b/SI/Resource/Blockchain & Cryptography/Screenshots/CleanShot 2023-10-05 at 11.14.52.png Binary files differnew file mode 100644 index 0000000..a3b2fa2 --- /dev/null +++ b/SI/Resource/Blockchain & Cryptography/Screenshots/CleanShot 2023-10-05 at 11.14.52.png diff --git a/SI/Resource/Blockchain & Cryptography/Screenshots/CleanShot 2023-10-05 at 11.20.12.png b/SI/Resource/Blockchain & Cryptography/Screenshots/CleanShot 2023-10-05 at 11.20.12.png Binary files differnew file mode 100644 index 0000000..f7f65dd --- /dev/null +++ b/SI/Resource/Blockchain & Cryptography/Screenshots/CleanShot 2023-10-05 at 11.20.12.png diff --git 
a/SI/Resource/Blockchain & Cryptography/Screenshots/CleanShot 2023-10-05 at 11.22.12.png b/SI/Resource/Blockchain & Cryptography/Screenshots/CleanShot 2023-10-05 at 11.22.12.png Binary files differnew file mode 100644 index 0000000..c9526b3 --- /dev/null +++ b/SI/Resource/Blockchain & Cryptography/Screenshots/CleanShot 2023-10-05 at 11.22.12.png diff --git a/SI/Resource/Blockchain & Cryptography/Screenshots/CleanShot 2023-10-05 at 11.24.06.png b/SI/Resource/Blockchain & Cryptography/Screenshots/CleanShot 2023-10-05 at 11.24.06.png Binary files differnew file mode 100644 index 0000000..b7eabdf --- /dev/null +++ b/SI/Resource/Blockchain & Cryptography/Screenshots/CleanShot 2023-10-05 at 11.24.06.png diff --git a/SI/Resource/Blockchain & Cryptography/Screenshots/CleanShot 2023-10-05 at 19.17.13@2x.png b/SI/Resource/Blockchain & Cryptography/Screenshots/CleanShot 2023-10-05 at 19.17.13@2x.png Binary files differnew file mode 100644 index 0000000..1ed832c --- /dev/null +++ b/SI/Resource/Blockchain & Cryptography/Screenshots/CleanShot 2023-10-05 at 19.17.13@2x.png diff --git a/SI/Resource/Blockchain & Cryptography/Screenshots/CleanShot 2023-10-05 at 19.26.10@2x.png b/SI/Resource/Blockchain & Cryptography/Screenshots/CleanShot 2023-10-05 at 19.26.10@2x.png Binary files differnew file mode 100644 index 0000000..0bc2580 --- /dev/null +++ b/SI/Resource/Blockchain & Cryptography/Screenshots/CleanShot 2023-10-05 at 19.26.10@2x.png diff --git a/SI/Resource/Blockchain & Cryptography/Screenshots/CleanShot 2023-10-05 at 19.28.57@2x.png b/SI/Resource/Blockchain & Cryptography/Screenshots/CleanShot 2023-10-05 at 19.28.57@2x.png Binary files differnew file mode 100644 index 0000000..9f08859 --- /dev/null +++ b/SI/Resource/Blockchain & Cryptography/Screenshots/CleanShot 2023-10-05 at 19.28.57@2x.png diff --git a/SI/Resource/Blockchain & Cryptography/Screenshots/CleanShot 2023-10-05 at 19.30.19@2x.png b/SI/Resource/Blockchain & Cryptography/Screenshots/CleanShot 2023-10-05 at 
19.30.19@2x.png Binary files differnew file mode 100644 index 0000000..1072f2f --- /dev/null +++ b/SI/Resource/Blockchain & Cryptography/Screenshots/CleanShot 2023-10-05 at 19.30.19@2x.png diff --git a/SI/Resource/Data Science/Machine Learning/Contents/Bias and Variance.md b/SI/Resource/Data Science/Machine Learning/Contents/Bias and Variance.md new file mode 100644 index 0000000..938bf62 --- /dev/null +++ b/SI/Resource/Data Science/Machine Learning/Contents/Bias and Variance.md @@ -0,0 +1,45 @@ +--- +id: 2023-12-18 +aliases: December 18, 2023 +tags: +- link-note +- Data-Science +- Machine-Learning +- Bias-and-Variance +--- + +# Bias + +## Training Data (80~90%) vs. Test Data (10~20%) + +## Complexity + +- Complexity increases from linear to non-linear models +- Under-fitting: the model is too simple for the data; more likely when there is a lot of data +- Over-fitting: the model is too complex for the data; more likely when there is little data + +## Bias and Variance + +- Bias and variance are both types of error in an algorithm. +- $\begin{align} MSE (\hat{\theta}) \equiv E_{\theta} ((\hat{\theta} - \theta)^2) & = E((\hat{\theta} - E(\hat{\theta}) + E(\hat{\theta}) - \theta)^2) \\ & = E((\hat{\theta} - E(\hat{\theta}))^2 + 2((\hat{\theta} - E(\hat{\theta}))(E(\hat{\theta}) - \theta)) + (E(\hat{\theta}) - \theta)^2) \\ & = E((\hat{\theta} - E(\hat{\theta}))^2) + 2(E(\hat{\theta}) - \theta)E(\hat{\theta} - E(\hat{\theta})) + (E(\hat{\theta}) - \theta)^2 \\ & = E((\hat{\theta} - E(\hat{\theta}))^2) + (E(\hat{\theta}) - \theta)^2 \\ & = Var_{\theta}(\hat{\theta}) + Bias_{\theta}(\hat{\theta}, \theta)^2 \end{align}$ +- $\bbox[teal,5px,border:2px solid red] { MSE (\hat{\theta}) = E((\hat{\theta} - E(\hat{\theta}))^2) + (E(\hat{\theta}) - \theta)^2 = Var_{\theta}(\hat{\theta}) + Bias_{\theta}(\hat{\theta}, \theta)^2 }$ +- Bias: under-fitting +- Variance: over-fitting +![[Pasted image 20231218005054.png]] +![[Pasted image 20231218005035.png]] + +## Trade-off + +- Solution + - Use validation data set + - $\bbox[teal,5px,border:2px solid red]{\text{Train 
data (80\%) + Valid data (10\%) + Test data (10\%)}}$ + - The validation set does not directly participate in model training + - The model is evaluated on it continuously during training, and the best-performing model so far is stored + - K-fold cross validation + - **Leave-One-Out Cross-Validation (LOOCV)** + - A special case of k-fold cross-validation where **K** is equal to the number of data points in the dataset. + - What if **K** becomes bigger? + 1. Training data per fold $\uparrow$ + 2. Bias error $\downarrow$ and variance error $\uparrow$ + 3. Computational cost $\uparrow$ + - [[Regularization]] loss function
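
The k-fold idea in the Trade-off notes can be sketched in a few lines of Python. This is only an illustration under assumptions: the helper name `k_fold_indices` and the 10-sample dataset are hypothetical and not part of the notes, and no shuffling is done. Setting `k` equal to the number of samples gives LOOCV.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, valid_idx) index lists for k-fold cross-validation.

    Hypothetical helper for illustration; folds are contiguous, not shuffled.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    base, extra = divmod(n_samples, k)  # the first `extra` folds get one more sample
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        valid = indices[start:start + size]               # held-out fold
        train = indices[:start] + indices[start + size:]  # everything else
        yield train, valid
        start += size


# 5-fold split of 10 samples: each fold trains on 8 and validates on 2.
folds = list(k_fold_indices(10, 5))

# LOOCV: k equals the number of samples, so each fold validates on 1 sample.
loocv = list(k_fold_indices(10, 10))
```

As `k` grows, each training split gets closer to the full dataset, while the number of models to fit (the cost) grows with it.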
\ No newline at end of file diff --git a/SI/Resource/Data Science/Machine Learning/Contents/Classification.md b/SI/Resource/Data Science/Machine Learning/Contents/Classification.md new file mode 100644 index 0000000..12c125e --- /dev/null +++ b/SI/Resource/Data Science/Machine Learning/Contents/Classification.md @@ -0,0 +1,17 @@ +--- +id: 2023-12-17 +aliases: December 17, 2023 +tags: +- link-note +- Data-Science +- Machine-Learning +- Classification +--- + +# Classification + +Classification in the context of machine learning and statistics is a type of supervised learning approach where the output variable is a category, such as "spam" or "not spam", or "disease" or "no disease". In classification, an algorithm is trained on a dataset of labeled examples, learning to associate input data points with the corresponding category label. Once trained, the model can then categorize new, unseen data points. + +1. Input: Continuous (float), Discrete (categorical), etc. +2. Output: Discrete (categorical) +3. Model types: Binary - [[Sigmoid]], multiclass (multinomial) - [[softmax]]
\ No newline at end of file diff --git a/SI/Resource/Data Science/Machine Learning/Contents/Gradient descent.md b/SI/Resource/Data Science/Machine Learning/Contents/Gradient descent.md new file mode 100644 index 0000000..fdf8905 --- /dev/null +++ b/SI/Resource/Data Science/Machine Learning/Contents/Gradient descent.md @@ -0,0 +1,21 @@ +--- +id: 2023-12-17 +aliases: December 17, 2023 +tags: +- link-note +- Data-Science +- Machine-Learning +- Gradient-descent +--- + +# Gradient Descent + +- Update the parameters so that the value of the loss function is minimized +- **The instantaneous rate of change (derivative) at a minimum is always _0_** +- Therefore, keep updating the parameters until the derivative of the loss function is equal to 0 + +## Pseudo Code + +1. Find the derivative of the loss function at the current parameters. +2. Update the parameters in the opposite direction of the derivative. +3. Repeat steps 1 and 2 for a set number of epochs (a hyperparameter) or until the derivative becomes (approximately) 0.
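The three steps above can be sketched for a single parameter as follows. This is only an illustration under stated assumptions: the function name `gradient_descent`, the learning rate `lr`, the tolerance `tol` and the example loss $L(w) = (w - 3)^2$ are all made up for this sketch and do not come from the notes.

```python
def gradient_descent(grad, w0, lr=0.1, epochs=100, tol=1e-8):
    """Minimize a one-parameter loss given its derivative `grad`."""
    w = w0
    for _ in range(epochs):
        g = grad(w)          # step 1: derivative at the current parameter
        if abs(g) < tol:     # stop once the derivative is (approximately) 0
            break
        w = w - lr * g       # step 2: move against the derivative
    return w                 # step 3 is the loop itself


# Example loss L(w) = (w - 3)^2 with derivative dL/dw = 2 * (w - 3).
w_star = gradient_descent(lambda w: 2 * (w - 3), w0=0.0)
```

Both `lr` and `epochs` are hyperparameters here: a learning rate that is too large can make the updates diverge, and one that is too small may not reach the minimum within the given epochs.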
\ No newline at end of file diff --git a/SI/Resource/Data Science/Machine Learning/Contents/Hyperparameter.md b/SI/Resource/Data Science/Machine Learning/Contents/Hyperparameter.md new file mode 100644 index 0000000..895783f --- /dev/null +++ b/SI/Resource/Data Science/Machine Learning/Contents/Hyperparameter.md @@ -0,0 +1,15 @@ +--- +id: 2023-12-18 +aliases: December 18, 2023 +tags: +- link-note +- Data-Science +- Machine-Learning +- Hyperparameter +--- + +# Hyperparameter + +- A hyperparameter is a parameter whose value is set before the learning process begins +- Unlike model parameters, which are learned during training, hyperparameters are not learned from the data +- A human has to set the hyperparameters
\ No newline at end of file diff --git a/SI/Resource/Data Science/Machine Learning/Contents/Linear Regression.md b/SI/Resource/Data Science/Machine Learning/Contents/Linear Regression.md new file mode 100644 index 0000000..fdd175c --- /dev/null +++ b/SI/Resource/Data Science/Machine Learning/Contents/Linear Regression.md @@ -0,0 +1,26 @@ +--- +id: 2023-12-17 +aliases: December 17, 2023 +tags: +- link-note +- Data-Science +- Machine-Learning +- Linear-Regression +--- + +# Linear Regression + +## Simple Linear Regression + +- The model uses one feature of the data +- $y = w_0 + w_1*x$ + +## Multiple Linear Regression + +- The model uses several features of the data +- $y = w_0 + w_1*x_1 + \dots + w_D*x_D$ + +## Polynomial Regression + +- The model uses higher-degree terms of the features +- $y = w_0 + w_1*x + w_2*x^2 + \dots + w_m*x^m$
\ No newline at end of file diff --git a/SI/Resource/Data Science/Machine Learning/Contents/Logistic Regression.md b/SI/Resource/Data Science/Machine Learning/Contents/Logistic Regression.md new file mode 100644 index 0000000..24b714e --- /dev/null +++ b/SI/Resource/Data Science/Machine Learning/Contents/Logistic Regression.md @@ -0,0 +1,47 @@ +--- +id: 2023-12-18 +aliases: December 18, 2023 +tags: +- link-note +- Data-Science +- Machine-Learning +- Logistic-Regression +--- + +# Logistic Regression + +$$Y = \begin{cases} 1 & \text{if korean} \\ 2 & \text{if american} \\ 3 & \text{if japanese} \end{cases} \qquad\qquad\qquad Y = \begin{cases} 1 & \text{if american} \\ 2 & \text{if korean} \\ 3 & \text{if japanese} \end{cases}$$ + +- In general regression, the results vary depending on the order (size) of the labels +- A different loss function or model is needed +- The logistic regression model is a regression model in the form of a logistic function. +- The predicted value changes depending on the value of wX. + 1. If $w^{T}X > 0$: classified as 1. + 2. If $w^{T} X< 0$: classified as 0. +- How should the loss function be defined to find the optimal value of the parameter *w*? 
+ +## Odds + +- The odds represent how many times higher the probability of success (y=1) is compared to the probability of failure (y=0) +- $odds = \dfrac{p(y=1|x)}{1-p(y=1|x)}$ + +## Logit + +- The function obtained by taking the logarithm of the odds +- When the input probability (p) ranges over (0,1), the output ranges over ($-\infty$, $+\infty$) +- $logit(p) = log(odds) = log\dfrac{p(y=1|x)}{1-p(y=1|x)}$ + +## Logistic Function + +- The inverse function of the logit transformation +- $logit(p) = log(odds) = log\dfrac{p(y=1|x)}{1-p(y=1|x)} = w_{0}+ w_1x_{1}+ \dots + w_Dx_{D}= w^TX$ +- $p(y = 1|x) = \dfrac{e^{w^{T}X}}{1 + e^{w^{T}X}} = \dfrac{1}{1 + e^{-w^TX}}$ +- Therefore, logistic regression is a combination of linear regression and the sigmoid function + +## Bayes' Theorem + +- $P(w|X) = \dfrac{P(X|w)P(w)}{P(X)} \propto P(X|w)P(w)$ +- **[[Posterior]]** probability, $P(w|X)$: The probability distribution of a hypothesis given the data (reliability). +- **Likelihood**, $P(X|w)$: The probability of the observed data under an assumed hypothesis. +- **Prior** probability, $P(w)$: The probability of a hypothesis known in general before looking at the data. +- There are two methods to estimate the hypothesis (model parameters) using these probabilities: [[Maximum Likelihood Estimation]] (**MLE**) and [[Maximum A Posteriori]] (**MAP**)
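A small sketch of the odds, logit and logistic definitions above, mainly to check numerically that the logistic function undoes the logit. The function names are chosen for this illustration only.

```python
import math


def odds(p):
    """Odds of success for a probability p in (0, 1)."""
    return p / (1 - p)


def logit(p):
    """Log-odds: maps (0, 1) onto the whole real line."""
    return math.log(odds(p))


def logistic(z):
    """Inverse of logit: maps a real score z (e.g. w^T X) back to a probability."""
    return 1 / (1 + math.exp(-z))


p = 0.8
z = logit(p)            # log(0.8 / 0.2) = log(4)
p_back = logistic(z)    # recovers p up to floating-point error
```

With this in hand, the decision rule from the notes follows directly: a score $w^TX > 0$ gives a probability above 0.5, and $w^TX < 0$ gives one below 0.5.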
\ No newline at end of file diff --git a/SI/Resource/Data Science/Machine Learning/Contents/Optimization.md b/SI/Resource/Data Science/Machine Learning/Contents/Optimization.md new file mode 100644 index 0000000..2283442 --- /dev/null +++ b/SI/Resource/Data Science/Machine Learning/Contents/Optimization.md @@ -0,0 +1,53 @@ +--- +id: 2023-12-17 +aliases: December 17, 2023 +tags: +- link-note +- Data-Science +- Machine-Learning +- Optimization +--- + +# Optimization + +## Math + +### Partial Differentiation/Derivative + +- Differentiate with respect to one specific variable +- Treat the other variables as constants +- $\dfrac{\partial y}{\partial x}$ +- e.g., $f(x,y) = x^2 + xy + 3 \Rightarrow \dfrac{\partial f}{\partial x} = 2x + y$ + +### Chain Rule + +- $\dfrac{dy}{dx} = \dfrac{dy}{du}*\dfrac{du}{dx}$ +- e.g., $y = ln(u), u = 2x + 4 \Rightarrow \dfrac{dy}{dx} = \dfrac{1}{u}*2 = \dfrac{1}{x+2}$ + +## Loss Function + +### Mean Squared Error (MSE) + +- $L = \frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y_i})^2$ + +## Parameter Calculation + +### Least Square Method (LSM) + +- Minimize the squared error over the data +- a: slope (coefficient) +- b: intercept +- $L = \sum_{i=1}^{N} (y_i - (ax_i + b))^2$ + +#### Method 1. + +- $0 = \dfrac{\partial L}{\partial a} = \sum_{i=1}^{N} 2(y_i - (ax_i + b))(-x_i) = 2(a\sum_{i=1}^{N} x_i^2 + b\sum_{i=1}^{N} x_i - \sum_{i=1}^{N} x_iy_i)$ +- $0 = \dfrac{\partial L}{\partial b} = \sum_{i=1}^{N} 2(y_i - (ax_i + b))(-1) = 2(a\sum_{i=1}^{N} x_i + b\sum_{i=1}^{N}1 - \sum_{i=1}^{N} y_i)$ +- $a^* = \dfrac{\sum_{i=1}^{N}(x_i-\bar{x})(y_i-\bar{y})}{\sum_{i=1}^{N}(x_i-\bar{x})^2}$ +- $b^* = \bar{y} - a^*\bar{x}$ + +#### Method 2. + +- Differentiate $||Y - XW||^2$ with respect to the matrix $W$ +- $-2X^T(Y-XW) = 0$ +- $W = (X^TX)^{-1}X^TY$
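The closed-form result of Method 1 can be checked with a few lines of plain Python. This is a sketch, not part of the original notes; the function name `fit_line` and the sample data are assumptions made for the illustration.

```python
def fit_line(xs, ys):
    """Least-squares slope a* and intercept b* for y = a*x + b."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # a* = sum((x_i - x_bar)(y_i - y_bar)) / sum((x_i - x_bar)^2)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    a = s_xy / s_xx
    # b* = y_bar - a* * x_bar
    b = y_bar - a * x_bar
    return a, b


# Data generated from y = 2x + 1, so the fit should recover a = 2, b = 1.
a, b = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
```

Method 2 gives the same parameters when $X$ contains a column of ones for the intercept; the scalar version above is simply easier to verify by hand.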
\ No newline at end of file diff --git a/SI/Resource/Data Science/Machine Learning/Contents/Regression.md b/SI/Resource/Data Science/Machine Learning/Contents/Regression.md new file mode 100644 index 0000000..8e2a1df --- /dev/null +++ b/SI/Resource/Data Science/Machine Learning/Contents/Regression.md @@ -0,0 +1,17 @@ +--- +id: 2023-12-17 +aliases: December 17, 2023 +tags: +- link-note +- Data-Science +- Machine-Learning +- Regression +--- + +# Regression + +Regression is a statistical method used in data analysis that models the relationship between a dependent variable and one or more independent variables. The main goal of regression is to predict the value of the dependent variable based on the values of the independent variables. It's widely used in various fields like economics, finance, biology, engineering, and more, for forecasting, estimating, and identifying relationships among variables. + +1. Input: Continuous (float), Discrete (categorical), etc. +2. Output: Continuous (float) +3. Model types: a function (e.g., $y = w_1x + w_0$)
\ No newline at end of file diff --git a/SI/Resource/Data Science/Machine Learning/Contents/Regularization.md b/SI/Resource/Data Science/Machine Learning/Contents/Regularization.md new file mode 100644 index 0000000..7475105 --- /dev/null +++ b/SI/Resource/Data Science/Machine Learning/Contents/Regularization.md @@ -0,0 +1,46 @@ +--- +id: 2023-12-18 +aliases: December 18, 2023 +tags: +- link-note +- Data-Science +- Machine-Learning +- Regularization +--- + +# Regularization + +## Regularization Loss Function + +- The complexity of the model **$\uparrow$** == the number of model parameters **$\uparrow$** +- As the complexity of the model **$\uparrow$** == overfitting **$\uparrow$** +- Define a model with high complexity, learn only important parameters, and set unnecessary parameter values to **0** + +## Regularization Types + +### Ridge Regression (L2 Regression) + +- $L = \bbox[orange,3px] {\sum_{i=1}^{n} (y_i - (\beta_0 + \sum_{j=1}^{D} \beta_j x_{ij}))^{2}} + \bbox[blue,3px] {\lambda \sum_{j=1}^{D} \beta_j^2}$ + - $\bbox[orange,3px]{\text{MSE}}$ + - $\bbox[blue,3px]{\text{Ridge}}$ + - If MSE loss is not reduced, the loss value of the penalty term becomes larger + - Lambda $\lambda$ is a hyperparameter that controls the impact of regularization + - Normalization function expressed as sum of squares + +### Lasso Regression (L1 Regression) + +- $L = \sum\limits_{i=1}^{n}(y_{i}- (\beta_{0}+ \sum\limits_{j=1}^{D} \beta_{j}x_{ij}))^{2}+ \lambda \sum\limits_{j=1}^{D} |\beta_j|$ + - If MSE loss is not reduced, the loss value of the penalty term becomes larger + - Lambda $\lambda$ is a hyperparameter that controls the impact of regularization + - Normalization function expressed as sum of absolute + +![[Pasted image 20231218032332.png]] + +## Question + +- $\lambda \uparrow$ == Bias error $\uparrow$ and Variance error $\downarrow$ +- Sparsity: Ridge regression $<$ Lasso regression +- How to make more parameters that have 0 values? + 1. $\lambda \uparrow$ + 2. 
Exponent $\downarrow$ + - Good? or Bad?: don't know
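A hedged sketch of why Lasso gives more exact zeros than Ridge, using the simplest possible case: one feature and no intercept ($\beta_0 = 0$), which is an assumption made only for this illustration. Setting the derivative of each loss above to zero gives $\beta = \dfrac{\sum x_iy_i}{\sum x_i^2 + \lambda}$ for Ridge and a soft-thresholded value for Lasso. The function names are hypothetical.

```python
import math


def ridge_1d(xs, ys, lam):
    """Minimizer of sum((y - b*x)^2) + lam * b^2 for a single coefficient b."""
    s_xy = sum(x * y for x, y in zip(xs, ys))
    s_xx = sum(x * x for x in xs)
    return s_xy / (s_xx + lam)


def lasso_1d(xs, ys, lam):
    """Minimizer of sum((y - b*x)^2) + lam * |b| (soft-thresholding)."""
    s_xy = sum(x * y for x, y in zip(xs, ys))
    s_xx = sum(x * x for x in xs)
    shrunk = max(abs(s_xy) - lam / 2, 0.0)   # exactly 0 once lam >= 2 * |s_xy|
    return math.copysign(shrunk, s_xy) / s_xx


xs, ys = [1, 2, 3], [0.1, 0.2, 0.3]      # generated from y = 0.1 * x
ridge_beta = ridge_1d(xs, ys, lam=10)    # shrunk toward 0, but not 0
lasso_beta = lasso_1d(xs, ys, lam=10)    # exactly 0
```

Ridge only rescales the coefficient, so it approaches 0 without reaching it, while Lasso subtracts a fixed amount and clips at 0. Increasing $\lambda$ pushes both toward 0, which matches the bias $\uparrow$ / variance $\downarrow$ remark in the Question section.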
\ No newline at end of file diff --git a/SI/Resource/Data Science/Machine Learning/Contents/Sigmoid.md b/SI/Resource/Data Science/Machine Learning/Contents/Sigmoid.md new file mode 100644 index 0000000..41ad25a --- /dev/null +++ b/SI/Resource/Data Science/Machine Learning/Contents/Sigmoid.md @@ -0,0 +1,17 @@ +--- +id: 2023-12-17 +aliases: December 17, 2023 +tags: +- link-note +- Data-Science +- Machine-Learning +- Sigmoid +--- + +# Sigmoid + +- Non-linear function for binary classification problems +- $y = \dfrac{1}{1+e^{-x}}$ +- $0 < output < 1$, with $y = 0.5$ at $x = 0$ + +![[Pasted image 20231218035418.png]]
\ No newline at end of file diff --git a/SI/Resource/Data Science/Machine Learning/Contents/Softmax Regression.md b/SI/Resource/Data Science/Machine Learning/Contents/Softmax Regression.md new file mode 100644 index 0000000..c886fac --- /dev/null +++ b/SI/Resource/Data Science/Machine Learning/Contents/Softmax Regression.md @@ -0,0 +1,14 @@ +--- +id: 2023-12-17 +aliases: December 17, 2023 +tags: +- link-note +- Data-Science +- Machine-Learning +- Softmax-Regression +--- + +# Softmax Regression + +- Non-linear function for multiclass (multinomial) classification problems +- $y_i = \dfrac{e^{X_i}}{\sum_{k=1}^{K} e^{X_k}}$, $K$: number of classes
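A minimal sketch of the softmax formula above. Subtracting the maximum score before exponentiating is a common numerical-stability trick that is not mentioned in the notes; it does not change the result because the factor cancels in the ratio.

```python
import math


def softmax(scores):
    """Turn K real-valued scores into K probabilities that sum to 1."""
    m = max(scores)                              # shift for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([1.0, 2.0, 3.0])     # the largest score gets the largest probability
stable = softmax([1000.0, 1000.0])   # would overflow without the shift
```

With K = 2 the softmax reduces to the sigmoid of the score difference, which is why the binary case uses [[Sigmoid]] and the multiclass case uses softmax.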
\ No newline at end of file diff --git a/SI/Resource/Data Science/Machine Learning/Machine Learning.md b/SI/Resource/Data Science/Machine Learning/Machine Learning.md new file mode 100644 index 0000000..0e1d5bc --- /dev/null +++ b/SI/Resource/Data Science/Machine Learning/Machine Learning.md @@ -0,0 +1,99 @@ +--- +id: 2023-12-17 +aliases: December 17, 2023 +tags: +- Machine-Learning +- Data-Science +--- +# Machine Learning + +Machine Learning (ML) is a subset of artificial intelligence (AI) that focuses on the development of systems and algorithms that can learn from and make decisions or predictions based on data. The core idea is to enable a machine to make intelligent decisions or predictions without being explicitly programmed to perform the task. + +## Machine + +Machine is a model or a function derived from data given by human + +## Learning + +Learning is to find the best model represented data, meaning optimization of parameter + +- Optimization of parameter: By statistical method or [[Gradient descent | gradient descent]] + +## Goal + +[[Optimization]]: Find optimal parameters + +- Optimal: is the best representation of data + - A model with the smallest difference between predictions $\hat{y}$ and actual values $y$ + - A model parameter makes the smallest loss + +## Supervised Learning + +1. [[Regression]] + - [[Linear Regression]] and [[Nonlinear Regression]] + - [[Gradient descent]] + - [[Bias and Variance]] Trade-off +2. [[Classification]] + - [[Sigmoid]] + - [[Logistic Regression]] and [[Softmax Regression]] + - [[Support Vector Machine]] ([[Support Vector Machine |SVM]]) + - [[Decision Tree]] + - [[Linear Discriminant Analysis]] ([[Linear Discriminant Analysis |LDA]]) + 1. [[Ensemble]] + - [[Bagging]] + - [[Boosting]] + +## Unsupervised Learning + +1. [[Preprocessing]] + - [[Principal Component Analysis]] ([[Principal Component Analysis |PCA]]) + - [[Singular Value Decomposition]] ([[Singular Value Decomposition |SVD]]) +2. 
[[Clustering]] + - [[K-Means]] + - [[Mean Shift]] + - [[Gaussian Mixture Model]] + - [[DBSCAN]] + +## Notations + +- Data Properties + - Features (= attributes, independent variables): X + - characteristics of data or items + - N: # of data sample + - D: # of features + - Label (dependent variables): y + - if there is a label, it is supervised. Otherwise, it is unsupervised + - Parameter (=weight): learnable parameters that a model have, not given data + - [[Hyperparameter]]: parameters that human has to decide +- Input vs. Output + - Input ($X$): values, parts of features, are put into a model + - Output ($\hat{y}$): values of prediction derived from model +- Linear vs. Nonlinear + - Linear regression: a model can be implemented by a linear function + - Simple Linear Regression: Involves two variables — one independent variable and one dependent variable. The relationship between these variables is modeled as a straight line. + - Multiple Linear Regression: Uses more than one independent variable to predict a dependent variable. The relationship is still linear in nature, meaning it assumes a straight-line relationship between each independent variable and the dependent variable. + - ex) $y = w_0 + w_1*x_1 + w_2*x_2 + \dots + w_D*x_D, y = w_0 + w_1*x_1 + w_2*x^2$ + - Non-linear regression: a model can't be implemented by a linear function + - ex) $log(y) = w_0 + w_1*log(x), y = max(x, 0)$ + +## Basic Math for ML + +- Function + - Relationships or rules between two groups + - $y = f(x)$, $x$ = input, $y$ = output +- Linear function + - $y = a*x +b$, $(a \ne 0)$ + - a: coefficient (slope), b = intercept +- Instantaneous Rate of Change + - is **the change in the rate at a particular instant**, and it is same as the change in the derivative value at a specific point. 
+ - Slope (coefficient) where a $x (=a)$ meets a graph +- Derivation + - finds the instantaneous rate of change + - $f'(x)$ or $\dfrac{d}{dx}f(x)$ +- Minimum of function + - **The instantaneous rate of change at the minimum of function is always *0*.** + - Using this property, can find optimal parameter value +- Exponential + - $e = \lim_{n \to \infty} (1 + \dfrac{1}{n})^n$ + - a function or growth pattern that increases at a rate proportional to its current value + - $\dfrac{d}{dx} e^x = e^x$ diff --git a/SI/Resource/Data Science/Machine Learning/Screenshots/Pasted image 20231218004948.png b/SI/Resource/Data Science/Machine Learning/Screenshots/Pasted image 20231218004948.png Binary files differnew file mode 100644 index 0000000..1cec75a --- /dev/null +++ b/SI/Resource/Data Science/Machine Learning/Screenshots/Pasted image 20231218004948.png diff --git a/SI/Resource/Data Science/Machine Learning/Screenshots/Pasted image 20231218005035.png b/SI/Resource/Data Science/Machine Learning/Screenshots/Pasted image 20231218005035.png Binary files differnew file mode 100644 index 0000000..1bfb03b --- /dev/null +++ b/SI/Resource/Data Science/Machine Learning/Screenshots/Pasted image 20231218005035.png diff --git a/SI/Resource/Data Science/Machine Learning/Screenshots/Pasted image 20231218005054.png b/SI/Resource/Data Science/Machine Learning/Screenshots/Pasted image 20231218005054.png Binary files differnew file mode 100644 index 0000000..f8b24a0 --- /dev/null +++ b/SI/Resource/Data Science/Machine Learning/Screenshots/Pasted image 20231218005054.png diff --git a/SI/Resource/Data Science/Machine Learning/Screenshots/Pasted image 20231218032332.png b/SI/Resource/Data Science/Machine Learning/Screenshots/Pasted image 20231218032332.png Binary files differnew file mode 100644 index 0000000..95960cd --- /dev/null +++ b/SI/Resource/Data Science/Machine Learning/Screenshots/Pasted image 20231218032332.png diff --git a/SI/Resource/Data Science/Machine Learning/Screenshots/Pasted 
image 20231218035418.png b/SI/Resource/Data Science/Machine Learning/Screenshots/Pasted image 20231218035418.png Binary files differnew file mode 100644 index 0000000..17f6a0e --- /dev/null +++ b/SI/Resource/Data Science/Machine Learning/Screenshots/Pasted image 20231218035418.png diff --git a/SI/Resource/Data Science/SQL/MySQL/MySQL.md b/SI/Resource/Data Science/SQL/MySQL/MySQL.md new file mode 100644 index 0000000..cc23029 --- /dev/null +++ b/SI/Resource/Data Science/SQL/MySQL/MySQL.md @@ -0,0 +1,433 @@ +--- +id: 2023-12-19 +aliases: + - December 19 + - "2023" +tags: + - link-note + - MySQL + - Data-Science + - SQL +--- +# [[MySQL]] +## Variables + +```sql +INT +DECIMAL(3,2) -- gpa 3.70 -> 3 digits in total, 2 after the decimal point +VARCHAR(10) +BLOB -- Binary Large Object, stores large data, e.g., images +DATE -- YYYY-MM-DD +TIMESTAMP -- YYYY-MM-DD HH:MM:SS +``` + +## Basic +### Table +- Create tables + +```sql +CREATE TABLE table_name ( +-- attributes variables (KEY types) (options) + id INT PRIMARY KEY AUTO_INCREMENT, + name VARCHAR(46) NOT NULL, + major VARCHAR(10) DEFAULT value, + column_name VARCHAR(10) UNIQUE +); +``` + +- Drop tables + +```sql +DROP TABLE table_name; +``` + +- Describe tables + +```sql +DESCRIBE table_name; + +DESC table_name; +``` + +- RENAME table + +```sql +ALTER TABLE table_name +RENAME TO new_table_name; +``` + +- UPDATE table + +```sql +UPDATE table_name +SET column_name_1 = new_value(, column_name_2 = new_value) +(WHERE other_column_name = value); +``` + +### Column +- ALTER columns + +```sql +ALTER TABLE table_name ADD column_name DECIMAL(3,2); + +ALTER TABLE table_name DROP COLUMN column_name; +``` + +- ADD column + +```sql +ALTER TABLE table_name +ADD COLUMN new_column_name DATA_TYPE; +``` + +- CHANGE column_name + +```sql +ALTER TABLE table_name +CHANGE COLUMN column_name new_column_name DATA_TYPE; +``` + +- DROP column_name + +```sql +ALTER TABLE table_name +DROP COLUMN column_name; +``` + +### Data +- INSERT INTO data + +```sql +INSERT INTO table_name
VALUES(data); + +INSERT INTO table_name(column_name) VALUES(data); +``` + +- SELECT */data + +```sql +SELECT * +FROM table_name; + +SELECT column_name (AS col_name) +FROM table_name; +``` + +- DELETE FROM data + +```sql +DELETE FROM table_name +WHERE a_column_name = value; +``` + +### Condition +- ORDER BY ... ASCending & DESCending + +```sql +SELECT * +FROM table_name +ORDER BY column_name ASC / DESC; +``` + +- LIMIT int + +```sql +SELECT * +FROM table_name +LIMIT int; +``` + +- condition (cmp: <, <=, >, >=, =, <>) + +```sql +SELECT * +FROM table_name +WHERE column_name cmp value; +``` + +- OR, AND + +```sql +SELECT * +FROM table_name +WHERE condition + OR/AND + condition; +``` + +- (NOT) IN (value_1, value_2, ...) + +```sql +SELECT * +FROM table_name +WHERE column_name IN (value_1, value_2, ...); +``` + +- BETWEEN ... AND ... + +```sql +SELECT * +FROM table_name +WHERE column_name BETWEEN int AND int; +``` + +#### Filtering +- LIKE (target: char%, %char, %str%, _char%, char_%_%, char%char) + +```sql +SELECT * +FROM table_name +WHERE column_name LIKE target; +``` + +- CASE WHEN condition(cmp) THEN result ELSE result END + +```sql +SELECT *, + CASE WHEN condition THEN result + WHEN condition THEN result + ELSE result + END +FROM table_name; +``` + +## Functions +- Calculation (fn: SUM(), COUNT(), MIN(), MAX(), AVG()) + +```sql +SELECT fn(column_name) variable +FROM table_name; +``` + +- Digit (fn: TRUNCATE(digit, n), ROUND(digit, n), CEIL(), FLOOR(), ABS(), SIGN()) + +```sql +SELECT fn() +``` + +- String (fn: LENGTH(), TRIM(), UPPER(), LOWER(), LEFT(str, n), RIGHT(str, n), REPLACE(), LPAD(), RPAD(), SUBSTRING(), CONCAT(), CONCAT_WS()) + +```sql +SELECT column_name, LENGTH(column_name) +FROM table_name; + +SELECT * +FROM table_name +WHERE TRIM(column_name) = str; + +SELECT UPPER/LOWER(column_name) +FROM table_name; + +SELECT LEFT/RIGHT(column_name, int) +FROM table_name; + +SELECT REPLACE(column_name, target, result) +FROM table_name; + 
+SELECT column_name, LPAD/RPAD(column_name, length, pad_str) +FROM table_name; + +SELECT SUBSTRING(column_name, start, length) +FROM table_name; + +SELECT CONCAT(str_1, str_2, ...) +FROM table_name; +``` + +### Date +- Date & Time (fn: CURDATE(), CURTIME(), NOW(), YEAR(), MONTH(), DAY(), LAST_DAY(), WEEKDAY(), DAYNAME()) + +```sql +SELECT fn() +FROM table_name; +``` + +- fn: ADDDATE(target, int), SUBDATE(target, int), DATEDIFF(target_1, target_2), TIMEDIFF(target_1, target_2) + +```sql +SELECT column_name, + fn(target, int) +FROM table_name; +``` + +#### Date Format +- DATE_FORMAT (fm: %Y, %y, %M, %b, %W, %a, %i, %T, %m, %c, %d, %e, %I, %H, %r, %S) + +```sql +SELECT column_name, DATE_FORMAT(column_name, 'fm_1 fm_2 ...') +FROM table_name; +``` + +## GROUP BY +- GROUP BY column_name_1, column_name_2, ..., Calculation_1, Calculation_2, ... + +```sql +SELECT column_name, fn() +FROM table_name +GROUP BY column_name; +``` + +- HAVING condition (cmp) after GROUP BY + +```sql +SELECT column_name +FROM table_name +GROUP BY column_name +HAVING condition; +``` + +## Division & Analysis +- RANK()/DENSE_RANK() OVER (ORDER BY column_name) + +```sql +SELECT *, + RANK()/DENSE_RANK() OVER(ORDER BY column_name (ASC/DESC)) +FROM table_name; +``` + +### Division +- PARTITION BY + +```sql +SELECT *, + RANK()/DENSE_RANK() OVER(PARTITION BY column_name ORDER BY column_name (ASC/DESC)) +FROM table_name +``` + +### Before/After +- LAG(column_name, int): before /LEAD(column_name, int): after + +```sql +SELECT *, + LAG(column_name, int)/LEAD(column_name, int) OVER(PARTITION BY column_name ORDER BY column_name) +FROM table_name +ORDER BY column_name +``` + +### Border +- ROWS BETWEEN (FRAME: CURRENT ROW, int PRECEDING, int FOLLOWING, UNBOUNDED PRECEDING, UNBOUNDED FOLLOWING) AND (FRAME) + +```sql +SELECT column_name_1, column_name_2, column_name_3, + AVG(column_name_3) OVER (PARTITION BY column_name_1 ORDER BY column_name_2 + ROWS BETWEEN int PRECEDING AND CURRENT ROW) new_column_name_1, + 
AVG(column_name_3) OVER (PARTITION BY column_name_1 ORDER BY column_name_2 + ROWS BETWEEN CURRENT ROW AND int FOLLOWING) new_column_name_2 +FROM table_name +ORDER BY column_name_1, column_name_2; + +SELECT column_name_1, column_name_2, column_name_3, + AVG(column_name_3) OVER (PARTITION BY column_name_1 ORDER BY column_name_2 + ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) new_column_name_1, + AVG(column_name_3) OVER (PARTITION BY column_name_1 ORDER BY column_name_2 + ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) new_column_name_2 +FROM table_name +ORDER BY column_name_1, column_name_2; +``` + +### Merge +- (INNER) JOIN + +```sql +SELECT */(alias_1.)column_name_1, (alias_2.)column_name_2 +FROM table_name_1 (as alias_1) JOIN table_name_2 (as alias_2) +ON (alias_1/)table_name_1.column_name_1 = (alias_2/)table_name_2.column_name_2; +``` + +- LEFT/RIGHT JOIN + +```sql +SELECT * +FROM table_name_1 as tb1 LEFT/RIGHT JOIN table_name_2 as tb2 +ON tb1.column_name_1 = tb2.column_name_2 +ORDER BY column_name; +``` + +- CROSS JOIN + +```sql +SELECT * +FROM table_name_1 CROSS + JOIN table_name_2 + (JOIN table_name_...) + (...) 
+``` + +## Sub Query / Temporary Table +### Multi Columns +- () + +```sql +SELECT * +FROM +( + SELECT *, RANK() OVER(PARTITION BY column_name_1 ORDER BY column_name_2 DESC) + FROM table_name +) alias +(WHERE condition); +``` + +### Uni Column & Uni Data +- IN + +```sql +SELECT * +FROM table_name_1 + (JOIN table_name_2 + ON column_name_1 = column_name_2) +WHERE column_name_2 IN + (SELECT column_name_2 + FROM table_name_2 + WHERE condition); + +SELECT * +FROM table_name_1 +WHERE column_name_1 =(/IN) + (SELECT column_name_1 + FROM table_name_2); +``` + +### VIEW / WITH +- Create + +```sql +CREATE VIEW alias_1 AS +SELECT * +FROM table_name_1; +``` + +- Drop + +```sql +DROP VIEW alias_1; +``` + +- WITH + +```sql +WITH temp_table_name as( + SELECT *, + RANK() OVER(PARTITION BY column_name_1 ORDER BY column_name_2 DESC) new_column_name + FROM table_name +) +SELECT * +FROM temp_table_name +(WHERE condition) +``` + +## Custom Function +- Nominal Return Value + +```sql +DELIMITER $$ +CREATE FUNCTION fn_name( + num1 INT, + num2 INT +) RETURNS INT +BEGIN + DECLARE sum_ INT; + SET sum_ = num1 + num2; + RETURN sum_; +END $$ +DELIMITER ; +``` diff --git a/SI/Resource/Dynamic Programming/fibonacci.py b/SI/Resource/Dynamic Programming/fibonacci.py new file mode 100644 index 0000000..9c6b4fd --- /dev/null +++ b/SI/Resource/Dynamic Programming/fibonacci.py @@ -0,0 +1,17 @@ +# fibonacci (naive recursion) +# time complexity: O(2^n) +# space complexity: O(n) +def fib(n): + return 1 if n <= 2 else fib(n - 1) + fib(n - 2) + + +# memoization: cache each result so it is computed only once +# time complexity: O(n) +# space complexity: O(n) +def fib_memo(n, memo={}): + if n <= 2: + return 1 + if n in memo: + return memo[n] + memo[n] = fib_memo(n - 1, memo) + fib_memo(n - 2, memo) + return memo[n] diff --git a/SI/Resource/Fundamentals of Data Mining/Ch.3.md b/SI/Resource/Fundamentals of Data Mining/Ch.3.md new file mode 100644 index 0000000..f456f74 --- /dev/null +++ b/SI/Resource/Fundamentals of Data Mining/Ch.3.md @@ 
-0,0 +1,10 @@ +--- +id: Ch.3 +aliases: [] +tags: [] +--- + +[[convex]]: in the clustering [[concave]]: not in the clustering [[K-Means]]: +can only detect clusters that are linearly separable + +- in higher dimension, it can increase the chance for having a line diff --git a/SI/Resource/Fundamentals of Data Mining/Content/Apriori.md b/SI/Resource/Fundamentals of Data Mining/Content/Apriori.md new file mode 100644 index 0000000..450a899 --- /dev/null +++ b/SI/Resource/Fundamentals of Data Mining/Content/Apriori.md @@ -0,0 +1,10 @@ +--- +id: Apriori +aliases: [] +tags: [] +--- + +![[CleanShot 2023-10-25 at 16.08.52@2x.png]] ![[CleanShot 2023-10-25 at +16.09.04@2x.png]] ![[CleanShot 2023-10-25 at 16.09.16@2x.png]] ![[CleanShot +2023-10-25 at 16.10.06@2x.png]] ![[CleanShot 2023-10-25 at 16.10.19@2x.png]] +![[CleanShot 2023-10-25 at 16.10.34@2x.png]] diff --git a/SI/Resource/Fundamentals of Data Mining/Content/Attributes of Mixed Type.md b/SI/Resource/Fundamentals of Data Mining/Content/Attributes of Mixed Type.md new file mode 100644 index 0000000..4f3149d --- /dev/null +++ b/SI/Resource/Fundamentals of Data Mining/Content/Attributes of Mixed Type.md @@ -0,0 +1,7 @@ +--- +id: Attributes of Mixed Type +aliases: [] +tags: [] +--- + +![[CleanShot 2023-10-23 at 18.43.35@2x.png]] diff --git a/SI/Resource/Fundamentals of Data Mining/Content/Binary.md b/SI/Resource/Fundamentals of Data Mining/Content/Binary.md new file mode 100644 index 0000000..996865c --- /dev/null +++ b/SI/Resource/Fundamentals of Data Mining/Content/Binary.md @@ -0,0 +1,12 @@ +--- +id: Binary +aliases: + - Example: Dissimilarity between Asymmetric Binary Variables +tags: [] +--- + +![[CleanShot 2023-10-23 at 18.19.11@2x.png]] + +### Example: Dissimilarity between Asymmetric Binary Variables + +![[CleanShot 2023-10-23 at 18.19.54@2x.png]] diff --git a/SI/Resource/Fundamentals of Data Mining/Content/Categorical.md b/SI/Resource/Fundamentals of Data Mining/Content/Categorical.md new file mode 100644 index 
0000000..6d1ec43 --- /dev/null +++ b/SI/Resource/Fundamentals of Data Mining/Content/Categorical.md @@ -0,0 +1,7 @@ +--- +id: Categorical +aliases: [] +tags: [] +--- + +[[Nominal]] diff --git a/SI/Resource/Fundamentals of Data Mining/Content/Compare and Contrast.md b/SI/Resource/Fundamentals of Data Mining/Content/Compare and Contrast.md new file mode 100644 index 0000000..d72dd09 --- /dev/null +++ b/SI/Resource/Fundamentals of Data Mining/Content/Compare and Contrast.md @@ -0,0 +1,27 @@ +--- +id: Compare and Contrast +aliases: + - clustering algorithms +tags: + - Compare-and-Contrast +--- + +## [[clustering algorithms]] + +- [[K-Means]] vs [[K-Medoids]] + - In _K-means_ algorithm, they choose means as the centroids but in the + _K-medoids_, data points are chosen to be the medoids[^1]. +- [[K-Means]] vs [[K-Medians]] + +| K-Means | K-Medians | +| ---------------------------------------------------------- | --------------------------------------------- | +| The center is not necessarily one of the input data points | Centers will be chosen from data points | +| Not flexible | More flexible | +| Not immune to noise and outliers | More robust to noise and outliers | +| Minimize the sum of squared Euclidian distance | Minimize a sum of pairwise of dissimilarities | + +[^1]: + Medoids are **representative objects of a data set or a cluster within a + data set whose sum of dissimilarities to all the objects in the cluster is + minimal**. Medoids are similar in concept to means or centroids, but medoids are + always restricted to be members of the data set. 
diff --git a/SI/Resource/Fundamentals of Data Mining/Content/Complexity.md b/SI/Resource/Fundamentals of Data Mining/Content/Complexity.md new file mode 100644 index 0000000..e73595a --- /dev/null +++ b/SI/Resource/Fundamentals of Data Mining/Content/Complexity.md @@ -0,0 +1,25 @@ +--- +id: Complexity +aliases: + - Computational/Time Complexity +tags: [] +--- + +## Computational/Time Complexity + +- K-Medoids: + - PAM: $O(K(n - k)^2)$ +- Kernel K-Means: + - Computational complexity (time and space) is higher than K-Means + - Need to compute and store n x n kernel matrix generated from the kernel + function on the original data, where n is the number of points +- Hierarchical Clustering: + - Agglomerative Clustering + - Time complexity: $O(n^2)$ + - Algorithmic Complexity: $O(m^2logm)$ +- Density-based Clustering: + - DBSCAN: + - Computational complexity: $O(nlogn)$ + - worst case: $O(n^2)$ + - OPTICS: + - Complexity: $O(NlogN)$ diff --git a/SI/Resource/Fundamentals of Data Mining/Content/DBSCAN.md b/SI/Resource/Fundamentals of Data Mining/Content/DBSCAN.md new file mode 100644 index 0000000..00e737d --- /dev/null +++ b/SI/Resource/Fundamentals of Data Mining/Content/DBSCAN.md @@ -0,0 +1,12 @@ +--- +id: DBSCAN +aliases: + - DBSCAN: A Density-Based Spatial Clustering Algorithm +tags: [] +--- + +## DBSCAN: A Density-Based Spatial Clustering Algorithm + +![[CleanShot 2023-10-24 at 22.21.02@2x.png]] ![[CleanShot 2023-10-24 at +22.21.23@2x.png]] ![[CleanShot 2023-10-24 at 22.21.37@2x.png]] ![[CleanShot +2023-10-24 at 22.21.59@2x.png]] diff --git a/SI/Resource/Fundamentals of Data Mining/Content/Data Matrix and Dissimilarity Matrix.md b/SI/Resource/Fundamentals of Data Mining/Content/Data Matrix and Dissimilarity Matrix.md new file mode 100644 index 0000000..7f299d1 --- /dev/null +++ b/SI/Resource/Fundamentals of Data Mining/Content/Data Matrix and Dissimilarity Matrix.md @@ -0,0 +1,7 @@ +--- +id: Data Matrix and Dissimilarity Matrix +aliases: [] +tags: [] +--- + 
+![[CleanShot 2023-10-23 at 18.09.20@2x.png]] diff --git a/SI/Resource/Fundamentals of Data Mining/Content/Density-based Clustering.md b/SI/Resource/Fundamentals of Data Mining/Content/Density-based Clustering.md new file mode 100644 index 0000000..d5e11fe --- /dev/null +++ b/SI/Resource/Fundamentals of Data Mining/Content/Density-based Clustering.md @@ -0,0 +1,21 @@ +--- +id: Density-based Clustering +aliases: + - Density-Based Clustering Methods +tags: [] +--- + +## Density-Based Clustering Methods + +- Clustering based on density (a **local** cluster criterion), such as + density-connected points +- Major features: + - Discover clusters of **arbitrary** shape + - Handle noise + - One scan (only examine the local region to justify density) + - Need density parameters as termination condition +- Several interesting studies: + - <u>[[DBSCAN]]</u>: Ester, et al. + - <u>[[OPTICS]]</u>: Ankerst, et al. + - DENCLUE: Hinneburg & D. Keim + - CLIQUE: Agrawal, et al. 
diff --git a/SI/Resource/Fundamentals of Data Mining/Content/Dissimilarity.md b/SI/Resource/Fundamentals of Data Mining/Content/Dissimilarity.md new file mode 100644 index 0000000..be8891f --- /dev/null +++ b/SI/Resource/Fundamentals of Data Mining/Content/Dissimilarity.md @@ -0,0 +1,12 @@ +--- +id: Dissimilarity +aliases: [] +tags: [] +--- + +- Dissimilarity (or distance) measure + - [Numerical measure](app://obsidian.md/Numeric) of how different two data + objects are + - **In some sense, the inverse of similarity**: The lower, the more alike + - Minimum dissimilarity is often 0 (i.e., completely similar) + - Range [0, 1] or [0, $\infty$), depending on the definition diff --git a/SI/Resource/Fundamentals of Data Mining/Content/Distance functions.md b/SI/Resource/Fundamentals of Data Mining/Content/Distance functions.md new file mode 100644 index 0000000..0b1c658 --- /dev/null +++ b/SI/Resource/Fundamentals of Data Mining/Content/Distance functions.md @@ -0,0 +1,24 @@ +--- +id: Distance functions +aliases: + - 
Numeric +tags: [] +--- + +## Numeric + +### Minkowski distance + +![[CleanShot 2023-10-23 at 18.16.28@2x.png]] + +### Special Cases of Minkowski Distance + +![[CleanShot 2023-10-23 at 18.17.24@2x.png]] + +- Manhattan (or city block) distance +- Euclidean distance +- "supremum" distance + +### Example: Special Cases + +![[CleanShot 2023-10-23 at 18.18.06@2x.png]] diff --git a/SI/Resource/Fundamentals of Data Mining/Content/Distance measures (mixted types of attributes).md b/SI/Resource/Fundamentals of Data Mining/Content/Distance measures (mixted types of attributes).md new file mode 100644 index 0000000..d97338b --- /dev/null +++ b/SI/Resource/Fundamentals of Data Mining/Content/Distance measures (mixted types of attributes).md @@ -0,0 +1,15 @@ +--- +id: Distance measures (mixted types of attributes) +aliases: + - Distance measures (mixted types of attributes) +tags: [] +--- + +## Distance measures ([[mixted types of attributes]]) + +- [[Dissimilarity]] (or distance) measure + - [[Distance functions|Numeric|Numerical measure]] of how different two data + objects are + - **In some sense, the inverse of similarity**: The lower, the more alike + - Minimum dissimilarity is often 0 (i.e., completely similar) + - Range [0, 1] or [0, $\infty$), depending on the definition diff --git a/SI/Resource/Fundamentals of Data Mining/Content/F-measure.md b/SI/Resource/Fundamentals of Data Mining/Content/F-measure.md new file mode 100644 index 0000000..a467817 --- /dev/null +++ b/SI/Resource/Fundamentals of Data Mining/Content/F-measure.md @@ -0,0 +1,9 @@ +--- +id: F-measure +aliases: [] +tags: [] +--- + +- F-Measure ![[CleanShot 2023-10-25 at 15.57.51@2x.png]] ![[CleanShot 2023-10-25 +at 15.58.04@2x.png]] ![[CleanShot 2023-10-25 at 15.58.19@2x.png]] ![[CleanShot +2023-10-25 at 15.58.40@2x.png]] diff --git a/SI/Resource/Fundamentals of Data Mining/Content/FP-growth.md b/SI/Resource/Fundamentals of Data Mining/Content/FP-growth.md new file mode 100644 index 0000000..f533719 --- 
/dev/null +++ b/SI/Resource/Fundamentals of Data Mining/Content/FP-growth.md @@ -0,0 +1,21 @@ +--- +id: FP-growth +aliases: [] +tags: [] +--- + +• You can expect to ‘draw’ the fp-tree using a text-based format as follows: A:3 + +| + +B:3. C:1 + +| | \ + +C:2 E:1 D:1 + +![[CleanShot 2023-10-25 at 16.11.31@2x.png]] ![[CleanShot 2023-10-25 at +16.11.42@2x.png]] ![[CleanShot 2023-10-25 at 16.11.56@2x.png]] ![[CleanShot +2023-10-25 at 16.12.11@2x.png]] ![[CleanShot 2023-10-25 at 16.12.25@2x.png]] +![[CleanShot 2023-10-25 at 22.44.57.png]] +[Youtube](https://www.youtube.com/watch?v=GcgfSJAaBto) diff --git a/SI/Resource/Fundamentals of Data Mining/Content/Hierarchical Clustering.md b/SI/Resource/Fundamentals of Data Mining/Content/Hierarchical Clustering.md new file mode 100644 index 0000000..096fe2c --- /dev/null +++ b/SI/Resource/Fundamentals of Data Mining/Content/Hierarchical Clustering.md @@ -0,0 +1,54 @@ +--- +id: Hierarchical Clustering +aliases: + - Hierarchical Clustering: Basic Concepts +tags: [] +--- + +## Hierarchical Clustering: Basic Concepts + +- Hierarchical clustering + - Generate a clustering hierarchy (drawn as a **dendrogram**) + - Not required to specify _K_, the number of clusters + - More deterministic + - No iterative refinement +- Two categories of algorithms: + - **[[#Agglomerative Clustering Algorithm|Agglomerative]]**: Start with + sigleton clusters, continuously merge two clusters at a time to build a + **bottom-up** hierarchy of clusters + - **Divisive**: Start with a huge macro-cluster, split it continuously into + two groups, generating a **top-down** hierarchy of clusters + +## Dendrogram: Shows How Clusters are Merged + +![[CleanShot 2023-10-24 at 22.11.21@2x.png]] + +## Strengths of Hierarchical Clustering + +- Do not have to assume any particular number of clusters + - Any desired number of clusters can be obtained by 'cutting' the dendrogram + at the proper level +- They may correspond to meaningful taxonomies + - Example in biological 
sciences (e.g., animal kingdom, phylogeny + reconstruction, ...) + +## Agglomerative Clustering Algorithm + +![[CleanShot 2023-10-24 at 22.14.30@2x.png]] ![[CleanShot 2023-10-24 at +22.14.45@2x.png]] ![[CleanShot 2023-10-24 at 22.15.07@2x.png]] ![[CleanShot +2023-10-24 at 22.15.23@2x.png]] + +## Extensions to Hierarchical Clustering + +- Major weaknesses of hierarchical clustering methods + - Can never undo what was done previously + - Do not scale well + - Time complexity of at least $O(n^2)$, where $n$ is the number of total + objects +- Other hierarchical clustering algorithms + - BIRCH (1996): Use CF-tree and incrementally adjust the quality of + sub-clusters + - CURE (1998): Represent a cluster using a set of well-scattered + representative points + - CHAMELEON (1999): Use graph partitioning methods on the K-nearest neighbor + graph of the data diff --git a/SI/Resource/Fundamentals of Data Mining/Content/K-Means.md b/SI/Resource/Fundamentals of Data Mining/Content/K-Means.md new file mode 100644 index 0000000..d61d82b --- /dev/null +++ b/SI/Resource/Fundamentals of Data Mining/Content/K-Means.md @@ -0,0 +1,23 @@ +--- +id: K-Means +aliases: [] +tags: + - Clustering-Algorithms + - Compare-and-Contrast +--- + +- K-Means [(Youtube)](https://www.youtube.com/watch?v=KzJORp8bgqs) + - Each cluster is represented by the center/centroid of the cluster +- Given K, the number of clusters, the _K-Means_ clustering algorithm is + outlined as follows + - Select _**K**_ points as initial centroids + - **Repeat** + - Form _K_ clusters by assigning each point to its **closest** centroid + - Re-compute the centroid (i.e., _**mean point**_) of each cluster + - **Until** convergence criterion is satisfied (**e.g., no change of cluster + membership, or a certain # of iterations have been reached, or, the [[SSE]] + is < a pre-defined threshold**) +- Different kinds of distance measures can be used + - [[Manhattan distance]] ($L_1$ norm), [[Euclidean distance]] ($L_2$ norm), + 
[[Cosine similarity]], [[Mahalanobis distance]] ![[CleanShot 2023-10-24 at +15.34.07@2x.png]] diff --git a/SI/Resource/Fundamentals of Data Mining/Content/K-Medians.md b/SI/Resource/Fundamentals of Data Mining/Content/K-Medians.md new file mode 100644 index 0000000..91614d3 --- /dev/null +++ b/SI/Resource/Fundamentals of Data Mining/Content/K-Medians.md @@ -0,0 +1,25 @@ +--- +id: K-Medians +aliases: + - *K-Medians*: Handling Outliers by Computing Medians [(Youtube)]() +tags: [] +--- + +## _K-Medians_: Handling Outliers by Computing Medians [(Youtube)]() + +- Medians are less sensitive to outliers than means + - Think of the median salary vs. mean salary of a large firm when adding a few + top executives! +- _**K-Medians**_: Instead of taking the **mean** value of the object in a + cluster as a reference point, **medians** are used ($L_1$-norm is often used + as the distance measure) +- The criterion function for the _K-Medians_ algorithm: $$ S = + \sum_{k=1}^{K}\sum_{x_i\in{C_k}}|x_{ij} - med_{kj}|$$ +- The _K-Medians_ clustering algorithm: + - Select _K_ points as the initial representative objects (i.e., as initial _K + medians_) + - **Repeat** + - Assign every point to its nearest median + - Re-compute the median using the median of <u>==each individual + feature==</u> + - **Until** convergence criterion is satisfied diff --git a/SI/Resource/Fundamentals of Data Mining/Content/K-Medoids.md b/SI/Resource/Fundamentals of Data Mining/Content/K-Medoids.md new file mode 100644 index 0000000..6dc59fe --- /dev/null +++ b/SI/Resource/Fundamentals of Data Mining/Content/K-Medoids.md @@ -0,0 +1,45 @@ +--- +id: K-Medoids +aliases: + - Handling Outliers: From _K-Means_ to _K-Medoids_ (Youtube)(Youtube-E.g.) 
+tags: + - Compare-and-Contrast + - "" + - Complexity +--- + +## Handling Outliers: From _K-Means_ to _K-Medoids_ [(Youtube)](https://www.youtube.com/watch?v=OFELCn-6r2o) [(Youtube-E.g.)](https://www.youtube.com/watch?v=ChBxx4aR-bY&t=0s) + +- K-Medoids: Instead of taking the **mean** value of the objects in a cluster as + a reference point, **medoids** can be used, which is the **most centrally + located** object in a cluster + +- The _K-Medoids_ clustering algorithm: + - Select _K_ points as the initial ==representative== objects (i.e., as + initial _K medoids_) + - **Repeat** + - Assign each point to the cluster with the closest medoid + - Randomly select a ==non-representative== object $o_i$ + - Compute ==the total cost _S_== of swapping the medoid $m$ with + $o_i$ (_==e.g., use SSE to measure==_) + - If $S<0$ (_==e.g., new SSE < previous SSE==_), then swap $m$ with $o_i$ to + form the new set of medoids + - **Until** convergence criterion is satisfied + +### Discussion on _K-Medoids_ Clustering + +- _K-Medoids_ Clustering: Find _representative_ objects (<u>medoids</u>) in + clusters +- _PAM_ (Partitioning Around Medoids) + - Starts from an initial set of medoids, and + - Iteratively replaces one of the medoids by one of the non-medoids if it + improves the total sum of the squared errors (SSE) of the resulting + clustering + - _PAM_ works effectively for small data sets but does not scale well for + large data sets (due to the computational complexity) + - Computational [[Complexity]]: PAM: $O(K(n - K)^2)$ (quite expensive!) 
+- Efficiency improvements on PAM + - _**CLARA**_ (Kaufman & Rousseeuw, 1990): + - PAM on samples; $O(Ks^2 + K(n - K))$, $s$ is the sample size + - _**CLARANS**_ (Ng & Han, 1994): ==Randomised re-sampling==, ensuring + efficiency + quality diff --git a/SI/Resource/Fundamentals of Data Mining/Content/K-Modes.md b/SI/Resource/Fundamentals of Data Mining/Content/K-Modes.md new file mode 100644 index 0000000..6dea96c --- /dev/null +++ b/SI/Resource/Fundamentals of Data Mining/Content/K-Modes.md @@ -0,0 +1,27 @@ +--- +id: K-Modes +aliases: + - K-Modes: Clustering Categorical Data (Youtube) +tags: [] +--- + +## K-Modes: Clustering Categorical Data [(Youtube)](https://www.youtube.com/watch?v=b39_vipRkUo) + +- _K-Means_ cannot directly handle non-numerical (categorical) data - ==how to + calculate the mean? What do they mean?== + - Mapping categorical value to 0/1 cannot generate quality clusters (in + high-dimensional space) +- _**K-Modes**_: An extension to _K-Means_ by replacing means of clusters with + _**modes**_ + - Mode: The value that appears the most often in a **set** of data values +- <u>Dissimilarity</u> measure between object X and the center of a cluster + $Z_l$ + - $\Phi(x_j, z_j) = 1 - \dfrac{n_j^r}{n_l}$ when $x_j = z_j = r$; 1 when + $x_j \ne z_j$ + - where $z_j$ is the categorical value of attribute j in $Z_l$, $n_l$ is the + number of objects in cluster $l$, and $n_j^r$ is the number of objects + whose attribute value is r +- This dissimilarity measure (distance function) is _**frequency-based**_ +- Algorithm is still based on iterative _object_ cluster assignment and + _centroid_ update +- A mixture of categorical and numerical data: Using a _**K-Prototype**_ method diff --git a/SI/Resource/Fundamentals of Data Mining/Content/K-means++.md b/SI/Resource/Fundamentals of Data Mining/Content/K-means++.md new file mode 100644 index 0000000..5cf1343 --- /dev/null +++ b/SI/Resource/Fundamentals of Data Mining/Content/K-means++.md @@ -0,0 +1,11 @@ +--- +id: 
K-means++ +aliases: [] +tags: [] +--- + +- The first centroid is selected at random +- The next centroid selected is the one that is the farthest from the currently + selected (selection is based on a weighted probability score) +- The selection continues until _K_ centroids are obtained +- [Youtube](https://www.youtube.com/watch?v=z2yncM2HE6M) diff --git a/SI/Resource/Fundamentals of Data Mining/Content/Kernel K-Means.md b/SI/Resource/Fundamentals of Data Mining/Content/Kernel K-Means.md new file mode 100644 index 0000000..828a053 --- /dev/null +++ b/SI/Resource/Fundamentals of Data Mining/Content/Kernel K-Means.md @@ -0,0 +1,39 @@ +--- +id: Kernel K-Means +aliases: + - *Kernel K-Means Clustering* +tags: [] +--- + +## _Kernel K-Means Clustering_ + +- _Kernel K-Means_ can be used to detect non-convex clusters + - A region is **convex** if it contains all the line segments connecting any + pair of its points. Otherwise, it is **concave** + - _K-Means_ can only detect clusters that are **linearly** separable +- <u>Idea</u>: Project data onto the high-dimensional kernel space, and then + perform _K-Means_ clustering + - Map data points in the input space onto a high-dimensional feature space + using the kernel function ![[CleanShot 2023-10-24 at 21.42.19@2x.png]] + - Perform _K-Means_ on the mapped feature space +- Computational complexity (time and space) is higher than K-Means + - Need to compute and store _n x n_ kernel matrix generated from the kernel + function on the original data, where _n_ is the number of points + +## Kernel Functions and Kernel K-Means Clustering + +- Typical kernel functions: + - Polynomial kernel of degree h: $K(X_i, X_j) = (X_i \cdot X_j+1)^h$ + - <u>Gaussian radial basis function (RBF) kernel</u>: $K(X_i, X_j) = + e^{-||X_i - X_j||^2 / 2\sigma^2}$ + - Sigmoid kernel: $K(X_i, X_j) = \tanh(\kappa X_i \cdot X_j - \delta)$ +- The formula for kernel matrix K for any two points $x_i, x_j \in C_k$ is + $K_{x_ix_j} = \phi(x_i) \cdot \phi(x_j)$ +- The [[SSE]] 
criterion 
of _kernel K-means_: $$SSE(c) = + \sum_{k=1}^{K}\sum_{x_i\in{C_k}}||\phi(x_i) - c_k||^2$$ + - The formula for the cluster centroid: $$c_k = + \dfrac{\sum_{x_i\in{C_k}}\phi(x_i)}{|C_k|}$$ +- Clustering can be performed without the actual individual projections + $\phi(x_i)$ and $\phi(x_j)$ for the data points $x_i, x_j \in{C_k}$ (use + $K(x_i, x_j)$ instead) ![[CleanShot 2023-10-24 at 22.06.43@2x.png]] ![[CleanShot +2023-10-24 at 22.07.05@2x.png]] diff --git a/SI/Resource/Fundamentals of Data Mining/Content/NMI.md b/SI/Resource/Fundamentals of Data Mining/Content/NMI.md new file mode 100644 index 0000000..d886599 --- /dev/null +++ b/SI/Resource/Fundamentals of Data Mining/Content/NMI.md @@ -0,0 +1,21 @@ +--- +id: NMI +aliases: + - Normalized mutual information (NMI) +tags: [] +--- + +## Normalized mutual information (NMI) + +- Mutual information: + - Quantifies the amount of shared info between the clustering $C$ and the ground truth $T$: $I(C,T) = + \sum_{i=1}^{r}\sum_{j=1}^{k}p_{ij}\log\dfrac{p_{ij}}{p_{C_i}p_{T_j}}$ + - Measures the dependency between the observed joint probability $p_{ij}$ of + $C$ and $T$, and the expected joint probability $p_{C_i} \cdot p_{T_j}$ under the + independence assumption + - When $C$ and $T$ are independent, $p_{ij} = p_{C_i} \cdot p_{T_j}$, so $I(C, T) = 0$. + However, there is no upper bound on the mutual information +- **Normalized mutual information (NMI)** $$NMI(C, T) = + \sqrt{\dfrac{I(C,T)}{H(C)} \cdot \dfrac{I(C, T)}{H(T)}} = \dfrac{I(C, + T)}{\sqrt{H(C) \cdot H(T)}}$$ + - Value range of NMI: [0, 1]. 
Value close to 1 indicates a good clustering diff --git a/SI/Resource/Fundamentals of Data Mining/Content/Nominal.md b/SI/Resource/Fundamentals of Data Mining/Content/Nominal.md new file mode 100644 index 0000000..e4f447f --- /dev/null +++ b/SI/Resource/Fundamentals of Data Mining/Content/Nominal.md @@ -0,0 +1,9 @@ +--- +id: Nominal +aliases: [] +tags: [] +--- + +- Proximity Measure for Categorical Attributes ![[CleanShot 2023-10-23 at +18.20.19@2x.png]] +- [[Target encoding]] for Multi-Class Classification diff --git a/SI/Resource/Fundamentals of Data Mining/Content/OPTICS.md b/SI/Resource/Fundamentals of Data Mining/Content/OPTICS.md new file mode 100644 index 0000000..3578883 --- /dev/null +++ b/SI/Resource/Fundamentals of Data Mining/Content/OPTICS.md @@ -0,0 +1,35 @@ +--- +id: OPTICS +aliases: + - OPTICS: Ordering Points To Identify Clustering Structure +tags: [] +--- + +## OPTICS: Ordering Points To Identify Clustering Structure + +![[CleanShot 2023-10-24 at 22.22.49@2x.png]] ![[CleanShot 2023-10-24 at +22.23.05@2x.png]] + +## OPTICS (cont.) + +- OPTICS does not explicitly produce a data set clustering. +- It outputs a cluster ordering. + - It is a linear list of all objects under analysis and + - Represents the density-based clustering structure of the data. +- Objects in a denser cluster are listed closer to each other in the cluster + ordering +- Ordering is equivalent to density-based clustering obtained from a wide range + of parameter settings. +- Thus OPTICS does not require the user to provide a specific density threshold. +- The cluster ordering can be used to extract basic clustering information + (e.g., cluster centers, or arbitrary-shaped clusters), derive the intrinsic + clustering structure, as well as provide a visualization of the clustering. +- It computes an ordering of all objects in a given database. And +- It stores the core-distance and a suitable reachability-distance for **each** + object in the database. 
+- OPTICS maintains a list called **OrderSeeds** to help generate the output + ordering. +- Objects in **OrderSeeds** + - Are sorted by the reachability-distance from their respective closest core + objects, + - That is, by the smallest reachability-distance of each object. diff --git a/SI/Resource/Fundamentals of Data Mining/Content/Ordinal.md b/SI/Resource/Fundamentals of Data Mining/Content/Ordinal.md new file mode 100644 index 0000000..e5bf256 --- /dev/null +++ b/SI/Resource/Fundamentals of Data Mining/Content/Ordinal.md @@ -0,0 +1,7 @@ +--- +id: Ordinal +aliases: [] +tags: [] +--- + +![[CleanShot 2023-10-23 at 18.28.55@2x.png]] diff --git a/SI/Resource/Fundamentals of Data Mining/Content/SSE.md b/SI/Resource/Fundamentals of Data Mining/Content/SSE.md new file mode 100644 index 0000000..007fcb7 --- /dev/null +++ b/SI/Resource/Fundamentals of Data Mining/Content/SSE.md @@ -0,0 +1,23 @@ +--- +id: SSE +aliases: + - Partitioning Algorithms: Basic Concepts +tags: [] +--- + +## Partitioning Algorithms: Basic Concepts + +- <u>Partitioning method</u>: Discovering the groupings in the data by + optimizing a specific ==objective function== and ==iteratively== improving the + quality of partitions +- _K-partitioning_ method: Partitioning a dataset _**D**_ of _**n**_ objects + into a set of _**K**_ clusters so that an objective function is optimized + (e.g., the sum of squared distances is minimized within each cluster, where + $c_k$ is the centroid or medoid of cluster $C_k$) + - A typical objective function: **Sum of Squared Errors (SSE)** $$ SSE(C) = + \sum_{k=1}^{K}\sum_{x_i\in{C_k}}||x_i - c_k||^2$$ +- **Problem definition**: Given _K_, find a partition of _K clusters_ that + optimizes the chosen partitioning criterion + - Global optimal: Needs to exhaustively enumerate all partitions + - Heuristic methods (i.e., greedy algorithms): _[[K-Means]], [[K-Medians]], + [[K-Medoids]], etc_ diff --git a/SI/Resource/Fundamentals of Data Mining/Content/Target encoding.md 
b/SI/Resource/Fundamentals of Data Mining/Content/Target encoding.md new file mode 100644 index 0000000..0502447 --- /dev/null +++ b/SI/Resource/Fundamentals of Data Mining/Content/Target encoding.md @@ -0,0 +1,8 @@ +--- +id: Target encoding +aliases: [] +tags: [] +--- + +![[CleanShot 2023-10-23 at 18.20.36@2x.png]] ![[CleanShot 2023-10-23 at +18.22.00@2x.png]] ![[CleanShot 2023-10-23 at 18.22.12@2x.png]] diff --git a/SI/Resource/Fundamentals of Data Mining/Content/Z-score.md b/SI/Resource/Fundamentals of Data Mining/Content/Z-score.md new file mode 100644 index 0000000..34730a6 --- /dev/null +++ b/SI/Resource/Fundamentals of Data Mining/Content/Z-score.md @@ -0,0 +1,20 @@ +--- +id: Z-score +aliases: + - Z-score - An example +tags: [] +--- + +### Z-score - An example + +- John gets a mark of 64 in a physics test, where the mean is 50 and the + standard deviation is 8. +- Jane gets a mark of 74 in a chemistry test, where the mean is 58 and the + standard deviation is 10 + +Who has a better class performance? + +- John's z = (64 - 50) / 8 = 1.75 +- Jane's z = (74 - 58) / 10 = 1.6 +- Although Jane's score is higher, John's score is further above the mean, and + it might be concluded that John has achieved greater success. 
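The z-score comparison above can be checked with a short Python sketch. The function name `z_score` and the variable names are illustrative assumptions; the marks, means, and standard deviations are the ones given in the example.

```python
def z_score(x, mean, std):
    """Distance of x from the mean, measured in units of the standard deviation."""
    return (x - mean) / std


# Values taken from the example: (mark, class mean, class standard deviation).
john_z = z_score(64, 50, 8)    # physics
jane_z = z_score(74, 58, 10)   # chemistry

# The higher z-score marks the stronger performance relative to the class.
better = "John" if john_z > jane_z else "Jane"
```

Standardizing first is what makes the two marks comparable: the raw scores come from tests with different means and spreads.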
diff --git a/SI/Resource/Fundamentals of Data Mining/Content/attributes.md b/SI/Resource/Fundamentals of Data Mining/Content/attributes.md new file mode 100644 index 0000000..5b03e85 --- /dev/null +++ b/SI/Resource/Fundamentals of Data Mining/Content/attributes.md @@ -0,0 +1,14 @@ +--- +id: attributes +aliases: [] +tags: [] +--- + +- Attribute (or dimensions, features, variables) + - A data field, representing a characteristic of feature of a data object + - E.g., customer_ID, name, address +- Types: + - [[Nominal]](e.g., red, blue) + - [[Binary]](e.g., {true, false}) + - [[Ordinal]](e.g., {freshman, sophomore, junior, senior}) + - [[Numeric]]; [[quantitative]] ([[discrete]] vs [[continuous]]) diff --git a/SI/Resource/Fundamentals of Data Mining/Content/clustering algorithms.md b/SI/Resource/Fundamentals of Data Mining/Content/clustering algorithms.md new file mode 100644 index 0000000..ad1b29b --- /dev/null +++ b/SI/Resource/Fundamentals of Data Mining/Content/clustering algorithms.md @@ -0,0 +1,21 @@ +--- +id: clustering algorithms +aliases: + - Compare and Contrast +tags: + - Clustering-Algorithms +--- + +## Compare and Contrast + +### _[[K-Means]]_ + +### _[[K-means++]]_ + +### _[[K-Medoids]]_ + +### _[[K-Medians]]_ + +### _[[K-Modes]]_ + +### _[[Kernel K-Means]]_ diff --git a/SI/Resource/Fundamentals of Data Mining/Content/clustering evaluation.md b/SI/Resource/Fundamentals of Data Mining/Content/clustering evaluation.md new file mode 100644 index 0000000..2e175e9 --- /dev/null +++ b/SI/Resource/Fundamentals of Data Mining/Content/clustering evaluation.md @@ -0,0 +1,9 @@ +--- +id: clustering evaluation +aliases: [] +tags: [] +--- + +![[CleanShot 2023-10-25 at 16.48.09@2x.png]] ![[CleanShot 2023-10-25 at +16.48.19@2x.png]] ![[CleanShot 2023-10-25 at 17.08.12@2x.png]] ![[CleanShot +2023-10-25 at 17.08.29@2x.png]] diff --git a/SI/Resource/Fundamentals of Data Mining/Content/external.md b/SI/Resource/Fundamentals of Data Mining/Content/external.md new file mode 
100644 index 0000000..91faabe --- /dev/null +++ b/SI/Resource/Fundamentals of Data Mining/Content/external.md @@ -0,0 +1,42 @@ +--- +id: external +aliases: + - Measuring Clustering Quality: ==External== Methods +tags: [] +--- + +## Measuring Clustering Quality: ==External== Methods + +- Given the **ground truth** _T, Q(C, T)_ is the **quality measure** for a + clustering C +- _Q(C, T)_ is good if it satisfies the following **four** essential criteria + - **Cluster homogeneity** + - The purer, the better + - **Cluster completeness** + - Assign objects belonging to the same category in the ground truth to the + same cluster + - **Rag bag better than alien** + - Putting a heterogeneous object into a pure cluster should be penalized + **more** than putting it into a _rag bag_ (i.e., "miscellaneous" or + "other" category) + - **Small cluster preservation** + - Splitting a small category into pieces is more harmful than splitting a + large category into pieces + +## Commonly Used External Measures + +- **Matching-based measure** + - Purity, maximum matching, [[F-measure]] +- **Entropy-Based Measures** + - Conditional entropy + - <u>Normalized mutual information (NMI)</u> + - Variation of information +- **Pairwise measures** + - Four possibilities: True positive (TP), FN, FP, TN + - Jaccard coefficient, Rand statistic, Fowlkes-Mallows measure +- **Correlation measures** + - Discretized Hubert statistic, normalized discretized Hubert statistic +- Purity vs Maximum Matching ![[CleanShot 2023-10-25 at 15.57.30@2x.png]] +- [[F-measure]] ![[CleanShot 2023-10-25 at 15.57.51@2x.png]] ![[CleanShot +2023-10-25 at 15.58.04@2x.png]] ![[CleanShot 2023-10-25 at 15.58.19@2x.png]] + ![[CleanShot 2023-10-25 at 15.58.40@2x.png]] diff --git a/SI/Resource/Fundamentals of Data Mining/Content/internal.md b/SI/Resource/Fundamentals of Data Mining/Content/internal.md new file mode 100644 index 0000000..22834ec --- /dev/null +++ b/SI/Resource/Fundamentals of Data Mining/Content/internal.md @@ -0,0 +1,8 
@@ +--- +id: internal +aliases: [] +tags: [] +--- + +![[CleanShot 2023-10-25 at 15.59.49@2x.png]] ![[CleanShot 2023-10-25 at +16.00.03@2x.png]] diff --git a/SI/Resource/Fundamentals of Data Mining/Content/mixted types of attributes.md b/SI/Resource/Fundamentals of Data Mining/Content/mixted types of attributes.md new file mode 100644 index 0000000..8b8cb77 --- /dev/null +++ b/SI/Resource/Fundamentals of Data Mining/Content/mixted types of attributes.md @@ -0,0 +1,21 @@ +--- +id: mixted types of attributes +aliases: + - Attributes of Mixed Types +tags: [] +--- + +### Attributes of Mixed Types + +- A dataset may contain all different types + - [[Nominal]], symmetric [[binary]], asymmetric [[binary]], [[Distance +functions|numeric]], and [[ordinal]] +- One may use a weighted formula to combine their effects: $$d(i, j) = + \dfrac{\Sigma_{f=1}^{p}w_{ij}^{(f)}d_{ij}^{(f)}}{\Sigma_{f=1}^{p}w_{ij}^{(f)}}$$ + - if _f_ is numeric: use the **normalized distance (e.g., min-max distance + [0-1])** +- If _f_ is binary or nominal: $d_{ij}^{(f)}=0$ if $x_{if} = x_{jf}$; or + $d_{ij}^{(f)} = 1$ otherwise (there are other options) +- If _f_ is ordinal + - Compute ranks $z_{if}$ where $z_{if} = \dfrac{r_{if} - 1}{M_f - 1}$ + - Treat $z_{if}$ as numeric diff --git a/SI/Resource/Fundamentals of Data Mining/Content/pattern discovery.md b/SI/Resource/Fundamentals of Data Mining/Content/pattern discovery.md new file mode 100644 index 0000000..f88fe80 --- /dev/null +++ b/SI/Resource/Fundamentals of Data Mining/Content/pattern discovery.md @@ -0,0 +1,27 @@ +--- +id: pattern discovery +aliases: + - What is Pattern Discovery? +tags: [] +--- + +## What is Pattern Discovery? 
+ +- ==What are patterns?== + - ==Patterns==: A set of items, subsequences, or substructures that occur + frequently together (or strongly correlated) in a data set + - Patterns represent ==intrinsic== and ==important properties== of datasets +- ==Pattern discovery==: Uncovering patterns from massive data sets +- Motivation examples: + - What products were often purchased together? + - What are the subsequent purchases after buying an iPad? + - What code segments likely contain copy-and-paste bugs? + - What word sequences likely form phrases in this corpus? ![[CleanShot +2023-10-26 at 01.53.56@2x.png]] ![[CleanShot 2023-10-26 at 01.54.32@2x.png]] + ![[CleanShot 2023-10-26 at 01.54.44@2x.png]] ![[CleanShot 2023-10-26 at +01.55.00@2x.png]] + +## Efficient Pattern Mining Methods + +- The [[Apriori]] Algorithm +- [[FP-Growth]]: A Frequent Pattern-Growth Approach diff --git a/SI/Resource/Fundamentals of Data Mining/Content/variants.md b/SI/Resource/Fundamentals of Data Mining/Content/variants.md new file mode 100644 index 0000000..767e696 --- /dev/null +++ b/SI/Resource/Fundamentals of Data Mining/Content/variants.md @@ -0,0 +1,16 @@ +--- +id: variants +aliases: + - Variations of _K-Means_ +tags: [] +--- + +# Variations of _K-Means_ + +- There are many variants of the _K-Means_ method, varying in different aspects + - Choosing better initial centroid estimates + - _[[K-means++]]_, _Intelligent K-Means_, _Genetic K-Means_ + - Choosing different representative prototypes for the clusters + - _[[K-Medoids]]_, _[[K-Medians]]_, _[[K-Modes]]_ + - Applying feature transformation techniques + - _Weighted K-Means_, _[[Kernel K-Means]]_ diff --git a/SI/Resource/Fundamentals of Data Mining/Midterm - CS663.md b/SI/Resource/Fundamentals of Data Mining/Midterm - CS663.md new file mode 100644 index 0000000..358ed89 --- /dev/null +++ b/SI/Resource/Fundamentals of Data Mining/Midterm - CS663.md @@ -0,0 +1,69 @@ 
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- +id: Midterm - CS663 +aliases: + - Review +tags: [] +------------------------------------------------------------------------------------------------------------ +# Review +## Types of Questions +- True or false +- Multi-choice +- Explain (e.g., [[K-Means]], [[NMI]]) +- [[Compare and Contrast]] (e.g., [[clustering algorithms]]) +- Computational questions (e.g., [[DBSCAN]] and [[OPTICS]] (similar to your + assignment questions), [[FP-growth|fp-tree]] and [[pattern discovery]] + (examples from the lecture), [[clustering evaluation]] ) + +## Subjects +- [[Distance measures (mixted types of attributes)]] + - How to handle [[nominal]] attributes … + - [[nominal|Match or no-match]](as a whole or individually) + - [[nominal|One-hot encoding]] + - [[Target encoding]] +- Normalization ([[z-score]], [[mixted types of attributes|min-max]], …) +- Clustering techniques: + - [[K-Means]] and its [[variants]] + - [[Hierarchical Clustering]] ([[Hierarchical Clustering|Agglomerative]]) + - [[Density-based Clustering]]([[DBSCAN]], [[OPTICS]]) + - [[Complexity]], [[distance functions]] +- How to measure clustering quality ([[internal]] and [[external]] measures, + [[F-measure]] and its averaging/combining options when applied to multiple + classes/clusters) +- Frequent pattern mining ([[Apriori]] Algorithm, [[FP-growth]]) + +--- + +############################################################################ [[Data Matrix and Dissimilarity Matrix]] + +- Data matrix + - A data matrix of n data points with / dimensions ![[CleanShot 2023-10-23 at +17.37.59@2x.png]] +- Dissimilarity (distance) matrix (n by n) + - n data points, but registers only the distance _d(i,j)_(typically + metric)![[CleanShot 2023-10-23 at 17.41.47@2x.png]] + - Usually symmetric, thus a trinagular matrix 
+ - **[[Distance functions]]** are usually different for real, boolean, + categorical, ordinal, ratio, and vector variables + - Weights can be associated with different variables based on applications and + data semantics + +### Standardizing Numeric Data + +- [[Z-score]]: $z = \dfrac{x - \mu}{\sigma}$ + - X: raw score to be standardized, $\mu$: mean of the population, $\sigma$: + standard deviation + - the distance between the raw score and the population mean in units of the + standard deviation + - negative when the raw score is below the mean, "+" when above +- An alternative way: Calculate the mean absolute deviation $S_{f} = + \dfrac{1}{n}(|x_{1f} - m_f| + |x_{2f} - m_f| + ... + |x_{nf} - m_f|)$ where + $m_f = \dfrac{1}{n}(x_{1f} + x_{2f} + ... + x_{nf})$ + - standardized measure (z-score): $z_{if} = \dfrac{x_{if} - m_f}{S_f}$ +- **Using mean absolute devication is more robust than using standard + deviation** + +### Proximity Measure for [[Binary|Binary Attributes]] + +############################################################################ Proximity Measure for [[nominal|Categorical Attributes]] + +############################################################################ [[Ordinal]] Variables diff --git a/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-23 at 17.37.59@2x.png b/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-23 at 17.37.59@2x.png Binary files differnew file mode 100644 index 0000000..3696a4c --- /dev/null +++ b/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-23 at 17.37.59@2x.png diff --git a/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-23 at 17.41.47@2x.png b/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-23 at 17.41.47@2x.png Binary files differnew file mode 100644 index 0000000..c3b4196 --- /dev/null +++ b/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-23 at 17.41.47@2x.png diff --git 
a/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-23 at 18.09.20@2x.png b/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-23 at 18.09.20@2x.png Binary files differnew file mode 100644 index 0000000..69ae2b8 --- /dev/null +++ b/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-23 at 18.09.20@2x.png diff --git a/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-23 at 18.16.28@2x.png b/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-23 at 18.16.28@2x.png Binary files differnew file mode 100644 index 0000000..3dc651a --- /dev/null +++ b/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-23 at 18.16.28@2x.png diff --git a/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-23 at 18.17.24@2x.png b/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-23 at 18.17.24@2x.png Binary files differnew file mode 100644 index 0000000..c7af4ad --- /dev/null +++ b/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-23 at 18.17.24@2x.png diff --git a/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-23 at 18.18.06@2x.png b/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-23 at 18.18.06@2x.png Binary files differnew file mode 100644 index 0000000..06fdfcf --- /dev/null +++ b/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-23 at 18.18.06@2x.png diff --git a/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-23 at 18.19.11@2x.png b/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-23 at 18.19.11@2x.png Binary files differnew file mode 100644 index 0000000..15cbe36 --- /dev/null +++ b/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-23 at 18.19.11@2x.png diff --git a/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-23 at 18.19.54@2x.png b/SI/Resource/Fundamentals of Data 
Mining/Screenshots/CleanShot 2023-10-23 at 18.19.54@2x.png Binary files differnew file mode 100644 index 0000000..50b7e7b --- /dev/null +++ b/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-23 at 18.19.54@2x.png diff --git a/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-23 at 18.20.19@2x.png b/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-23 at 18.20.19@2x.png Binary files differnew file mode 100644 index 0000000..18e9111 --- /dev/null +++ b/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-23 at 18.20.19@2x.png diff --git a/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-23 at 18.20.36@2x.png b/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-23 at 18.20.36@2x.png Binary files differnew file mode 100644 index 0000000..34a9fca --- /dev/null +++ b/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-23 at 18.20.36@2x.png diff --git a/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-23 at 18.22.00@2x.png b/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-23 at 18.22.00@2x.png Binary files differnew file mode 100644 index 0000000..11c66f2 --- /dev/null +++ b/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-23 at 18.22.00@2x.png diff --git a/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-23 at 18.22.12@2x.png b/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-23 at 18.22.12@2x.png Binary files differnew file mode 100644 index 0000000..0cac265 --- /dev/null +++ b/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-23 at 18.22.12@2x.png diff --git a/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-23 at 18.28.55@2x.png b/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-23 at 18.28.55@2x.png Binary files differnew file mode 100644 index 0000000..16a9b08 --- 
/dev/null +++ b/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-23 at 18.28.55@2x.png diff --git a/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-23 at 18.43.35@2x.png b/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-23 at 18.43.35@2x.png Binary files differnew file mode 100644 index 0000000..1fff0f3 --- /dev/null +++ b/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-23 at 18.43.35@2x.png diff --git a/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-24 at 15.34.07@2x.png b/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-24 at 15.34.07@2x.png Binary files differnew file mode 100644 index 0000000..1ac50c5 --- /dev/null +++ b/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-24 at 15.34.07@2x.png diff --git a/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-24 at 21.42.19@2x.png b/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-24 at 21.42.19@2x.png Binary files differnew file mode 100644 index 0000000..88cdb81 --- /dev/null +++ b/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-24 at 21.42.19@2x.png diff --git a/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-24 at 22.06.43@2x.png b/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-24 at 22.06.43@2x.png Binary files differnew file mode 100644 index 0000000..afe6414 --- /dev/null +++ b/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-24 at 22.06.43@2x.png diff --git a/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-24 at 22.07.05@2x.png b/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-24 at 22.07.05@2x.png Binary files differnew file mode 100644 index 0000000..165004d --- /dev/null +++ b/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-24 at 22.07.05@2x.png diff --git 
a/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-24 at 22.11.21@2x.png b/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-24 at 22.11.21@2x.png Binary files differnew file mode 100644 index 0000000..c039b8b --- /dev/null +++ b/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-24 at 22.11.21@2x.png diff --git a/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-24 at 22.14.30@2x.png b/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-24 at 22.14.30@2x.png Binary files differnew file mode 100644 index 0000000..94d41cf --- /dev/null +++ b/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-24 at 22.14.30@2x.png diff --git a/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-24 at 22.14.45@2x.png b/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-24 at 22.14.45@2x.png Binary files differnew file mode 100644 index 0000000..2b934c4 --- /dev/null +++ b/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-24 at 22.14.45@2x.png diff --git a/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-24 at 22.15.07@2x.png b/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-24 at 22.15.07@2x.png Binary files differnew file mode 100644 index 0000000..df28ef3 --- /dev/null +++ b/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-24 at 22.15.07@2x.png diff --git a/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-24 at 22.15.23@2x.png b/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-24 at 22.15.23@2x.png Binary files differnew file mode 100644 index 0000000..396166d --- /dev/null +++ b/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-24 at 22.15.23@2x.png diff --git a/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-24 at 22.21.02@2x.png b/SI/Resource/Fundamentals of Data 
Mining/Screenshots/CleanShot 2023-10-24 at 22.21.02@2x.png Binary files differnew file mode 100644 index 0000000..f7a8993 --- /dev/null +++ b/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-24 at 22.21.02@2x.png diff --git a/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-24 at 22.21.23@2x.png b/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-24 at 22.21.23@2x.png Binary files differnew file mode 100644 index 0000000..33c153c --- /dev/null +++ b/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-24 at 22.21.23@2x.png diff --git a/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-24 at 22.21.37@2x.png b/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-24 at 22.21.37@2x.png Binary files differnew file mode 100644 index 0000000..65ccece --- /dev/null +++ b/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-24 at 22.21.37@2x.png diff --git a/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-24 at 22.21.59@2x.png b/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-24 at 22.21.59@2x.png Binary files differnew file mode 100644 index 0000000..f7b18fa --- /dev/null +++ b/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-24 at 22.21.59@2x.png diff --git a/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-24 at 22.22.49@2x.png b/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-24 at 22.22.49@2x.png Binary files differnew file mode 100644 index 0000000..9a8c1e5 --- /dev/null +++ b/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-24 at 22.22.49@2x.png diff --git a/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-24 at 22.23.05@2x.png b/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-24 at 22.23.05@2x.png Binary files differnew file mode 100644 index 0000000..7b968ff --- 
/dev/null +++ b/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-24 at 22.23.05@2x.png diff --git a/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-25 at 15.57.30@2x.png b/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-25 at 15.57.30@2x.png Binary files differnew file mode 100644 index 0000000..688beff --- /dev/null +++ b/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-25 at 15.57.30@2x.png diff --git a/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-25 at 15.57.51@2x.png b/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-25 at 15.57.51@2x.png Binary files differnew file mode 100644 index 0000000..773775d --- /dev/null +++ b/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-25 at 15.57.51@2x.png diff --git a/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-25 at 15.58.04@2x 1.png b/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-25 at 15.58.04@2x 1.png Binary files differnew file mode 100644 index 0000000..c1387ec --- /dev/null +++ b/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-25 at 15.58.04@2x 1.png diff --git a/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-25 at 15.58.04@2x.png b/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-25 at 15.58.04@2x.png Binary files differnew file mode 100644 index 0000000..c1387ec --- /dev/null +++ b/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-25 at 15.58.04@2x.png diff --git a/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-25 at 15.58.19@2x.png b/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-25 at 15.58.19@2x.png Binary files differnew file mode 100644 index 0000000..9310051 --- /dev/null +++ b/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-25 at 15.58.19@2x.png diff --git 
a/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-25 at 15.58.40@2x.png b/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-25 at 15.58.40@2x.png Binary files differnew file mode 100644 index 0000000..89ca276 --- /dev/null +++ b/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-25 at 15.58.40@2x.png diff --git a/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-25 at 15.59.49@2x.png b/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-25 at 15.59.49@2x.png Binary files differnew file mode 100644 index 0000000..5c66d6d --- /dev/null +++ b/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-25 at 15.59.49@2x.png diff --git a/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-25 at 16.00.03@2x.png b/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-25 at 16.00.03@2x.png Binary files differnew file mode 100644 index 0000000..f86fd7e --- /dev/null +++ b/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-25 at 16.00.03@2x.png diff --git a/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-25 at 16.08.52@2x.png b/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-25 at 16.08.52@2x.png Binary files differnew file mode 100644 index 0000000..7a80a70 --- /dev/null +++ b/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-25 at 16.08.52@2x.png diff --git a/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-25 at 16.09.04@2x.png b/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-25 at 16.09.04@2x.png Binary files differnew file mode 100644 index 0000000..ebce74a --- /dev/null +++ b/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-25 at 16.09.04@2x.png diff --git a/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-25 at 16.09.16@2x.png b/SI/Resource/Fundamentals of Data 
Mining/Screenshots/CleanShot 2023-10-25 at 16.09.16@2x.png Binary files differnew file mode 100644 index 0000000..ee8f4e7 --- /dev/null +++ b/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-25 at 16.09.16@2x.png diff --git a/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-25 at 16.10.06@2x.png b/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-25 at 16.10.06@2x.png Binary files differnew file mode 100644 index 0000000..3d22627 --- /dev/null +++ b/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-25 at 16.10.06@2x.png diff --git a/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-25 at 16.10.19@2x.png b/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-25 at 16.10.19@2x.png Binary files differnew file mode 100644 index 0000000..f75a7dc --- /dev/null +++ b/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-25 at 16.10.19@2x.png diff --git a/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-25 at 16.10.34@2x.png b/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-25 at 16.10.34@2x.png Binary files differnew file mode 100644 index 0000000..17df89c --- /dev/null +++ b/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-25 at 16.10.34@2x.png diff --git a/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-25 at 16.11.31@2x.png b/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-25 at 16.11.31@2x.png Binary files differnew file mode 100644 index 0000000..4f5411b --- /dev/null +++ b/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-25 at 16.11.31@2x.png diff --git a/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-25 at 16.11.42@2x.png b/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-25 at 16.11.42@2x.png Binary files differnew file mode 100644 index 0000000..18c4973 --- 
/dev/null +++ b/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-25 at 16.11.42@2x.png diff --git a/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-25 at 16.11.56@2x.png b/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-25 at 16.11.56@2x.png Binary files differnew file mode 100644 index 0000000..bc850a5 --- /dev/null +++ b/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-25 at 16.11.56@2x.png diff --git a/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-25 at 16.12.11@2x.png b/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-25 at 16.12.11@2x.png Binary files differnew file mode 100644 index 0000000..0b8801a --- /dev/null +++ b/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-25 at 16.12.11@2x.png diff --git a/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-25 at 16.12.25@2x.png b/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-25 at 16.12.25@2x.png Binary files differnew file mode 100644 index 0000000..98bd7e1 --- /dev/null +++ b/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-25 at 16.12.25@2x.png diff --git a/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-25 at 16.48.09@2x.png b/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-25 at 16.48.09@2x.png Binary files differnew file mode 100644 index 0000000..56fe16c --- /dev/null +++ b/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-25 at 16.48.09@2x.png diff --git a/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-25 at 16.48.19@2x.png b/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-25 at 16.48.19@2x.png Binary files differnew file mode 100644 index 0000000..156f250 --- /dev/null +++ b/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-25 at 16.48.19@2x.png diff --git 
a/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-25 at 17.08.12@2x.png b/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-25 at 17.08.12@2x.png Binary files differnew file mode 100644 index 0000000..043a4e2 --- /dev/null +++ b/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-25 at 17.08.12@2x.png diff --git a/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-25 at 17.08.29@2x.png b/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-25 at 17.08.29@2x.png Binary files differnew file mode 100644 index 0000000..ebe9790 --- /dev/null +++ b/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-25 at 17.08.29@2x.png diff --git a/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-25 at 17.08.47@2x.png b/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-25 at 17.08.47@2x.png Binary files differnew file mode 100644 index 0000000..6eeb3e8 --- /dev/null +++ b/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-25 at 17.08.47@2x.png diff --git a/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-25 at 22.44.57.png b/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-25 at 22.44.57.png Binary files differnew file mode 100644 index 0000000..23c94d4 --- /dev/null +++ b/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-25 at 22.44.57.png diff --git a/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-26 at 01.53.56@2x.png b/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-26 at 01.53.56@2x.png Binary files differnew file mode 100644 index 0000000..ffe51dc --- /dev/null +++ b/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-26 at 01.53.56@2x.png diff --git a/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-26 at 01.54.32@2x.png b/SI/Resource/Fundamentals of Data 
Mining/Screenshots/CleanShot 2023-10-26 at 01.54.32@2x.png Binary files differnew file mode 100644 index 0000000..c123219 --- /dev/null +++ b/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-26 at 01.54.32@2x.png diff --git a/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-26 at 01.54.44@2x.png b/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-26 at 01.54.44@2x.png Binary files differnew file mode 100644 index 0000000..222267b --- /dev/null +++ b/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-26 at 01.54.44@2x.png diff --git a/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-26 at 01.55.00@2x.png b/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-26 at 01.55.00@2x.png Binary files differnew file mode 100644 index 0000000..b694ba0 --- /dev/null +++ b/SI/Resource/Fundamentals of Data Mining/Screenshots/CleanShot 2023-10-26 at 01.55.00@2x.png
diff --git a/SI/Resource/Templates/Notes/link-note.md b/SI/Resource/Templates/Notes/link-note.md
new file mode 100644
index 0000000..ef40bad
--- /dev/null
+++ b/SI/Resource/Templates/Notes/link-note.md
@@ -0,0 +1,11 @@
+---
+id: <% tp.date.now("YYYY-MM-DD") %>
+aliases: <% tp.date.now("MMMM D, YYYY") %>
+tags:
+- link-note
+- <% tp.file.title.replaceAll(" ","-") %>
+- <% tp.file.folder(true).split("/")[1].replaceAll(" ", "-") %>
+- <% tp.file.folder(true).split("/")[2].replaceAll(" ", "-") %>
+- <% tp.file.folder(true).split("/")[3].replaceAll(" ", "-") %>
+---
+# [[<% tp.file.title %>]]
diff --git a/SI/Resource/Templates/Notes/todo-daily-note.md b/SI/Resource/Templates/Notes/todo-daily-note.md
new file mode 100644
index 0000000..32140e9
--- /dev/null
+++ b/SI/Resource/Templates/Notes/todo-daily-note.md
@@ -0,0 +1,21 @@
+---
+id: <% tp.date.now("YYYY-MM-DD") %>
+aliases: <% tp.date.now("MMMM D, YYYY") %>
+tags:
+- daily-note
+- todo
+- <% tp.file.folder(true).split("/")[1].replaceAll(" ", "-") %>
+- <% tp.file.folder(true).split("/")[2].replaceAll(" ", "-") %>
+- <% tp.file.title.replaceAll(" ","-") %>
+---
+# Todo-daily-note
+
+## TODO
+
+#todo/daily-task
+
+- [ ] 1h ⏳: SQL
+- [ ] 2h ⏳ : Machine Learning
+- [ ] 1h ⏳ : Words
+- [ ] 2h ⏳ : R
+- [ ] 1h ⏳ : Sentences
